Hi Dan,
> Anyone here interested in getting Java to handle heavy I/O nicely?
Exactly the topic of my research :-) You should check out Jaguar,
a system I have developed to do high-performance networking and I/O in
Java. It's at
http://www.cs.berkeley.edu/~mdw/proj/jaguar
Using Jaguar I can obtain over 480 megabits/sec of bandwidth on our
cluster area network (the same performance as in C). I have also
implemented a near zero-cost serialization technique using Jaguar;
the idea is to lay out object fields in a form that is "pre-serialized"
(hence the name, "Pre-Serialized Objects").
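To make the "pre-serialized" idea concrete, here is a rough sketch of my own. It is not Jaguar's actual implementation. It assumes the java.nio.ByteBuffer API that later shipped with the new I/O work, and the class name, field names and offsets are all hypothetical. The fields live at fixed offsets in a byte buffer, so the wire form already exists and "serializing" just hands out a view of those bytes.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration only: a record whose fields are stored directly
// in a byte buffer at fixed offsets, instead of as ordinary Java fields.
final class PreSerializedPoint {
    private static final int X_OFFSET = 0;       // 4-byte int
    private static final int Y_OFFSET = 4;       // 4-byte int
    private static final int WEIGHT_OFFSET = 8;  // 8-byte double
    static final int SIZE = 16;                  // total bytes per record

    private final ByteBuffer buf;

    // Create a fresh, zeroed record.
    PreSerializedPoint() {
        this(ByteBuffer.allocate(SIZE));
    }

    // Wrap existing bytes (for example, bytes just read from a socket).
    // No fields are copied; "deserialization" is only this wrapping step.
    PreSerializedPoint(ByteBuffer backing) {
        if (backing.remaining() < SIZE) {
            throw new IllegalArgumentException("need at least " + SIZE + " bytes");
        }
        this.buf = backing.slice().order(ByteOrder.BIG_ENDIAN);
    }

    int getX() { return buf.getInt(X_OFFSET); }
    void setX(int x) { buf.putInt(X_OFFSET, x); }

    int getY() { return buf.getInt(Y_OFFSET); }
    void setY(int y) { buf.putInt(Y_OFFSET, y); }

    double getWeight() { return buf.getDouble(WEIGHT_OFFSET); }
    void setWeight(double w) { buf.putDouble(WEIGHT_OFFSET, w); }

    // "Serialization": a read-only view of the same bytes, with no
    // field-by-field copy.
    ByteBuffer wireForm() {
        ByteBuffer view = buf.asReadOnlyBuffer();
        view.position(0);
        view.limit(SIZE);
        return view;
    }

    public static void main(String[] args) {
        PreSerializedPoint p = new PreSerializedPoint();
        p.setX(7);
        p.setY(-3);
        p.setWeight(2.5);

        // The receiver wraps the bytes and reads fields in place.
        PreSerializedPoint q = new PreSerializedPoint(p.wireForm());
        System.out.println(q.getX() + " " + q.getY() + " " + q.getWeight());
    }
}
```

The trade-off in this sketch is that every field access goes through the buffer, in exchange for having nothing to do at send or receive time.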
We are also working on building large-scale Internet services in Java.
Many of the problems you talk about on your 'c10k' web page (which is
very good, by the way) are things we are addressing. Along with the
"Ninja" project here at Berkeley we're building large-scale server
applications in Java using workstation clusters. One of my colleagues,
Steve Gribble, has done a lot of performance characterization and
tuning of Java applications which require high I/O throughput. What
we're working on now is pulling together Jaguar, his work, and some
other ideas we've had into a new system which obtains very high
I/O performance -- all in Java, of course.
You also might want to read a paper we wrote recently about engineering
systems for high throughput. What we have in mind here is Java,
but it applies to any language and O/S:
http://www.cs.berkeley.edu/~mdw/papers/events.pdf
I'm also on the Expert Group for JSR 51 (the new I/O APIs for the Java
platform), so hopefully good things will happen there!
My personal feeling is that there is a lot that will need to happen at
both the Java and the O/S level to get great I/O performance. I am not
sure I agree with many of the discussions on the linux-kernel list,
which argue that the right way to get high I/O bandwidth is just to use some
bastardization of signals; I think that the folks at Rice are a lot
closer to the mark with their novel event-delivery mechanism (by
this I mean http://www.cs.rice.edu/~druschel/usenix99event.ps.gz).
Even once you can get Linux by itself to handle the load, delivering the
same performance to Java applications is still very tricky. Using
native methods won't work -- I have demonstrated that native methods
can be extremely slow, especially when you pass a large amount of
data across the JNI interface. JNI is also not expressive enough for
many things you want to do. As a trivial example, think about how
you would "wrap" a data structure in C -- outside of the Java heap --
in a Java object or array. With JNI, you simply can't do it. Jaguar
solves this problem, but there are other potential solutions too.
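As an illustration of what "wrapping" memory outside the Java heap can look like, here is a small sketch of my own. It is not Jaguar's mechanism. It assumes the direct-buffer API (java.nio.ByteBuffer.allocateDirect) from the later new I/O libraries, and the OffHeapIntArray class is a hypothetical name. The int array's storage is a direct buffer rather than a Java heap array, and a plain Java object provides the array-like accessors.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration only: an int "array" whose storage is a direct
// buffer, accessed through an ordinary Java object.
final class OffHeapIntArray {
    private final ByteBuffer mem;
    private final int length;

    OffHeapIntArray(int length) {
        if (length < 0) {
            throw new IllegalArgumentException("negative length");
        }
        this.length = length;
        // Direct buffers start zeroed; native byte order matches what C code
        // on the same machine would see.
        this.mem = ByteBuffer
                .allocateDirect(Math.multiplyExact(length, Integer.BYTES))
                .order(ByteOrder.nativeOrder());
    }

    int length() { return length; }

    // Absolute accessors; out-of-range indices throw IndexOutOfBoundsException.
    int get(int i) { return mem.getInt(i * Integer.BYTES); }
    void set(int i, int value) { mem.putInt(i * Integer.BYTES, value); }

    // True when the storage is a direct (non-heap-array) buffer.
    boolean isOffHeap() { return mem.isDirect(); }

    long sum() {
        long total = 0;
        for (int i = 0; i < length; i++) {
            total += get(i);
        }
        return total;
    }

    public static void main(String[] args) {
        OffHeapIntArray a = new OffHeapIntArray(4);
        for (int i = 0; i < a.length(); i++) {
            a.set(i, (i + 1) * 10);
        }
        System.out.println("off-heap: " + a.isOffHeap() + ", sum: " + a.sum());
    }
}
```

In this sketch the memory is allocated from Java. Wrapping a structure that C code already owns would need support from the native interface, which is the harder part of the problem described above.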
Apart from the native code interface, the traditional Java performance
issues of compilation, garbage collection, and threading still need to
be solved (especially for Linux, which generally does not have the
state-of-the-art JVMs and compilers available for it). I also believe
that once you have a large amount of I/O going on in your application,
traditional approaches to garbage collection will not be able to cope
with it. But that is another discussion!
So ... it's a very interesting space to work in right now. I/O-intensive
applications place a lot of new demands on Java (and on operating systems
as well). I would be interested in having more discussion with people on
this list about their experiences!
Cheers-
Matt Welsh, [EMAIL PROTECTED]