On Fri, 13 Jan 2006, James Strachan wrote:
> > The infiniband transport would be native code, so you could use JNI.
> > However, it would definitely be worth it.
>
> Agreed! I'd *love* a Java API to Infiniband! Have wanted one for ages
> & google every once in a while to see if one shows up :)
>
> It looks like MPI has support for Infiniband; would it be worth
> trying to wrap that in JNI?
> http://www-unix.mcs.anl.gov/mpi/
> http://www-unix.mcs.anl.gov/mpi/mpich2/

I did find that HP has a Java interface for MPI. However, I don't think it is necessarily the way to go. It would be the right choice for writing distributed computations, but the people who write those tend to work in natively compiled languages. That may be why this Java MPI binding doesn't appear to be well known.

I did, however, find something that might work for us: UDAPL from the DAT Collaborative. A consortium created a spec for an interface to anything that provides RDMA capabilities:

http://www.datcollaborative.org/udapl.html

The header files and the spec are available there. I downloaded the only release made by infiniband.sf.net. They state that it only works with kernel 2.4, and that for 2.6 you have to use openib.org. The latter claims to provide an implementation of UDAPL:

http://openib.org/doc.html

The wiki has a lot of info. From the mailing list archive you can tell that this project has a lot of momentum:

http://openib.org/pipermail/openib-general/

I think the next thing to do would be to prove that using RDMA as opposed to UDP is worthwhile. I think it is: JITs are so fast now that the transport, rather than the Java code, is likely to be the bottleneck. Before planning anything long-term, though, I would get two InfiniBand-enabled boxes and write a little prototype. I think Appro sells InfiniBand blades with Mellanox HCAs.

There is also IBM's proprietary API for clustering mainframes, the Coupling Facility:

http://www.research.ibm.com/journal/sj36-2.html

There are some amazing articles there.
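The little prototype suggested above needs a UDP baseline to compare RDMA against. Here is a minimal sketch that measures mean UDP round-trip time from Java over the loopback interface, using only `java.net`.

It rests on a few assumptions. The class and method names (`UdpPingPong`, `meanRoundTripNanos`) are hypothetical. Loopback numbers only show JVM and kernel overhead, not wire latency. The RDMA side would need a native UDAPL or verbs binding via JNI, which is not shown here. For a real comparison you would run the echo side on the second InfiniBand box instead of in-process.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.SocketException;

public class UdpPingPong {

    /** Echo server: sends every datagram back to its sender until the socket is closed. */
    static void startEchoServer(DatagramSocket server) {
        Thread t = new Thread(() -> {
            byte[] buf = new byte[1500];
            try {
                while (true) {
                    DatagramPacket p = new DatagramPacket(buf, buf.length);
                    server.receive(p);
                    // Reflect the same payload to the address/port it came from.
                    server.send(new DatagramPacket(p.getData(), p.getLength(), p.getSocketAddress()));
                }
            } catch (SocketException closed) {
                // Thrown when the socket is closed: normal shutdown.
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        t.setDaemon(true);
        t.start();
    }

    /** Mean round-trip time in nanoseconds over `iterations` pings of `size` bytes (loopback only). */
    static double meanRoundTripNanos(int iterations, int size) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket server = new DatagramSocket(0, loopback);
             DatagramSocket client = new DatagramSocket()) {
            startEchoServer(server);
            client.setSoTimeout(1000); // fail instead of hanging if a datagram is lost
            byte[] out = new byte[size];
            byte[] in = new byte[size];
            long total = 0;
            for (int i = 0; i < iterations; i++) {
                long start = System.nanoTime();
                client.send(new DatagramPacket(out, out.length, loopback, server.getLocalPort()));
                client.receive(new DatagramPacket(in, in.length));
                total += System.nanoTime() - start;
            }
            return (double) total / iterations;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        meanRoundTripNanos(10_000, 64); // warm-up run, giving the JIT a chance to compile the hot loop
        double rtt = meanRoundTripNanos(10_000, 64);
        System.out.printf("mean UDP round trip: %.1f us%n", rtt / 1000.0);
    }
}
```

The warm-up pass is there because the point of the comparison is steady-state JIT-compiled performance, not interpreter start-up cost.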
Personally, I also think there is value in implementing replication over UDP (with process-group libraries such as evs4j), so I would pursue both approaches at the same time.

Guglielmo
