1) By heterogeneous, do you mean Derived Datatypes? MPJ Express's buffering layer handles this: it flattens the data into a ByteBuffer, so the native device doesn't have to worry about Derived Datatypes (those are handled at the top layers). Interestingly, Java users would intuitively use MPI.OBJECT when there is heterogeneous data to send (but yes, MPI.OBJECT is a slow case ...).
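To illustrate the idea (a minimal conceptual sketch only, not MPJ Express's actual internal API), the flattening step amounts to packing the typed user data into a direct ByteBuffer that the native side can address without a further copy:

    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;

    class FlattenSketch {
        // Hypothetical sketch: pack 'count' ints from a user array into a
        // direct ByteBuffer so the native device can treat it as MPI_BYTE.
        // A direct buffer is reachable from JNI via GetDirectBufferAddress,
        // so no Get<Type>ArrayElements copy is needed on the native side.
        static ByteBuffer flatten(int[] userBuf, int offset, int count) {
            ByteBuffer bb = ByteBuffer.allocateDirect(count * 4) // 4 bytes per int
                                      .order(ByteOrder.nativeOrder());
            for (int i = 0; i < count; i++) {
                bb.putInt(userBuf[offset + i]);
            }
            bb.flip(); // position 0, ready for the native side to read
            return bb;
        }
    }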
Currently the same goes for user-defined Op functions. Those are handled at the top layers, i.e. using MPJ Express's algorithms rather than native MPI's (but the communication is native).

2) API changes: Do you envision documenting the changes as something like an mpiJava 1.3 spec?

3) New benchmark results: I did the benchmarking again with various configurations:

i) Open MPI 1.7.4 C
ii) MVAPICH2.2 C
iii) MPJ Express (using Open MPI - with arrays)
iv) Open MPI's Java bindings (with a large user array -- the unoptimized case)
v) Open MPI's Java bindings (with arrays, where the size of the user array equals the data point, to be fair)
vi) MPJ Express (using MVAPICH2.2 - with arrays)
vii) Open MPI's Java bindings (using MPI.new<Type>Buffer, ByteBuffer)
viii) MPJ Express (using Open MPI - with ByteBuffer; this is from the device layer of MPJ Express, and it helps show how MPJ Express could perform if we add MPI.new<Type>Buffer-like functionality in the future)
ix) MPJ Express (using MVAPICH2.2 - with ByteBuffer) --> currently I don't know why it performs better than Open MPI?
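To make the difference between iv) and v) concrete, here is a minimal sketch (illustrative only, not the attached test case; the send signature is the one from Open MPI's bindings quoted below, and the peer rank and tag are arbitrary):

    import mpi.MPI;

    class BandwidthSketch {
        // Illustrative only -- shows how the two array configurations
        // differ at the call site, not the actual benchmark code.
        static void sendDataPoint(int count) throws Exception {
            // iv) unoptimized: one large 16M array reused for every data
            // point; GetByteArrayElements pins/copies the whole array
            // on each call, regardless of 'count'
            byte[] big = new byte[16 * 1024 * 1024];
            MPI.COMM_WORLD.send(big, count, MPI.BYTE, /*dest*/ 1, /*tag*/ 0);

            // v) fair: array sized exactly to the data point, so only
            // 'count' bytes are pinned/copied per call
            byte[] fair = new byte[count];
            MPI.COMM_WORLD.send(fair, count, MPI.BYTE, /*dest*/ 1, /*tag*/ 0);
        }
    }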
Bibrak Qamar

On Mon, Mar 24, 2014 at 10:16 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

> On Mar 14, 2014, at 9:29 AM, Bibrak Qamar <bibr...@gmail.com> wrote:
>
> > It works for Open MPI, but for MPICH3 I have to comment out the dlopen. Is there any way to tell the compiler that if it is using Open MPI (mpicc) it should use dlopen, and otherwise keep it commented out? Or something else?
>
> In Open MPI's mpi.h we have defined OPEN_MPI. You can therefore use #if defined (OPEN_MPI).
>
> > Yes, there are some places where we need to be in sync with the internals of the native MPI implementation. These are in section 8.1.2 of MPI 2.1 (http://www.mpi-forum.org/docs/mpi-2.1/mpi21-report.pdf), for example MPI_TAG_UB. For the pure Java devices of MPJ Express we have always used Integer.MAX_VALUE.
> >
> > Datatypes?
> >
> > MPJ Express uses an internal buffering layer to buffer the user data into a ByteBuffer. In this way, for the native device we end up using the MPI_BYTE datatype most of the time. ByteBuffer simplifies matters since it is directly accessible from native code.
>
> Does that mean you can't do heterogeneous? (Not really a huge deal, since most people don't run heterogeneously, but something to consider.)
>
> > With our current implementation there is one exception to this, i.e. in Reduce, Allreduce, and Reduce_scatter, where the native MPI implementation needs to know which Java datatype it is going to process. The same goes for MPI.Op.
>
> And Accumulate and the other Op-using functions, right?
>
> > On "Are your bindings similar in style/signature to ours?"
>
> No, they use the real datatypes.
>
> > I checked, and there are differences. MPJ Express (and FastMPJ too) implements the mpiJava 1.2 specification. There is also the MPJ API (which is very close to the mpiJava 1.2 API).
> >
> > Example 1: Getting the rank and size of COMM_WORLD
> >
> > MPJ Express (the mpiJava 1.2 API):
> > public int Size() throws MPIException;
> > public int Rank() throws MPIException;
> >
> > MPJ API:
> > public int size() throws MPJException;
> > public int rank() throws MPJException;
> >
> > Open MPI's Java bindings:
> > public final int getRank() throws MPIException;
> > public final int getSize() throws MPIException;
>
> Right -- we *started* with the old ideas, but then made the conscious choice to update the Java bindings in a few ways:
>
> - make them more like modern Java conventions (e.g., camel case, use verbs, etc.)
> - get rid of MPI.OBJECT
> - use modern, efficient Java practices
> - basically, we didn't want to be bound by any Java decisions that were made long ago that aren't necessarily relevant any more
> - and to be clear: we couldn't find many existing Java MPI codes, so compatibility with existing Java MPI codes was not a big concern
>
> One thing we didn't do was use bounce buffers for small messages, which shows up in your benchmarks. We're considering adding that optimization, and others.
>
> > Example 2: Point-to-Point communication
> >
> > MPJ Express (the mpiJava 1.2 API):
> > public void Send(Object buf, int offset, int count, Datatype datatype, int dest, int tag) throws MPIException;
> >
> > public Status Recv(Object buf, int offset, int count, Datatype datatype, int source, int tag) throws MPIException;
> >
> > MPJ API:
> > public void send(Object buf, int offset, int count, Datatype datatype, int dest, int tag) throws MPJException;
> >
> > public Status recv(Object buf, int offset, int count, Datatype datatype, int source, int tag) throws MPJException;
> >
> > Open MPI's Java bindings:
> > public final void send(Object buf, int count, Datatype type, int dest, int tag) throws MPIException;
> >
> > public final Status recv(Object buf, int count, Datatype type, int source, int tag) throws MPIException;
> >
> > Example 3: Collective communication
> >
> > MPJ Express (the mpiJava 1.2 API):
> > public void Bcast(Object buf, int offset, int count, Datatype type, int root) throws MPIException;
> >
> > MPJ API:
> > public void bcast(Object buffer, int offset, int count, Datatype datatype, int root) throws MPJException;
> >
> > Open MPI's Java bindings:
> > public final void bcast(Object buf, int count, Datatype type, int root) throws MPIException;
> >
> > I couldn't find which API Open MPI's Java bindings implement?
>
> Our own. :-)
>
> > But while reading your README.JAVA.txt and your code, I realised that you are trying to avoid the buffering overhead by giving the user the flexibility to allocate data directly in a ByteBuffer using MPI.new<Type>Buffer, hence not following the mpiJava 1.2 specs (for communication operations)?
>
> Correct.
>
> > On Performance Comparison
> >
> > Yes, this is interesting. I have managed to do two kinds of tests: Ping-Pong (Latency and Bandwidth) and Collective Communications (Bcast).
> >
> > Attached are graphs and the programs (test cases) that I used. The tests were done using InfiniBand; more on the platform here: http://www.nust.edu.pk/INSTITUTIONS/Centers/RCMS/AboutUs/facilities/screc/Pages/Resources.aspx
> >
> > One reason for the low performance of Open MPI's Java bindings (in the Bandwidth.png graph) is the way the test case was written (Bandwidth_OpenMPi.java). It allocates a 16M byte array in total and uses the same array in send/recv for each data point (by varying the count).
> >
> > This could be mainly because of the following code in mpi_Comm.c (let me know if I am mistaken):
> >
> > static void* getArrayPtr(void** bufBase, JNIEnv *env,
> >                          jobject buf, int baseType, int offset)
> > {
> >     switch(baseType)
> >     {
> >     ...
> >     case 1: {
> >         jbyte* els = (*env)->GetByteArrayElements(env, buf, NULL);
> >         *bufBase = els;
> >         return els + offset;
> >     }
> >     ...
> >     }
> > }
> >
> > The Get<PrimitiveType>ArrayElements routine gets the entire array every time, even if the user only wants to send some of the elements (the count). This might be one reason for Open MPI's Java bindings to advocate for MPI.new<Type>Buffer. The other reason is naturally the buffering overhead.
>
> Yes.
>
> There's *always* going to be a penalty to pay if you don't use native buffers, just due to the nature of Java garbage collection, etc.
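For reference, the user side of the native-buffer path discussed here looks roughly like this (a sketch only; MPI.newByteBuffer is the factory named in this thread's README.JAVA.txt discussion, and the peer rank and tag are arbitrary):

    import java.nio.ByteBuffer;
    import mpi.MPI;

    class NativeBufferSketch {
        // Sketch of the MPI.new<Type>Buffer path: the buffer is direct, so
        // the JNI layer can use GetDirectBufferAddress instead of pinning
        // and copying a Java array, and the memory lives outside the
        // GC-managed heap.
        static void pingOnce(int count, int peer) throws Exception {
            ByteBuffer buf = MPI.newByteBuffer(count); // allocate once, reuse
            MPI.COMM_WORLD.send(buf, count, MPI.BYTE, peer, /*tag*/ 0);
        }
    }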
> > From the above experience, for the bandwidth of the Bcast operation, I modified the test case to allocate only as much array as is needed for that Bcast, and took the results. For a fairer comparison between MPJ Express and Open MPI's Java bindings, I didn't use MPI.new<Type>Buffer.
>
> It would be interesting to see how using the native buffers compares, too -- i.e., are we correct in advocating for the use of native buffers?
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/03/14384.php