On Sat, Aug 11, 2007 at 09:55:18AM -0700, Jeff Squyres wrote:
> With Mellanox's new HCA (ConnectX), extremely low latencies are  
> possible for short messages between two MPI processes.  Currently,  
> OMPI's latency is around 1.9us while all other MPI's (HP MPI, Intel  
> MPI, MVAPICH[2], etc.) are around 1.4us.  A big reason for this  
> difference is that, at least with MVAPICH[2], they are doing wire  
> protocol header caching where the openib BTL does not.  Specifically:
> 
> - Mellanox tested MVAPICH with the header caching; latency was around  
> 1.4us
> - Mellanox tested MVAPICH without the header caching; latency was  
> around 1.9us
> 
As far as I remember the Mellanox results, and according to our own testing,
the difference between MVAPICH with header caching and OMPI is 0.2-0.3us,
not 0.5us. And MVAPICH without header caching is actually worse than
OMPI for small messages.

> Given that OMPI is the lone outlier around 1.9us, I think we have no  
> choice except to implement the header caching and/or examine our  
> header to see if we can shrink it.  Mellanox has volunteered to  
> implement header caching in the openib btl.
I think we have a choice: do not implement header caching, but just change the
osu_latency benchmark to send each message with a different tag :)
I am not against header caching per se, but if it complicates the code
even a little bit, I don't think we should implement it just to benefit one
fabricated benchmark (AFAIR, before header caching was implemented in
MVAPICH, mpi_latency actually sent messages with different tags).
Also, there is really nothing to cache in the openib BTL: the openib BTL
header is 4 bytes long. The caching would have to be done in OB1, and there
it would affect every other interconnect.
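
To make the point about tags concrete, here is a minimal sketch of
sender-side header caching in C. It is not the MVAPICH or OMPI code:
match_hdr_t, hdr_cache_t, the field layout and the one-byte "cached"
marker are all assumptions made up for illustration. It only counts how
many header bytes a ping-pong loop would put on the wire with a constant
tag versus a different tag per message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical match header. Field names and sizes are assumptions for
 * illustration, not the real OB1 wire format. */
typedef struct {
    uint16_t ctx;   /* communicator context id */
    int32_t  src;   /* sender rank */
    int32_t  tag;   /* MPI tag */
} match_hdr_t;

/* Per-peer cache holding the last full header that was sent. */
typedef struct {
    int         valid;  /* nonzero once a full header has been sent */
    match_hdr_t last;
} hdr_cache_t;

/* Assumed cost of telling the receiver "same header as last time". */
enum { CACHED_MARKER_BYTES = 1 };

static int hdr_equal(const match_hdr_t *a, const match_hdr_t *b)
{
    return a->ctx == b->ctx && a->src == b->src && a->tag == b->tag;
}

/* Returns the number of header bytes sent for this message and updates
 * the cache on a miss. */
static size_t header_bytes_sent(hdr_cache_t *cache, const match_hdr_t *hdr)
{
    assert(cache != NULL && hdr != NULL);
    if (cache->valid && hdr_equal(&cache->last, hdr)) {
        return CACHED_MARKER_BYTES;      /* hit: receiver reuses its copy */
    }
    cache->last = *hdr;                  /* miss: send and remember it */
    cache->valid = 1;
    return sizeof(*hdr);
}

/* Simulates the sending side of a latency loop. With vary_tag == 0 every
 * message carries the same tag, as in the benchmark discussed above. */
static size_t total_header_bytes(int iterations, int vary_tag)
{
    hdr_cache_t cache = { 0 };
    size_t total = 0;
    for (int i = 0; i < iterations; i++) {
        match_hdr_t hdr = { .ctx = 0, .src = 0, .tag = vary_tag ? i : 0 };
        total += header_bytes_sent(&cache, &hdr);
    }
    return total;
}
```

With a constant tag only the first message pays for the full header and
every later one hits the cache. With a different tag per message every
send is a miss, so in this model the cache buys nothing.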

> 
> Any objections?  We can discuss what approaches we want to take  
> (there's going to be some complications because of the PML driver,  
> etc.); perhaps in the Tuesday Mellanox teleconf...?
> 
My main objection is that the only reason you propose doing this is some
bogus benchmark. Is there any other reason to implement header caching?
I also hope you don't propose to break layering and somehow cache PML headers
in the BTL.

--
                        Gleb.
