IIRC, the first 16 or so messages over the openib BTL use the send/recv API rather than RDMA, which is significantly faster. I am not sure how 1.5.3 and multi-rail affect this, but preconnecting, I believe, short-circuits that phase, since one cuts over to RDMA for eager messages right away.
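
If one wants to force that warm-up by hand instead of (or in addition to) the preconnect option, a throwaway exchange of a few small messages with every peer right after MPI_Init should do it. A rough, untested sketch - the round count, tag and message size are arbitrary, and whether a handful of rounds is enough to reach the eager-RDMA cutover depends on the ~16-message figure above being right:

#include <mpi.h>
#include <stdlib.h>

/* Exchange 'rounds' small messages with every other rank so that the lazy
 * openib connections (and, hopefully, the eager-RDMA path) are set up
 * before the real communication starts.  Nonblocking posts + Waitall keep
 * it free of ordering/deadlock worries. */
static void warm_up(MPI_Comm comm, int rounds)
{
    int rank, size, peer, r, n = 0;
    char sbuf = 0, *rbuf;
    MPI_Request *reqs;

    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    rbuf = malloc((size_t)size * rounds);
    reqs = malloc(2 * (size_t)size * rounds * sizeof(MPI_Request));

    for (r = 0; r < rounds; r++) {
        for (peer = 0; peer < size; peer++) {
            if (peer == rank)
                continue;
            MPI_Irecv(&rbuf[r * size + peer], 1, MPI_CHAR, peer, 999, comm,
                      &reqs[n++]);
            MPI_Isend(&sbuf, 1, MPI_CHAR, peer, 999, comm, &reqs[n++]);
        }
    }
    MPI_Waitall(n, reqs, MPI_STATUSES_IGNORE);
    MPI_Barrier(comm);

    free(rbuf);
    free(reqs);
}

Calling something like warm_up(MPI_COMM_WORLD, 20) right after MPI_Init should then hide the connection setup from the measured samples.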

--td

On 10/31/2012 3:36 PM, Paul Kapinos wrote:
Hello all,

Open MPI is clever and by default uses multiple IB adapters, if available.
http://www.open-mpi.org/faq/?category=openfabrics#ofa-port-wireup

Open MPI is lazy and establishes connections only if needed.

Both are good.

We have kinda special nodes: up to 16 sockets, 128 cores, 4 boards, 4 IB cards. Multirail works!

The crucial thing is that, starting with v1.6.1, the latency of the very first ping-pong sample between two nodes is huge - some 100x to 200x the usual latency. You cannot see this with the usual latency benchmarks(*) because they tend to discard the first samples as a "warm-up phase", but we use a self-written parallel test which shows it clearly (and left me puzzling for days). If multirail is forbidden (-mca btl_openib_max_btls 1), or if v1.5.3 is used, or if the MPI processes are preconnected (http://www.open-mpi.org/faq/?category=running#mpi-preconnect), there are no such huge latency outliers for the first sample.
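
The point of our test is simply that the first sample is not thrown away; a minimal sketch of the same idea (not our actual test - message size, sample count and the two-rank setup here are arbitrary) would look like this:

#include <mpi.h>
#include <stdio.h>

#define SAMPLES 32
#define MSG_LEN 8          /* small message, stays on the eager path */

/* Ping-pong between ranks 0 and 1 that records every sample, including
 * the very first one; run with (at least) 2 processes on two nodes. */
int main(int argc, char **argv)
{
    int rank, i;
    char buf[MSG_LEN] = {0};
    double t[SAMPLES];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (i = 0; i < SAMPLES; i++) {
        double t0 = MPI_Wtime();
        if (rank == 0) {
            MPI_Send(buf, MSG_LEN, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, MSG_LEN, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, MSG_LEN, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, MSG_LEN, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
        t[i] = (MPI_Wtime() - t0) / 2.0;   /* half round-trip time */
    }

    if (rank == 0)
        for (i = 0; i < SAMPLES; i++)
            printf("sample %2d: %8.2f us\n", i, t[i] * 1.0e6);

    MPI_Finalize();
    return 0;
}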

Well, we know about the warm-up and lazy connections.

But 200x ?!

Any comments on whether this is expected behaviour?

Best,

Paul Kapinos

(*) E.g. HPCC explicitly says in http://icl.cs.utk.edu/hpcc/faq/index.html#132:
> Additional startup latencies are masked out by starting the measurement after
> one non-measured ping-pong.

P.S. Sorry for cross-posting to both Users and Developers, but my last questions to Users have received no reply yet, so I am trying to broadcast...




