Re: [OMPI devel] sm BTL flow management

Brian W. Barrett Thu, 25 Jun 2009 16:17:19 -0400

All -

Jeff, Eugene, and I had a long discussion this morning on the sm BTL flow management issues and came to a couple of conclusions.

* Jeff, Eugene, and I are all convinced that Eugene's addition of polling the receive queue to drain acks when sends start backing up is required for deadlock avoidance.

* We're also convinced that George's proposal, while a good idea in general, is not sufficient. The send path doesn't appear to sufficiently progress the btl to avoid the deadlocks we're seeing with the SM btl today. Therefore, while I still recommend sizing the fifo appropriately and limiting the freelist size, I think it's not sufficient to solve all problems.

* Finally, it took an hour, but we did determine that one of the major differences between 1.2.8 and 1.3.0 in terms of sm is how messages are pulled off the FIFO. In 1.2.8 (and all earlier versions), we returned from btl_progress after a single message was received (ack or message) or the fifo was empty. In 1.3.0 (pre-srq work Eugene did), we changed to completely draining all queues before returning from btl_progress. This has led to a situation where a single call to btl_progress can make a large number of callbacks into the PML (900,000 times in one of Eugene's test cases). The change was made to resolve an issue Terry was having with performance of a benchmark. We've decided that it would be advantageous to try something between the two points and drain X number of messages from the queue, then return, where X is 100 or so at most. This should cover the performance issues Terry saw, but still not cause the huge number of messages added to the unexpected queue with a single call to MPI_Recv. Since a recv that is matched on the unexpected queue doesn't result in a call to opal_progress, this should help balance the load a little bit better. Eugene's going to take a stab at implementing this in the short term.
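
To make the "drain at most X, then return" idea concrete, here is a minimal sketch in C. It is not the actual sm BTL code: fifo_t, fifo_push, fifo_pop, progress_bounded, count_msg, and the MAX_DRAIN value of 100 are all hypothetical names chosen for illustration, and the FIFO is a plain single-process ring buffer rather than a shared-memory queue.

```c
#include <assert.h>
#include <stddef.h>

#define FIFO_CAP  1024  /* hypothetical FIFO capacity */
#define MAX_DRAIN 100   /* the "X" from the discussion; assumed value */

/* Hypothetical ring buffer standing in for the sm FIFO. */
typedef struct {
    int    items[FIFO_CAP];
    size_t head;   /* next slot to read */
    size_t tail;   /* next slot to write */
    size_t count;  /* entries currently queued */
} fifo_t;

/* Callback type standing in for the upcall into the PML. */
typedef void (*recv_cb_t)(int msg, void *ctx);

/* Enqueue one message; returns -1 if the FIFO is full. */
int fifo_push(fifo_t *f, int msg)
{
    if (f->count == FIFO_CAP) return -1;
    f->items[f->tail] = msg;
    f->tail = (f->tail + 1) % FIFO_CAP;
    f->count++;
    return 0;
}

/* Dequeue one message; returns -1 if the FIFO is empty. */
int fifo_pop(fifo_t *f, int *msg)
{
    if (f->count == 0) return -1;
    *msg = f->items[f->head];
    f->head = (f->head + 1) % FIFO_CAP;
    f->count--;
    return 0;
}

/* Handle at most max_drain messages, then return the number handled.
 * max_drain == 1 behaves like the 1.2.8 policy described above; a very
 * large max_drain behaves like the 1.3.0 drain-everything policy. */
int progress_bounded(fifo_t *f, int max_drain, recv_cb_t cb, void *ctx)
{
    int handled = 0;
    int msg;

    assert(f != NULL && cb != NULL);
    while (handled < max_drain && fifo_pop(f, &msg) == 0) {
        cb(msg, ctx);
        handled++;
    }
    return handled;
}

/* Example callback: counts how many messages were delivered. */
void count_msg(int msg, void *ctx)
{
    (void)msg;
    (*(int *)ctx)++;
}
```

With 250 messages queued, three successive calls to progress_bounded(&f, MAX_DRAIN, ...) handle 100, 100, and 50 messages, so no single progress call floods the upper layer.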

I think the combination of Eugene's deadlock avoidance fix and the careful queue draining should make me comfortable enough to start another round of testing, and this at least explains the bottom-line issues.


Brian
