Re: [OMPI devel] Heads up on new feature to 1.3.4

N.M. Maclaren Mon, 17 Aug 2009 10:38:53 -0400

On Aug 17 2009, Jeff Squyres wrote:

On Aug 16, 2009, at 11:02 PM, Ralph Castain wrote:
I think the problem here, Eugene, is that performance benchmarks are far from the typical application. We have repeatedly seen this - optimizing for benchmarks frequently makes applications run less efficiently. So I concur with Chris on this one - let's not go -too- benchmark happy and hurt the regular users.
FWIW, I've seen processor binding help real user codes, too. Indeed, on a system where an MPI job has exclusive use of the node, how does binding hurt you?


Here is how, and I can assure you that it's not nice, not at all; it can
kill an application dead.  I have some experience with running large SMP
systems (Origin, SunFire F15K and POWER3/4 racks) and this area was a
nightmare.

Process A is bound, and is waiting briefly for a receive.  All of the
other cores are busy with the processes bound to them.  There is then some
action from another process, a daemon or a kernel thread that needs service
from the kernel.  So it starts a thread on process A's core.  Unfortunately,
this is a long-running thread (e.g. NFS) so, when the other processes
finish, and A is the bottleneck, the whole job hangs until that kernel
thread finishes.

You can get a similar effect if process A is bound to a CPU which has an
I/O device bound to it.  When something else entirely starts hammering that
device, even if it doesn't tie it up for long each time, bye-bye
performance.  This is typically a problem on multi-socket systems, of
course, but could show up even on quite small ones.


For this reason, many schedulers ignore binding hints when they 'think' they
know better - and, no matter what the documentation says, hints are generally
all they are.  You can then get processes rotating round the processors,
exercising the inter-cache buses nicely ....  In my experience, binding can
sometimes make that more likely rather than less, and the best solutions are
usually different.

Yes, I used binding, but it was hell to set up, and many people give up,
saying that it degrades performance.  I advise ordinary users to avoid it
like the plague, and use more reliable tuning techniques.

UNLESS you have a threaded application, in which case -any- binding can be highly detrimental to performance.
I'm not quite sure I understand this statement. Binding is not inherently contrary to multi-threaded applications.


That is true.  But see above.

Another circumstance where that is true is when your application is an MPI
one that calls SMP-enabled libraries; this is getting increasingly
common.  Binding can stop those using spare cores or otherwise confuse
them; God help you if they start to use a 4-core algorithm on one core!
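
To make the last point concrete, here is a minimal sketch of the check a
threaded library would need in order not to run a 4-way algorithm on one
core.  It assumes Linux, where os.sched_getaffinity reports the cores the
calling process is allowed to run on; the names usable_cores and
choose_thread_count are made up for illustration and are not part of any
real library or of Open MPI.

```python
import os


def usable_cores():
    """Return how many cores this process is allowed to run on.

    os.sched_getaffinity is only available on some platforms (notably
    Linux).  Falling back to os.cpu_count() elsewhere is an assumption
    of this sketch, and that fallback ignores any binding in force.
    """
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))  # 0 means the calling process
    return os.cpu_count() or 1


def choose_thread_count(requested):
    """Clamp a requested thread count to the cores we may actually use."""
    return max(1, min(requested, usable_cores()))


if __name__ == "__main__":
    print("cores in the machine:", os.cpu_count())
    print("cores usable by us:  ", usable_cores())
    print("threads for a 4-way algorithm:", choose_thread_count(4))
```

A library that sizes its thread pool from the machine's total core count
rather than from the affinity mask will happily start four threads inside a
process that has been bound to a single core, which is exactly the failure
described above.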


Regards,
Nick Maclaren.
