On Aug 17 2009, Jeff Squyres wrote:
On Aug 16, 2009, at 11:02 PM, Ralph Castain wrote:

I think the problem here, Eugene, is that performance benchmarks are far from the typical application. We have repeatedly seen this - optimizing for benchmarks frequently makes applications run less efficiently. So I concur with Chris on this one - let's not go -too- benchmark happy and hurt the regular users.

FWIW, I've seen processor binding help real user codes, too. Indeed, on a system where an MPI job has exclusive use of the node, how does binding hurt you?

Here is how, and I can assure you that it's not nice, not at all; it can
kill an application dead.  I have some experience with running large SMP
systems (Origin, SunFire F15K and POWER3/4 racks), and this area was a
nightmare.

Process A is bound, and is waiting briefly for a receive.  All of the
other cores are busy with the processes bound to them.  Then some other
process, a daemon or a kernel thread, needs service from the kernel, so
the scheduler runs that work on process A's (apparently idle) core.
Unfortunately, it is a long-running thread (e.g. NFS), so when the other
processes finish and A becomes the bottleneck, the whole job stalls until
that kernel thread finishes.
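
To make the mechanism concrete, here is a minimal sketch (my illustration, not part of the original discussion) of what binding usually amounts to at the OS level on Linux; the core number is arbitrary and the MPI work is elided:

    /* Minimal sketch of OS-level binding on Linux (illustrative only).
     * Once the mask is set, the scheduler cannot migrate this process
     * to an idle core, even if a daemon or kernel thread is hogging
     * the core it is pinned to. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
        cpu_set_t mask;
        CPU_ZERO(&mask);
        CPU_SET(0, &mask);              /* pin this process to core 0 */

        if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
            perror("sched_setaffinity");
            return 1;
        }

        /* ... MPI work would go here: if core 0 is busy running an NFS
         * or other kernel thread, this process just waits, and so does
         * every rank blocked on a message from it. */
        printf("pinned, now running on core %d\n", sched_getcpu());
        return 0;
    }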

You can get a similar effect if process A is bound to a CPU that also services an I/O device (its interrupts and driver work are handled there). When something else entirely starts hammering that device, even if it doesn't tie the CPU up for long each time, bye-bye performance. This is typically a problem on multi-socket systems, of course, but can show up even on quite small ones.

For this reason, many schedulers ignore binding hints when they 'think' they
know better - and, no matter what the documentation says, hints are generally
all they are.  You can then get processes rotating round the processors,
exercising the inter-cache buses nicely ...  In my experience, binding can
sometimes make that more likely rather than less, and the best solutions are
usually different.
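
That is one reason to check rather than trust: whether a binding request actually sticks depends on the system. The following sketch (again mine, not from the thread, and Linux-specific) simply reads the affinity mask back and reports where the process is really running; the core number is illustrative.

    /* Sketch: request core 2, then verify what the system actually gave us. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
        cpu_set_t want, got;
        CPU_ZERO(&want);
        CPU_SET(2, &want);                       /* ask for core 2 only */
        if (sched_setaffinity(0, sizeof(want), &want) != 0)
            perror("sched_setaffinity");

        sched_getaffinity(0, sizeof(got), &got); /* what did we really get? */
        printf("allowed on core 2: %s; currently on core %d\n",
               CPU_ISSET(2, &got) ? "yes" : "no", sched_getcpu());
        return 0;
    }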

Yes, I used binding, but it was hell to set up, and many people give up,
saying that it degrades performance.  I advise ordinary users to avoid it
like the plague, and use more reliable tuning techniques.

UNLESS you have a threaded application, in which case -any- binding can be highly detrimental to performance.

I'm not quite sure I understand this statement. Binding is not inherently contrary to multi-threaded applications.

That is true.  But see above.

Another circumstance where that is true is when your application is an MPI
one that calls SMP-enabled libraries; this is getting increasingly
common.  Binding can stop them from using spare cores or otherwise confuse
them; God help you if they start to use a 4-core algorithm on one core!
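
As a concrete illustration of that last point (my sketch, not from the thread, assuming an OpenMP-threaded library underneath): if mpirun has already bound each rank to a single core, every thread the library spawns inherits that one-core mask, and they all time-slice on it.

    /* Stand-in for an MPI rank that calls into a threaded library.
     * Compile with something like: mpicc -fopenmp this.c
     * With a one-core binding in force, every thread below reports
     * the same core - the "4-core algorithm on one core" above. */
    #define _GNU_SOURCE
    #include <mpi.h>
    #include <omp.h>
    #include <sched.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        #pragma omp parallel                 /* the "SMP-enabled library" */
        printf("rank %d, thread %d of %d, on core %d\n",
               rank, omp_get_thread_num(), omp_get_num_threads(),
               sched_getcpu());

        MPI_Finalize();
        return 0;
    }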


Regards,
Nick Maclaren.


