On Aug 17 2009, Jeff Squyres wrote:
> On Aug 16, 2009, at 11:02 PM, Ralph Castain wrote:
>> I think the problem here, Eugene, is that performance benchmarks are
>> far from the typical application. We have repeatedly seen this -
>> optimizing for benchmarks frequently makes applications run less
>> efficiently. So I concur with Chris on this one - let's not go -too-
>> benchmark happy and hurt the regular users.
> FWIW, I've seen processor binding help real user codes, too. Indeed,
> on a system where an MPI job has exclusive use of the node, how does
> binding hurt you?
Here is how, and I can assure you that it's not nice, not at all; it can
kill an application dead. I have some experience of running large SMP
systems (Origin, SunFire F15K and POWER3/4 racks), and this area was a
nightmare.
Process A is bound and is waiting briefly for a receive. All of the
other cores are busy with the processes bound to them. Then some action
from another process, a daemon or a kernel thread needs service from the
kernel, so the kernel runs a thread on process A's core. Unfortunately,
this is a long-running thread (e.g. NFS), so when the other processes
finish and A is the bottleneck, the whole job hangs until that kernel
thread finishes.
You can get a similar effect if process A is bound to a CPU which has an
I/O device bound to it. When something else entirely starts hammering that
device, even if it doesn't tie it up for long each time, bye-bye
performance. This is typically a problem on multi-socket systems, of
course, but could show up even on quite small ones.
For this reason, many schedulers ignore binding hints when they 'think' they
know better; whatever the documentation says, hints are generally all they
are. You can then get processes rotating round the processors, exercising
the inter-cache buses nicely... In my experience, binding can sometimes make
that more likely rather than less, and the best solutions usually lie
elsewhere.
Yes, I have used binding, but it was hell to set up, and many people give up
on it, saying that it degrades performance. I advise ordinary users to avoid
it like the plague and to use more reliable tuning techniques.
>> UNLESS you have a threaded application, in which case -any- binding
>> can be highly detrimental to performance.
> I'm not quite sure I understand this statement. Binding is not
> inherently contrary to multi-threaded applications.
That is true. But see above.
Another circumstance where that is true is when your application is an MPI
one but calls SMP-enabled libraries; this is increasingly common. Binding
can stop those libraries from using spare cores or otherwise confuse them;
God help you if they start to run a 4-core algorithm on one core!
Regards,
Nick Maclaren.