On Aug 16, 2009, at 11:02 PM, Ralph Castain wrote:
> I think the problem here, Eugene, is that performance benchmarks are far from the typical application. We have repeatedly seen this - optimizing for benchmarks frequently makes applications run less efficiently. So I concur with Chris on this one - let's not go -too- benchmark happy and hurt the regular users.
FWIW, I've seen processor binding help real user codes, too. Indeed, on a system where an MPI job has exclusive use of the node, how does binding hurt you?
On nodes where multiple MPI jobs are running, if a resource manager is being used, then we should obey what it has specified for each job to use. We need a bit more work in that direction, but it's very doable.
When resource managers are not used and multiple MPI jobs share the same node, then OMPI has to coordinate amongst its jobs to not oversubscribe cores (when possible). As Ralph indicated in a later mail, we still need some work in this area, too.
> Here at LANL, binding to-socket instead of to-core hurts performance by ~5-10%, depending on the specific application. Of course, either binding method is superior to no binding at all...
This is probably not too surprising: allowing the OS to move processes around between cores on a socket can probably involve a little cache thrashing, which could account for that 5-10% loss. I'm hand-waving here, and I have not tried this myself. It's also not too surprising that others don't see this effect at all (e.g., Sun didn't see any difference between bind-to-core and bind-to-socket when they ran their tests). YMMV.
I'd actually be in favor of binding each process to a core (not to a socket), but spreading the processes out round-robin by socket rather than by core. All of this would be the *default* behavior, of course -- command line params/MCA params will be provided to change to whatever pattern is desired.
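To make the proposed default concrete, here is a minimal sketch of "bind to core, map round-robin by socket". It is an illustration only, not Open MPI's implementation: the function name `place_ranks` and the `sockets` layout (a list of per-socket core ID lists) are hypothetical, and a real launcher would discover the topology rather than take it as an argument.

```python
def place_ranks(num_ranks, sockets):
    """Assign each rank one core, cycling across sockets.

    `sockets` is a hypothetical topology description: one list of
    core IDs per socket, e.g. [[0, 1, 2, 3], [4, 5, 6, 7]].
    Returns {rank: (socket_index, core_id)}.
    """
    total_cores = sum(len(cores) for cores in sockets)
    if num_ranks > total_cores:
        # This sketch refuses to oversubscribe rather than double up.
        raise ValueError("more ranks than cores")

    next_free = [0] * len(sockets)  # next unused core slot per socket
    placement = {}
    socket = 0
    for rank in range(num_ranks):
        # Skip sockets whose cores are all taken (uneven socket sizes).
        while next_free[socket] == len(sockets[socket]):
            socket = (socket + 1) % len(sockets)
        core = sockets[socket][next_free[socket]]
        next_free[socket] += 1
        placement[rank] = (socket, core)
        # Move to the next socket so consecutive ranks are spread out.
        socket = (socket + 1) % len(sockets)
    return placement


if __name__ == "__main__":
    for rank, (sock, core) in place_ranks(4, [[0, 1, 2, 3], [4, 5, 6, 7]]).items():
        print(f"rank {rank}: socket {sock}, core {core}")
```

With two 4-core sockets and 4 ranks, consecutive ranks alternate between sockets, yet each rank is still pinned to exactly one core.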
> UNLESS you have a threaded application, in which case -any- binding can be highly detrimental to performance.
I'm not quite sure I understand this statement. Binding is not inherently contrary to multi-threaded applications.
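As a rough illustration of why binding and threads can coexist: a process can be bound to a *set* of cores (say, one socket's worth), and its threads are then free to float within that set. The sketch below assumes Linux, where Python exposes `os.sched_setaffinity`, and relies on newly created threads inheriting the affinity mask of the thread that creates them. The helper name `run_threads_bound` and the choice of cores are hypothetical; this is not how Open MPI applies binding.

```python
import os
import threading


def run_threads_bound(cores, num_threads):
    """Bind the calling thread to `cores`, then start worker threads.

    Returns the affinity mask each worker observed, or None on
    platforms without os.sched_setaffinity (e.g. macOS, Windows).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None

    # Restrict the caller; threads started afterwards inherit this mask.
    os.sched_setaffinity(0, cores)

    masks = [None] * num_threads

    def worker(i):
        # Each thread may run on any core in the inherited set.
        masks[i] = os.sched_getaffinity(0)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return masks


if __name__ == "__main__" and hasattr(os, "sched_getaffinity"):
    # Pick up to two cores we are already allowed to use, standing in
    # for "the cores of one socket".
    allowed = sorted(os.sched_getaffinity(0))
    print(run_threads_bound(set(allowed[:2]), 3))
```

Binding a multi-threaded process to a single core would force all of its threads to share that core, which is the case where binding hurts; binding to a wider set avoids that.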
-- Jeff Squyres jsquy...@cisco.com