Hi Chris

The devel trunk has all of this in it. You can get a tarball from the OMPI web site (take the nightly snapshot).

I plan to work on cpuset support beginning Tues morning.

Ralph

On Aug 17, 2009, at 7:18 PM, Chris Samuel wrote:


----- "Eugene Loh" <eugene....@sun.com> wrote:

Hi Eugene,

[...]
It would be even better to have binding selections adapt to other
bindings on the system.

Indeed!

This touches on the earlier thread about making OMPI aware
of its cpuset/cgroup allocation on the node (for those sites
that are using it). It might solve this issue quite nicely, as
OMPI would know precisely which cores & sockets were allocated
for its use without having to worry about other HPC processes.
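
As a rough illustration of what "knowing its allocation" could look like on Linux, the sketch below reads the CPUs a process is permitted to use from the Cpus_allowed_list field of /proc/self/status. The names parse_cpu_list and allowed_cpus are hypothetical helpers invented for this example. This is not how OMPI does it, only one way to query the same information.

```python
def parse_cpu_list(text):
    """Expand a kernel-style CPU list such as "0-3,8" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate empty input or stray commas
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def allowed_cpus(status_path="/proc/self/status"):
    """Return the CPUs this process may run on, or None if unavailable."""
    try:
        with open(status_path) as f:
            for line in f:
                if line.startswith("Cpus_allowed_list:"):
                    return parse_cpu_list(line.split(":", 1)[1])
    except OSError:
        pass  # not Linux, or /proc is not mounted
    return None
```

Inside a cpuset the returned set should shrink to the cores the batch system handed out, which is the information a binding policy would need.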

No idea how to figure that out for processes outside of cpusets. :-(

In any case, regardless of what the best behavior is, I appreciate
the point about changing behavior in the middle of a stable release.

Not a problem, and I take Jeff's point about 1.3 not being a
super stable release and thus not being a blocker to changes
such as this.

Arguably, leaving significant performance on the table in typical
situations is a bug that warrants fixing even in the middle of a
release, but I won't try to settle that debate here.

I agree for those cases where there's no downside, and thinking
further on your point of balancing between sockets I can see why
that would limit the impact.

Most of the cases I can think of that would be most adversely
affected are down to other jobs binding to cores naively, and if
that's happening outside of cpusets then the cluster sysadmin
has more to worry about from mixing those applications than
from mixing with OMPI ones which are just binding to sockets. :-)

So I'll happily withdraw my objection on those grounds.

*But* I would like to test this code out on a cluster with
cpuset support enabled to see whether it will behave itself.

Basically if I run a 4 core MPI job on a dual socket system
which has been allocated only the cores on socket 0 what will
happen when it tries to bind to socket 1 which is outside its
cpuset ?
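
To make the scenario concrete, here is a minimal sketch of one way a binding policy could avoid the problem: intersect each socket's cores with the cpuset before choosing a target. The function plan_socket_binding and the core numbering are assumptions made up for this example, not OMPI code or its actual behaviour.

```python
def plan_socket_binding(allowed, socket_cores):
    """Map each socket to the cores on it that the cpuset actually permits.

    Sockets with no permitted cores are dropped, so a caller never
    attempts to bind outside its allocation.
    """
    usable = {}
    for socket, cores in socket_cores.items():
        permitted = sorted(set(cores) & set(allowed))
        if permitted:
            usable[socket] = permitted
    return usable


# Hypothetical dual-socket node with four cores per socket;
# the cpuset covers socket 0 only.
layout = {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}
plan = plan_socket_binding({0, 1, 2, 3}, layout)
print(plan)  # {0: [0, 1, 2, 3]}
```

With this filter all four ranks would land on socket 0. Without it, the outcome depends on the OS: per the Linux sched_setaffinity man page, a mask that contains no CPUs permitted by the cpuset is rejected with EINVAL.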

Is there a 1.3 branch or tarball with these patches applied
that I could test out ?

cheers,
Chris
--
Christopher Samuel - (03) 9925 4751 - Systems Manager
The Victorian Partnership for Advanced Computing
P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
