It depends on the scale you want. Some of what you write suggests you are
thinking specifically about MCTS programs, though your questions are also
more general.
When we wrote SlugGo (one of the top programs a few years ago but
in hibernation now) we went with MPI. MPI lets you simulate as many
compute nodes as you want on a single CPU, so it is great for testing
before hitting the cluster. We wrote a scheduler that would use pure
MPI calls to distribute tasks over both shared cores and remote boxes;
MPI makes it really easy to specify how many processes run on each
IP address, so you can run on a cluster of dissimilar boxes if that is
what you have available. Our trees were all different at the top node,
so we never worried about wasted memory.
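Regarding specifying how many processes run on each IP address: with Open
MPI (other implementations use a slightly different machinefile format)
that is just a hostfile plus an mpirun invocation. The host names and the
binary name below are made-up placeholders, only meant to show the shape
of it:

    # hosts -- how many processes to launch on each box (Open MPI syntax)
    bigbox.example.edu   slots=8
    oldbox1.example.edu  slots=2
    oldbox2.example.edu  slots=2

    mpirun --hostfile hosts -np 12 ./engine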
The time spent making the MPI calls was unimportant compared to the
time it took to do any of the distributed tasks.
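To make "pure MPI calls" concrete, here is a minimal sketch of the kind of
blocking master/worker loop I mean. It is not SlugGo's actual scheduler;
the task and result are just placeholder doubles, and it assumes at least
one worker rank:

    #include <mpi.h>
    #include <stdio.h>

    #define NUM_TASKS  64
    #define TAG_TASK    1
    #define TAG_RESULT  2
    #define TAG_STOP    3

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* needs size >= 2 */

        if (rank == 0) {
            /* Master: seed every worker with one task, then hand out the
               rest as results come back.  All calls are blocking. */
            int next = 0, done = 0;
            double total = 0.0, stop = 0.0;
            for (int w = 1; w < size; w++) {
                if (next < NUM_TASKS) {
                    double task = (double)next++;   /* placeholder payload */
                    MPI_Send(&task, 1, MPI_DOUBLE, w, TAG_TASK, MPI_COMM_WORLD);
                } else {
                    MPI_Send(&stop, 1, MPI_DOUBLE, w, TAG_STOP, MPI_COMM_WORLD);
                }
            }
            while (done < NUM_TASKS) {
                double result;
                MPI_Status st;
                MPI_Recv(&result, 1, MPI_DOUBLE, MPI_ANY_SOURCE, TAG_RESULT,
                         MPI_COMM_WORLD, &st);
                total += result;
                done++;
                if (next < NUM_TASKS) {
                    double task = (double)next++;
                    MPI_Send(&task, 1, MPI_DOUBLE, st.MPI_SOURCE, TAG_TASK,
                             MPI_COMM_WORLD);
                } else {
                    MPI_Send(&stop, 1, MPI_DOUBLE, st.MPI_SOURCE, TAG_STOP,
                             MPI_COMM_WORLD);
                }
            }
            printf("master: %d tasks done, total = %g\n", done, total);
        } else {
            /* Worker: receive tasks until told to stop. */
            for (;;) {
                double task, result;
                MPI_Status st;
                MPI_Recv(&task, 1, MPI_DOUBLE, 0, MPI_ANY_TAG,
                         MPI_COMM_WORLD, &st);
                if (st.MPI_TAG == TAG_STOP)
                    break;
                result = task * 2.0;   /* stand-in for the real search work */
                MPI_Send(&result, 1, MPI_DOUBLE, 0, TAG_RESULT, MPI_COMM_WORLD);
            }
        }

        MPI_Finalize();
        return 0;
    }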
My primary observation about MPI is that it is a very big and rich system,
but you can get started with only a very small subset of its functionality.
For example, MPI offers both blocking and non-blocking calls for handing a
calculation off to another node, but SlugGo only ever used the blocking ones.
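For contrast, the non-blocking flavor of the receive in the sketch above
would look roughly like the fragment below (reusing TAG_RESULT from that
sketch); do_some_local_work() is a hypothetical placeholder for whatever
else the master could be doing while it waits:

    double result;
    MPI_Request req;
    MPI_Status  status;
    int ready = 0;

    MPI_Irecv(&result, 1, MPI_DOUBLE, MPI_ANY_SOURCE, TAG_RESULT,
              MPI_COMM_WORLD, &req);      /* post the receive, return at once */

    while (!ready) {
        MPI_Test(&req, &ready, &status);  /* has anything arrived yet? */
        if (!ready)
            do_some_local_work();         /* hypothetical placeholder */
    }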
The biggest drawback of large distributed MPI programs is debugging.
MPI tells you when a node crashes, but finding the reason was not
always simple. If anybody out there knows of a debugger that is
MPI-aware, I would love to hear about it.
Cheers,
David
On Oct 29, 2009, at 11:40 AM, Brian Sheppard wrote:
I have a question for those who have parallelized programs.
It seems like MPI is the obvious architecture when scaling a program to
multiple machines. Let's assume that we implement a program that has that
capability.
Now, it is possible to use MPI for scaling *within* a compute node. For
example, on a 4-core machine we could run four processes and use MPI to
synchronize them.
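[For what it's worth, nothing special is needed to launch that case: with
most MPI implementations you simply start four ranks on the one machine,
e.g. (binary name is a placeholder):

    mpirun -np 4 ./engine
]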
That policy has the obvious downside that the shared memory on a multi-core
box is fragmented, and some portion of the tree is duplicated across
processes, which seems wasteful.
For this reason I have assumed that programs would use a thread-safe
shared-memory design within a multi-core box, and only use MPI to scale to
clusters.
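[As a point of comparison, a minimal sketch of what the thread-safe
shared-memory side might look like, using C11 atomics for the counters an
MCTS node has to update from several threads. This is only an illustration,
not any particular engine's design:

    #include <stdatomic.h>

    /* One node of a shared MCTS tree; several search threads update it. */
    typedef struct Node {
        atomic_int   visits;       /* trials through this node      */
        atomic_int   wins;         /* wins observed in those trials */
        struct Node *children;     /* expansion/locking not shown   */
        int          num_children;
    } Node;

    /* Record one playout result without taking a lock. */
    static void update_node(Node *n, int won)
    {
        atomic_fetch_add(&n->visits, 1);
        if (won)
            atomic_fetch_add(&n->wins, 1);
    }
]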
But there are downsides to that design as well, such as the extra
complexity of maintaining two models of parallel programming.

And I don't really know the cost of duplicating nodes. Maybe the tree
splits so much that different processes would share relatively few nodes
anyway. Or maybe you can allocate trials so that is the case.
And now my question: what do you actually do: MPI, thread-safe shared
memory, both, or something else?
And can you share any observations about your choices?
Thanks,
Brian
_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/