It depends on the scale you want. Some of what you write suggests you are
thinking specifically about MCTS programs, though your questions are also
more general.
When we wrote SlugGo (one of the top programs a few years ago but
in hibernation now) we went with MPI. MPI lets you simulate as many
compute nodes as you want on a single CPU, so it is great for testing
before hitting the cluster. We wrote a scheduler that would use pure
MPI calls to distribute tasks over both shared cores and remote boxes;
MPI makes it really easy to specify how many processes run on each
IP address, so you can run on a cluster of dissimilar boxes if that is
what you have available. Our trees were all different at the top node,
so we never worried about wasted memory.
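Regarding specifying how many processes run on each IP address: with Open
MPI (other implementations use a slightly different machinefile format)
that is just a hostfile plus an mpirun invocation. The host names and the
binary name below are made-up placeholders, only meant to show the shape
of it:

    # hosts -- how many processes to launch on each box (Open MPI syntax)
    bigbox.example.edu   slots=8
    oldbox1.example.edu  slots=2
    oldbox2.example.edu  slots=2

    mpirun --hostfile hosts -np 12 ./engine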
The time spent making the MPI calls was unimportant compared to the
time it took to do any of the distributed tasks.
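To make "pure MPI calls" concrete, here is a minimal sketch of the kind of
blocking master/worker loop I mean. It is not SlugGo's actual scheduler;
the task and result are just placeholder doubles, and it assumes at least
one worker rank:

    #include <mpi.h>
    #include <stdio.h>

    #define NUM_TASKS  64
    #define TAG_TASK    1
    #define TAG_RESULT  2
    #define TAG_STOP    3

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* needs size >= 2 */

        if (rank == 0) {
            /* Master: seed every worker with one task, then hand out the
               rest as results come back.  All calls are blocking. */
            int next = 0, done = 0;
            double total = 0.0, stop = 0.0;
            for (int w = 1; w < size; w++) {
                if (next < NUM_TASKS) {
                    double task = (double)next++;   /* placeholder payload */
                    MPI_Send(&task, 1, MPI_DOUBLE, w, TAG_TASK, MPI_COMM_WORLD);
                } else {
                    MPI_Send(&stop, 1, MPI_DOUBLE, w, TAG_STOP, MPI_COMM_WORLD);
                }
            }
            while (done < NUM_TASKS) {
                double result;
                MPI_Status st;
                MPI_Recv(&result, 1, MPI_DOUBLE, MPI_ANY_SOURCE, TAG_RESULT,
                         MPI_COMM_WORLD, &st);
                total += result;
                done++;
                if (next < NUM_TASKS) {
                    double task = (double)next++;
                    MPI_Send(&task, 1, MPI_DOUBLE, st.MPI_SOURCE, TAG_TASK,
                             MPI_COMM_WORLD);
                } else {
                    MPI_Send(&stop, 1, MPI_DOUBLE, st.MPI_SOURCE, TAG_STOP,
                             MPI_COMM_WORLD);
                }
            }
            printf("master: %d tasks done, total = %g\n", done, total);
        } else {
            /* Worker: receive tasks until told to stop. */
            for (;;) {
                double task, result;
                MPI_Status st;
                MPI_Recv(&task, 1, MPI_DOUBLE, 0, MPI_ANY_TAG,
                         MPI_COMM_WORLD, &st);
                if (st.MPI_TAG == TAG_STOP)
                    break;
                result = task * 2.0;   /* stand-in for the real search work */
                MPI_Send(&result, 1, MPI_DOUBLE, 0, TAG_RESULT, MPI_COMM_WORLD);
            }
        }

        MPI_Finalize();
        return 0;
    }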
My primary observation about MPI is that it is a very big and rich system,
but you can get started with only a very small subset of its functionality.
For example, MPI offers both blocking and non-blocking calls for handing a
calculation off to another node, but SlugGo only ever used the blocking ones.
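For contrast, the non-blocking flavor of the receive in the sketch above
would look roughly like the fragment below (reusing TAG_RESULT from that
sketch); do_some_local_work() is a hypothetical placeholder for whatever
else the master could be doing while it waits:

    double result;
    MPI_Request req;
    MPI_Status  status;
    int ready = 0;

    MPI_Irecv(&result, 1, MPI_DOUBLE, MPI_ANY_SOURCE, TAG_RESULT,
              MPI_COMM_WORLD, &req);      /* post the receive, return at once */

    while (!ready) {
        MPI_Test(&req, &ready, &status);  /* has anything arrived yet? */
        if (!ready)
            do_some_local_work();         /* hypothetical placeholder */
    }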
The biggest drawback of large distributed MPI programs is debugging.
MPI tells you when a node crashes, but finding the reason was not
always simple. If anybody out there knows of a debugger that is
MPI-aware, I would love to hear about it.
Cheers,
David
On Oct 29, 2009, at 11:40 AM, Brian Sheppard wrote:
I have a question for those who have parallelized programs.
It seems like MPI is the obvious architecture when scaling a program to
multiple machines. Let's assume that we implement a program that has that
capability.
Now, it is possible to use MPI for scaling *within* a compute node. For
example, on a 4-core machine we could run four processes and use MPI to
synchronize them.
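[For what it's worth, nothing special is needed to launch that case: with
most MPI implementations you simply start four ranks on the one machine,
e.g. (binary name is a placeholder):

    mpirun -np 4 ./engine
]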
That policy has the obvious downside that the shared memory on a multi-core
box is fragmented, and some portion of the tree is duplicated across
processes, which seems wasteful.
For this reason I have assumed that programs would use a thread-safe
shared-memory design within a multi-core box, and only use MPI to scale to
clusters.
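[As a point of comparison, a minimal sketch of what the thread-safe
shared-memory side might look like, using C11 atomics for the counters an
MCTS node has to update from several threads. This is only an illustration,
not any particular engine's design:

    #include <stdatomic.h>

    /* One node of a shared MCTS tree; several search threads update it. */
    typedef struct Node {
        atomic_int   visits;       /* trials through this node      */
        atomic_int   wins;         /* wins observed in those trials */
        struct Node *children;     /* expansion/locking not shown   */
        int          num_children;
    } Node;

    /* Record one playout result without taking a lock. */
    static void update_node(Node *n, int won)
    {
        atomic_fetch_add(&n->visits, 1);
        if (won)
            atomic_fetch_add(&n->wins, 1);
    }
]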
But there are downsides to that design as well, such as the extra
complexity of maintaining two models of parallel programming.

And I don't really know the cost of duplicating nodes. Maybe the tree
splits so much that different processes would share relatively few nodes
anyway. Or maybe you can allocate trials so that is the case.
And now my question: what do you actually do: MPI, thread-safe shared
memory, both, or something else?
And can you share any observations about your choices?
Thanks,
Brian
_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/