Why doesn't JavaSpaces let you submit requests, have them worked on, and have the results returned, without assigning any particular node any specific responsibility?

Gregg
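A minimal sketch of the take/write master/worker idiom that question alludes to, against net.jini.space.JavaSpace. The TaskEntry, ResultEntry, and Worker names are hypothetical, and discovery of the space proxy is omitted.

import net.jini.core.entry.Entry;
import net.jini.core.lease.Lease;
import net.jini.space.JavaSpace;

// Hypothetical entries for illustration only (each would live in its own source file).
// Entry fields must be public object types, and a public no-arg constructor is required.
class TaskEntry implements Entry {
    public Integer taskId;
    public String payload;
    public TaskEntry() { }
    public TaskEntry(Integer taskId, String payload) { this.taskId = taskId; this.payload = payload; }
}

class ResultEntry implements Entry {
    public Integer taskId;
    public String result;
    public ResultEntry() { }
    public ResultEntry(Integer taskId, String result) { this.taskId = taskId; this.result = result; }
}

class Worker {
    // The JavaSpace proxy would come from lookup/discovery, which is omitted here.
    static void workLoop(JavaSpace space) throws Exception {
        TaskEntry anyTask = new TaskEntry();   // null fields act as wildcards
        while (true) {
            // Whichever worker is free takes the next request; no node owns a fixed role.
            TaskEntry task = (TaskEntry) space.take(anyTask, null, Long.MAX_VALUE);
            String answer = task.payload.toUpperCase();   // stand-in for real work
            space.write(new ResultEntry(task.taskId, answer), null, Lease.FOREVER);
        }
    }
}

The submitting side just writes TaskEntry objects and later takes back matching ResultEntry objects, so any number of interchangeable workers can pull from the same space without any node being assigned a fixed role.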
> On Aug 1, 2015, at 12:06 PM, Bryan Thompson <br...@systap.com> wrote:
>
> First, thanks for the responses and the interest in this topic. I have been traveling for the last few days and have not had a chance to follow up.
>
> - The network would have multiple switches, but not multiple routers. The typical target is an InfiniBand network. Of course, Java lets us bind to InfiniBand now.
>
> - As far as I understand it, MPI relies on each node executing the same logic in a distributed communication pattern, so the concept of a leader election to determine a balanced tree probably does not show up. Instead, the tree is expressed in terms of the MPI rank assigned to each node. I am not suggesting that the same design pattern be used for River.
>
> - We do need a means to define the relationship between a distributed communication pattern and the manner in which data are decomposed onto a cluster. I am not sure that the proposal above gives us this directly, but some extension of it probably would. Let me give an example. In our application, we are distributing the edges of a graph among a 2-D cluster of compute nodes (p x p machines). The distribution is done by assigning the edges to compute nodes based on some function (key range, hash function) of the source and target vertex identifiers. When we want to read all edges in the graph, we need to do an operation that is data parallel across either the rows (in-edges) or the columns (out-edges) of the cluster. See http://mapgraph.io/papers/UUSCI-2014-002.pdf for a TR that describes this communication pattern for a p x p cluster of GPUs. In order to make this work with River, we would somehow have to associate the nodes with their positions in this 2-D topology. For example, we could annotate each node with a "row" and "column" attribute that specifies its location in the compute grid. We could then have a communicator for each row and each column, based on the approach you suggest above.
>
> The advantage of such tree-based communication patterns is quite large. They require log(p) communication operations where you would otherwise do p communication operations. So, for example, only 4 communication operations vs 16 for a 16-node cluster.
>
> Thanks,
> Bryan
>
>
> On Wed, Jul 29, 2015 at 1:17 PM, Greg Trasuk <tras...@stratuscom.com> wrote:
>
>> I've wondered about doing this in the past, but for the workloads I've worked with, I/O time has been relatively low compared to processing time. I'd guess there's some combination of message frequency, cluster size, and message size that makes it compelling.
>>
>> The idea is interesting, though, because it could enable things like distributed JavaSpaces, where we'd be distributing the search queries, etc.
>>
>> I would guess the mechanism would look like:
>>
>> - Member nodes want to form a multicast group.
>> - They elect a leader.
>> - The leader figures out a balanced notification tree and passes it on to each member.
>> - The leader receives a multicast message and starts the message passing into the tree.
>> - Recipients pass the message to local recipients, and also to their designated repeater recipients (how many?).
>> - Somehow we monitor for disappearing members and then recast the leader election if necessary.
>>
>> A Paxos-style protocol would be involved, I'd guess. Does anyone have references to any academic work on presence monitoring and leader election, beyond Lamport's original paper?
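As a concrete picture of the log(p) schedule being discussed, here is a small sketch of a binomial-tree broadcast over ranks 0..p-1 (rank 0 as the root), the shape MPI-style broadcasts typically use. How River nodes would be mapped onto ranks, for example via the "row"/"column" attributes Bryan mentions with one tree per row or column communicator, is assumed rather than shown.

import java.util.ArrayList;
import java.util.List;

// Binomial-tree broadcast schedule over ranks 0..p-1; rank 0 is the root.
class BinomialBroadcast {

    // Ranks that 'rank' forwards the message to, in round order.
    static List<Integer> forwardTargets(int rank, int p) {
        List<Integer> targets = new ArrayList<>();
        // Rank 0 holds the message from the start; rank r first receives it in
        // round floor(log2(r)) and can forward from the following round on.
        int firstSendRound = (rank == 0) ? 0 : (31 - Integer.numberOfLeadingZeros(rank)) + 1;
        for (int k = firstSendRound; (1 << k) < p; k++) {
            int target = rank + (1 << k);
            if (target < p) {
                targets.add(target);
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        int p = 16;
        for (int rank = 0; rank < p; rank++) {
            System.out.println("rank " + rank + " forwards to " + forwardTargets(rank, p));
        }
        // All 16 ranks are reached in log2(16) = 4 rounds; no node sends more
        // than 4 point-to-point messages, instead of one node sending 15.
    }
}

For p = 16 the broadcast completes in log2(16) = 4 rounds, which is the 4-vs-16 saving quoted above.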
>> I also wonder: is there a reason not to just use multicast if it's available? (I realize that it isn't always supported - Amazon EC2, for instance.)
>>
>> Interesting question!
>>
>> Cheers,
>>
>> Greg Trasuk
>>
>>> On Jul 29, 2015, at 12:40 PM, Bryan Thompson <br...@systap.com> wrote:
>>>
>>> Hello,
>>>
>>> I am wondering if anyone has looked into creating tree-based algorithms for multicast of RMI messages for River. Assuming a local cluster, such patterns generally have log(p) cost for a cluster with p nodes.
>>>
>>> For the curious, this is how many MPI messages are communicated under the hood.
>>>
>>> Thanks,
>>> Bryan
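To make "tree-based multicast of RMI messages" concrete, one possible shape for the member side is sketched below. The TreeMember interface and its synchronous forwarding are illustrative assumptions, not an existing River API; the leader election, tree construction, and failure handling discussed above are left out.

import java.io.Serializable;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.Collections;
import java.util.List;

// Hypothetical remote interface for one member of a notification tree.
interface TreeMember extends Remote {
    void setChildren(List<TreeMember> children) throws RemoteException;
    void deliver(Serializable message) throws RemoteException;
}

// Sketch of a member: deliver locally, then repeat to the children the leader assigned.
// Exporting the object and registering it with the group are omitted.
class TreeMemberImpl implements TreeMember {
    private volatile List<TreeMember> children = Collections.emptyList();

    public void setChildren(List<TreeMember> children) {
        this.children = children;
    }

    public void deliver(Serializable message) throws RemoteException {
        handleLocally(message);                  // local recipients first
        for (TreeMember child : children) {      // then the designated repeaters
            child.deliver(message);              // in practice this would be asynchronous
        }
    }

    private void handleLocally(Serializable message) {
        System.out.println("received: " + message);
    }
}

With a balanced tree and bounded fan-out, a message reaches every member after O(log p) forwarding hops instead of the sender making p - 1 calls itself.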