Just my personal view: for a small amount of global information, storing the state in ZooKeeper might be a reasonable solution.
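
A minimal sketch of that idea, to make it concrete. It assumes the total vertex count lives in a single shared node that carries a data version, and that a write is rejected when the version has changed since it was read (ZooKeeper's setData takes an expected version in this way). The InMemoryNode class and the addToTotal helper below are hypothetical stand-ins so the example runs without a ZooKeeper ensemble; they are not the ZooKeeper client API and not existing Hama code.

```java
import java.util.concurrent.atomic.AtomicReference;

public class GlobalVertexCounterSketch {

  /** A value together with its version, similar in spirit to znode data plus its data version. */
  static final class Versioned {
    final long value;
    final int version;

    Versioned(long value, int version) {
      this.value = value;
      this.version = version;
    }
  }

  /** Hypothetical in-memory stand-in for one shared node (assumption, not a real client). */
  static final class InMemoryNode {
    private final AtomicReference<Versioned> state =
        new AtomicReference<>(new Versioned(0L, 0));

    Versioned read() {
      return state.get();
    }

    /** Writes newValue only if the node is still at expectedVersion. */
    boolean writeIfVersion(long newValue, int expectedVersion) {
      Versioned current = state.get();
      if (current.version != expectedVersion) {
        return false; // someone else updated the node first
      }
      return state.compareAndSet(current, new Versioned(newValue, expectedVersion + 1));
    }
  }

  /** Adds delta to the shared total, retrying on a version conflict; returns the new total. */
  static long addToTotal(InMemoryNode node, long delta) {
    while (true) {
      Versioned seen = node.read();
      long updated = seen.value + delta;
      if (node.writeIfVersion(updated, seen.version)) {
        return updated;
      }
    }
  }

  public static void main(String[] args) {
    InMemoryNode totalVertices = new InMemoryNode();
    addToTotal(totalVertices, 2);  // Vert_1 and Vert_2 already exist
    addToTotal(totalVertices, 1);  // one peer calls addVertex()
    // Another peer reads the shared node without waiting for a sync.
    System.out.println(totalVertices.read().value); // prints 3
  }
}
```

With this layout an add or remove costs one write to the shared node instead of one message per peer, and readers see the new total without waiting for the next superstep. The trade-off is contention on a single node, which is why this only seems reasonable for small, infrequently updated state.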

On 13 July 2013 21:28, andronat_asf <[email protected]> wrote:

> Hello everyone,
>
> I'm working on HAMA-767 and I have some concerns about counters and
> scalability. Currently, every peer has a set of vertices and a variable that
> keeps the total number of vertices across all peers. In my case, I'm trying
> to add and remove vertices during the runtime of a job, which means that I
> have to update all those variables.
>
> My problem is that this is not efficient, because for every operation (add or
> remove a vertex) I need to update all peers, so I need to send lots of
> messages to make those updates (see the GraphJobRunner#countGlobalVertexCount
> method), and I believe this is neither correct nor scalable. Another problem
> is that, even if I update all those variables (at the cost of sending lots of
> messages to every peer), they will only be updated in the next superstep.
>
> e.g.:
>
> Peer 1:            Peer 2:
>   Vert_1             Vert_2
>   (Total_V = 2)      (Total_V = 2)
>   addVertex()
>   (Total_V = 3)
>                      getNumberOfV() => 2
>
> ------------------------ Sync ------------------------
>
>                      getNumberOfV() => 3
>
>
> Is there something like global counters or shared memory that can address
> this issue?
>
> P.S. I have a small feeling that we don't need to track the total number of
> vertices, because vertex-centric algorithms rarely need total numbers; they
> only depend on neighbors (I might be wrong though).
>
> Thanks,
> Anastasis
