What about introducing a proper API for counting vertices? Something like a VertexCounter interface with two or three implementations: an InMemoryVertexCounter (basically the current one), a DistributedVertexCounter for the scenario where we use a separate BSP superstep to count them, and a ZKVertexCounter that handles vertex counts as per Chia-Hung's suggestion.
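
Just to make the idea concrete, here is a rough sketch. All names and method signatures below are assumptions for illustration; none of this exists in Hama. It shows the interface, an in-memory variant that keeps a local total, and a hypothetical buffered variant that only publishes changes at a barrier, as a stand-in for the deferred visibility a superstep-based counter would have. The ZooKeeper-backed variant is left out because it would depend on client code not shown here.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical interface; the name comes from the proposal, the methods are assumed. */
interface VertexCounter {
    void vertexAdded();
    void vertexRemoved();
    long getGlobalCount();
}

/** Keeps the total in a local variable and updates it immediately. */
class InMemoryVertexCounter implements VertexCounter {
    private final AtomicLong total;

    InMemoryVertexCounter(long initialTotal) {
        this.total = new AtomicLong(initialTotal);
    }

    @Override public void vertexAdded()    { total.incrementAndGet(); }
    @Override public void vertexRemoved()  { total.decrementAndGet(); }
    @Override public long getGlobalCount() { return total.get(); }
}

/**
 * Hypothetical variant: local changes are buffered and become visible
 * only after sync(), which stands in for the superstep barrier.
 */
class BufferedVertexCounter implements VertexCounter {
    private long visibleTotal;
    private long pendingDelta;

    BufferedVertexCounter(long initialTotal) {
        this.visibleTotal = initialTotal;
    }

    @Override public void vertexAdded()    { pendingDelta++; }
    @Override public void vertexRemoved()  { pendingDelta--; }
    @Override public long getGlobalCount() { return visibleTotal; }

    /** Merge the local delta and the summed deltas reported by other peers. */
    void sync(long remoteDelta) {
        visibleTotal += pendingDelta + remoteDelta;
        pendingDelta = 0;
    }
}
```

With something like this, the job runner would only talk to VertexCounter, and the choice of implementation (and its cost in messages) becomes a configuration matter.
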
Also we may introduce something like a configuration variable to define if all the vertices are needed or just the neighbors (and/or some other strategy).

My 2 cents,
Tommaso

2013/7/14 Chia-Hung Lin <[email protected]>

> Just my personal viewpoint. For small size of global information,
> considering to store the state in ZooKeeper might be a reasonable
> solution.
>
> On 13 July 2013 21:28, andronat_asf <[email protected]> wrote:
> > Hello everyone,
> >
> > I'm working on HAMA-767 and I have some concerns on counters and
> > scalability. Currently, every peer has a set of vertices and a variable
> > that is keeping the total number of vertices through all peers. In my
> > case, I'm trying to add and remove vertices during the runtime of a job,
> > which means that I have to update all those variables.
> >
> > My problem is that this is not efficient because in every operation (add
> > or remove a vertex) I need to update all peers, so I need to send lots of
> > messages to make those updates (see GraphJobRunner#countGlobalVertexCount
> > method) and I believe this is not correct and scalable. An other problem
> > is that, even if I update all those variable (with the cost of sending
> > lots of messages to every peer) those variables will be updated on the
> > next superstep.
> >
> > e.g.:
> >
> > Peer 1:              Peer 2:
> >   Vert_1               Vert_2
> > (Total_V = 2)        (Total_V = 2)
> > addVertex()
> > (Total_V = 3)
> >                      getNumberOfV() => 2
> >
> > ------------------------ Sync ------------------------
> >
> >                      getNumberOfV() => 3
> >
> >
> > Is there something like global counters or shared memory that it can
> > address this issue?
> >
> > P.S. I have a small feeling that we don't need to track the total amount
> > of vertices because vertex centered algorithms rarely need total numbers,
> > they only depend on neighbors (I might be wrong though).
> >
> > Thanks,
> > Anastasis
