Re: Dynamic vertices and hama counters

2013-07-18 Thread Chia-Hung Lin
Sorry, my bad. I focused only on the counter and didn't pay attention to
the vertex-related issue. I thought you just wanted to share a counter
value between peers. In that case, persisting the counter value to ZK
shouldn't be a problem and won't incur overhead. But if the issue is not
about the counter, please just ignore my previous post.


On 17 July 2013 06:59, Edward J. Yoon edwardy...@apache.org wrote:
 You guys seem to have totally misunderstood what I am saying.

 Would every BSP processor access ZK's counter concurrently? Do you
 think it is possible to determine the current total number of vertices
 in every step without barrier synchronization?

 As I mentioned before, there are already additional barrier
 synchronization steps for aggregating and broadcasting the global updated
 vertex count. You can use these steps with *no additional barrier
 synchronization*.

 On Wed, Jul 17, 2013 at 5:01 AM, andronat_asf andronat_...@hotmail.com wrote:
 Thank you everyone,

 +1 for Tommaso, I will see what I can do about that :)

 I also believe that ZK is very similar to the sync() mechanism that Edward
 is describing, but if we need to sync more info we might need ZK.

 Thanks again,
 Anastasis

 On 15 Jul 2013, at 5:55 PM, Edward J. Yoon edwardy...@apache.org wrote:

 andronat_asf,

 To aggregate and broadcast the global count of updated vertices, we
 call sync() twice. See the doAggregationUpdates() method in
 GraphJobRunner. You can solve your problem the same way, and there
 will be no additional cost.

 Using ZooKeeper is not a bad idea. But IMO, it's not much different
 from the sync() mechanism.

 On Mon, Jul 15, 2013 at 10:05 PM, Chia-Hung Lin cli...@googlemail.com wrote:
 +1 for Tommaso's solution.

 If not every algorithm needs a counter service, having an interface with
 different implementations (in-memory, zk, etc.) should reduce the side
 effects.


 On 15 July 2013 15:51, Tommaso Teofili tommaso.teof...@gmail.com wrote:
 What about introducing a proper API for counting vertices, something like
 an interface VertexCounter with 2-3 implementations: an
 InMemoryVertexCounter (basically the current one), a
 DistributedVertexCounter to implement the scenario where we use a separate
 BSP superstep to count them, and a ZKVertexCounter which handles vertex
 counts as per Chia-Hung's suggestion?

 Also we may introduce something like a configuration variable to define if
 all the vertices are needed or just the neighbors (and/or some other
 strategy).

 My 2 cents,
 Tommaso

 2013/7/14 Chia-Hung Lin cli...@googlemail.com

 Just my personal viewpoint. For a small amount of global information,
 storing the state in ZooKeeper might be a reasonable solution.

 On 13 July 2013 21:28, andronat_asf andronat_...@hotmail.com wrote:
 Hello everyone,

 I'm working on HAMA-767 and I have some concerns about counters and
 scalability. Currently, every peer has a set of vertices and a variable
 that keeps the total number of vertices across all peers. In my case,
 I'm trying to add and remove vertices during the runtime of a job, which
 means that I have to update all those variables.

 My problem is that this is not efficient, because on every operation (add
 or remove a vertex) I need to update all peers, so I need to send lots of
 messages to make those updates (see the
 GraphJobRunner#countGlobalVertexCount method), and I believe this is
 neither correct nor scalable. Another problem is that, even if I update
 all those variables (at the cost of sending lots of messages to every
 peer), those variables will only be updated on the next superstep.

 e.g.:

 Peer 1:          Peer 2:
   Vert_1           Vert_2
 (Total_V = 2)    (Total_V = 2)
 addVertex()
 (Total_V = 3)
                  getNumberOfV() = 2

              Sync

                  getNumberOfV() = 3


 Is there something like global counters or shared memory that can
 address this issue?

 P.S. I have a small feeling that we don't need to track the total number
 of vertices, because vertex-centered algorithms rarely need total numbers;
 they only depend on neighbors (I might be wrong though).

 Thanks,
 Anastasis




 --
 Best Regards, Edward J. Yoon
 @eddieyoon






Re: Dynamic vertices and hama counters

2013-07-15 Thread Tommaso Teofili
What about introducing a proper API for counting vertices, something like
an interface VertexCounter with 2-3 implementations: an
InMemoryVertexCounter (basically the current one), a
DistributedVertexCounter to implement the scenario where we use a separate
BSP superstep to count them, and a ZKVertexCounter which handles vertex
counts as per Chia-Hung's suggestion?

Also we may introduce something like a configuration variable to define if
all the vertices are needed or just the neighbors (and/or some other
strategy).

My 2 cents,
Tommaso
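
A minimal sketch of how the proposed counter API could look. Only the
names VertexCounter and InMemoryVertexCounter come from the message
above; the method names, the VertexCounters factory, and the "memory"
configuration value are assumptions made for illustration and are not
existing Hama API.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical API following the proposal in this thread; not part of Hama. */
interface VertexCounter {
  /** Applies a net change; negative when vertices are removed. */
  void increment(long delta);

  /** Returns the count as currently known to this implementation. */
  long get();
}

/** Counter held in local memory, standing in for "the current one". */
final class InMemoryVertexCounter implements VertexCounter {
  private final AtomicLong count;

  InMemoryVertexCounter(long initial) {
    this.count = new AtomicLong(initial);
  }

  @Override
  public void increment(long delta) {
    count.addAndGet(delta);
  }

  @Override
  public long get() {
    return count.get();
  }
}

/** Picks an implementation from a configuration value (values are made up). */
final class VertexCounters {
  private VertexCounters() {}

  static VertexCounter create(String kind, long initial) {
    switch (kind) {
      case "memory":
        return new InMemoryVertexCounter(initial);
      // A superstep-based or ZooKeeper-backed counter would be added here.
      default:
        throw new IllegalArgumentException("Unknown vertex counter: " + kind);
    }
  }
}
```

With this shape, an algorithm that never asks for the global count only
pays for the in-memory variant, and the other strategies can be swapped
in through configuration without touching the calling code.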

Re: Dynamic vertices and hama counters

2013-07-15 Thread Chia-Hung Lin
+1 for Tommaso's solution.

If not every algorithm needs a counter service, having an interface with
different implementations (in-memory, zk, etc.) should reduce the side
effects.


Re: Dynamic vertices and hama counters

2013-07-15 Thread Edward J. Yoon
andronat_asf,

To aggregate and broadcast the global count of updated vertices, we
call sync() twice. See the doAggregationUpdates() method in
GraphJobRunner. You can solve your problem the same way, and there
will be no additional cost.

Using ZooKeeper is not a bad idea. But IMO, it's not much different
from the sync() mechanism.
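
The two-barrier pattern described above can be sketched as a plain-Java
simulation. This is not Hama code and does not reproduce
doAggregationUpdates(); the class TwoSyncCountSketch, its send() and
sync() helpers, and the choice of peer 0 as master are all assumptions
made for illustration. Each peer reports only its net change, the master
sums after the first barrier, and every peer reads the new total after
the second barrier.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java simulation of the two-barrier pattern; none of this is Hama API. */
final class TwoSyncCountSketch {
  private final int numPeers;
  // Messages sent in the current superstep, not yet visible to receivers.
  private final List<List<Long>> pending = new ArrayList<>();
  // Messages delivered by the most recent barrier.
  private final List<List<Long>> inbox = new ArrayList<>();

  TwoSyncCountSketch(int numPeers) {
    this.numPeers = numPeers;
    for (int i = 0; i < numPeers; i++) {
      pending.add(new ArrayList<>());
      inbox.add(new ArrayList<>());
    }
  }

  private void send(int toPeer, long value) {
    pending.get(toPeer).add(value);
  }

  /** Barrier: messages sent before it become readable only after it. */
  private void sync() {
    for (int i = 0; i < numPeers; i++) {
      inbox.set(i, pending.get(i));
      pending.set(i, new ArrayList<>());
    }
  }

  /**
   * Returns the total that each peer sees after both barriers.
   * localDeltas must hold one net change (adds minus removes) per peer.
   */
  long[] updateTotals(long previousTotal, long[] localDeltas) {
    final int master = 0; // assumed master peer
    // Step 1: every peer reports its net change to the master.
    for (int peer = 0; peer < numPeers; peer++) {
      send(master, localDeltas[peer]);
    }
    sync();
    // Step 2: the master sums the deltas and broadcasts the new total.
    long total = previousTotal;
    for (long delta : inbox.get(master)) {
      total += delta;
    }
    for (int peer = 0; peer < numPeers; peer++) {
      send(peer, total);
    }
    sync();
    // After the second barrier all peers read the same value.
    long[] seen = new long[numPeers];
    for (int peer = 0; peer < numPeers; peer++) {
      seen[peer] = inbox.get(peer).get(0);
    }
    return seen;
  }
}
```

For the two-peer example from the original question,
new TwoSyncCountSketch(2).updateTotals(2, new long[] {1, 0}) gives both
peers a total of 3. In this sketch each peer sends one message and
receives one message per round, however many vertices it added or
removed during the superstep.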

On Mon, Jul 15, 2013 at 10:05 PM, Chia-Hung Lin cli...@googlemail.com wrote:
 +1 for Tommaso's solution.

 If not every algorithm needs counter service, having an interface with
 different implementations (in-memory, zk, etc.) should reduce the side
 effect.


 On 15 July 2013 15:51, Tommaso Teofili tommaso.teof...@gmail.com wrote:
 what about introducing a proper API for counting vertices, something like
 an interface VertexCounter with 2-3 implementations like
 InMemoryVertexCounter (basically the current one), a
 DistributedVertexCounter to implement the scenario where we use a separate
 BSP superstep to count them and a ZKVertexCounter which handles vertices
 counts as per Chian-Hung's suggestion.

 Also we may introduce something like a configuration variable to define if
 all the vertices are needed or just the neighbors (and/or some other
 strategy).

 My 2 cents,
 Tommaso

 2013/7/14 Chia-Hung Lin cli...@googlemail.com

 Just my personal viewpoint. For small size of global information,
 considering to store the state in ZooKeeper might be a reasonable
 solution.

 On 13 July 2013 21:28, andronat_asf andronat_...@hotmail.com wrote:
  Hello everyone,
 
  I'm working on HAMA-767 and I have some concerns on counters and
 scalability. Currently, every peer has a set of vertices and a variable
 that is keeping the total number of vertices through all peers. In my case,
 I'm trying to add and remove vertices during the runtime of a job, which
 means that I have to update all those variables.
 
  My problem is that this is not efficient because in every operation (add
 or remove a vertex) I need to update all peers, so I need to send lots of
 messages to make those updates (see GraphJobRunner#countGlobalVertexCount
 method) and I believe this is not correct and scalable. An other problem is
 that, even if I update all those variable (with the cost of sending lots of
 messages to every peer) those variables will be updated on the next
 superstep.
 
  e.g.:
 
  Peer 1:Peer 2:
Vert_1  Vert_2
  (Total_V = 2)  (Total_V = 2)
  addVertex()
  (Total_V = 3)
   getNumberOfV() = 2
 
   Sync 
 
   getNumberOfV() = 3
 
 
  Is there something like global counters or shared memory that it can
 address this issue?
 
  P.S. I have a small feeling that we don't need to track the total amount
 of vertices because vertex centered algorithms rarely need total numbers,
 they only depend on neighbors (I might be wrong though).
 
  Thanks,
  Anastasis




-- 
Best Regards, Edward J. Yoon
@eddieyoon