I have a very difficult problem to debug. Several vertices seem to be
duplicated. Maybe I am not reading the inputs properly? Here is more info:

- I have three input splits and use three workers. I have written my own 
(part of the zip I sent a few days ago). In split one, I have ids mod 3 = 0, then 
ids mod 3 = 1, etc.
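
To make the split scheme concrete, here is a minimal sketch of the id-to-split mapping described above. It is not the reader from the zip; the class name SplitScheme and the method splitFor are made up for illustration, and the 0-based split index is an assumption.

```java
// Hypothetical illustration of the split scheme; not the reader from the zip.
final class SplitScheme {
    // Number of input splits (one per worker in this setup).
    static final int NUM_SPLITS = 3;

    // Returns the assumed 0-based split index for a vertex id: id mod NUM_SPLITS.
    static int splitFor(long vertexId) {
        // floorMod keeps the result non-negative even for negative ids.
        return (int) Math.floorMod(vertexId, (long) NUM_SPLITS);
    }
}
```

Under this mapping, SplitScheme.splitFor(875600L) gives 2, so the vertex discussed below would belong to the split holding ids mod 3 = 2.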

I added some extra debug output for vertex id 875600:

- I checked that the vertex 875600 is read only once, with 8 edges by adding a 
System.out.println debug:
        ::: READ: 875600 ; 8 : [81066, 271870, 272882, 483962, 621946, 723717, 834555, 845506]
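
A "read only once" check of this kind could also be done with a small counter instead of reading the println output by eye. This is only a sketch: ReadCounter and its methods are hypothetical names, and it only sees reads inside one JVM, so a vertex read once on each of two workers would not be caught by it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical debugging helper; counts reads per vertex id within one JVM only.
final class ReadCounter {
    private final Map<Long, Integer> reads = new HashMap<>();

    // Records one read of the given vertex id and returns the new count.
    synchronized int record(long vertexId) {
        return reads.merge(vertexId, 1, Integer::sum);
    }

    // Returns true if the id has been read more than once in this JVM.
    synchronized boolean isDuplicate(long vertexId) {
        return reads.getOrDefault(vertexId, 0) > 1;
    }
}
```

The reader would call record(id) for every vertex it parses and print a warning whenever the returned count is greater than 1.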

- In vertex.compute I print the hostname of the computer and how many
messages and edges there are. From this I see that the vertex appears on two
different hosts, because I get two types of output:

hostA.ml.cmu.edu 875600* => 0.0 / 0.0 msgs=0/6813839/8

hostB.ml.cmu.edu 875600* => -3.4657359027997265 / -3.4657359027997265 

Note that the last string in the debug output is num-of-messages/num-edges/num-out-edges.

On hostB, this vertex has no edges, but on hostA it has the correct 8 edges.


Does it matter how I split the vertex-ids?

P.S. For the next report I will make an Apache account. Too busy now.

Aapo Kyrola
Ph.D. student, http://www.cs.cmu.edu/~akyrola

