Re: Duplicate vertices?

Avery Ching Sat, 01 Oct 2011 16:15:05 -0700

I mean that

input split 0 should have 0, 1, 7
input split 1 should have 13, 20
input split 2 should have 24, 87, 108


I think a clearer definition would be something like

1) Any vertex read by the VertexReader should have a vertex id greater than its predecessor vertex read.
2) If an input split A has a vertex with a vertex id < any vertex id in input split B, then all vertex ids in input split A must be < all vertex ids in input split B.
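
As a rough illustration of the two conditions above, here is a small standalone check. It is only a sketch: the class and method names (SplitOrderCheck, sortedWithinSplit, splitsDoNotInterleave) are hypothetical and not part of Giraph, and it assumes the vertex ids of each split are already available as lists of longs.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical helper, NOT part of Giraph: checks the two ordering
// conditions on vertex ids that have been grouped by input split.
final class SplitOrderCheck {

    // Condition 1: within one split, every id is greater than its predecessor.
    static boolean sortedWithinSplit(List<Long> ids) {
        for (int i = 1; i < ids.size(); i++) {
            if (ids.get(i) <= ids.get(i - 1)) {
                return false;
            }
        }
        return true;
    }

    // Condition 2: for any two splits, all ids of one must be smaller than
    // all ids of the other, i.e. their id ranges must not interleave.
    static boolean splitsDoNotInterleave(List<List<Long>> splits) {
        for (int a = 0; a < splits.size(); a++) {
            for (int b = a + 1; b < splits.size(); b++) {
                List<Long> x = splits.get(a);
                List<Long> y = splits.get(b);
                if (x.isEmpty() || y.isEmpty()) {
                    continue; // an empty split cannot interleave with anything
                }
                boolean xBeforeY = Collections.max(x) < Collections.min(y);
                boolean yBeforeX = Collections.max(y) < Collections.min(x);
                if (!xBeforeY && !yBeforeX) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

With the splits listed above (0, 1, 7 / 13, 20 / 24, 87, 108) both checks pass. A striped layout such as 0, 3, 6 in one split and 1, 4, 7 in another passes the first check but fails the second.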


Hope that helps.  Let me know if you have more questions.

Avery

On 10/1/11 3:59 PM, Aapo Kyrola wrote:

Hi Avery,

can you elaborate a bit?

So I load vertices in order, but with skipping:

so partition 0 will read vertex 0, vertex 3, 6, …
partition 1 will read vertex 1, vertex 4, …

Do you mean the vertices must be consecutive in the
split?

Aapo



On Oct 1, 2011, at 6:57 PM, Avery Ching wrote:
Unfortunately, someone (probably me) needs to make a wiki on this issue. Currently, we require that your vertices are globally sorted by vertex id and that the vertices read in each input split are in order by vertex id. That probably explains the weirdness you are seeing. This issue is being addressed (albeit slowly because of new job) in https://issues.apache.org/jira/browse/GIRAPH-11. The issue is also described a bit more fully there.
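
One way to satisfy this requirement is to cut the globally sorted id list into contiguous chunks, one per input split, instead of striping ids across splits. The following is only a sketch under that assumption. ContiguousSplits and partition are hypothetical names, not Giraph APIs, and a real input format would apply the same idea when writing the input files rather than working on in-memory lists.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT part of Giraph: assigns globally sorted
// vertex ids to splits as contiguous ranges.
final class ContiguousSplits {

    // Cuts sortedIds into numSplits consecutive chunks of near-equal size.
    // Because the input is sorted, every id in chunk s is smaller than
    // every id in chunk s + 1.
    static List<List<Long>> partition(List<Long> sortedIds, int numSplits) {
        List<List<Long>> splits = new ArrayList<>();
        int n = sortedIds.size();
        for (int s = 0; s < numSplits; s++) {
            int from = (int) ((long) n * s / numSplits);
            int to = (int) ((long) n * (s + 1) / numSplits);
            splits.add(new ArrayList<>(sortedIds.subList(from, to)));
        }
        return splits;
    }
}
```

For example, partitioning the ids 0 through 7 into three splits gives 0, 1 / 2, 3, 4 / 5, 6, 7, whereas assigning by id mod 3 would interleave the ranges.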
Avery

On 10/1/11 12:44 PM, Aapo Kyrola wrote:
Hi,
I have a very difficult problem to debug. Several vertices seem to be duplicated -
maybe I am not reading the inputs properly? Here is more info:
- I have three input splits and use three workers. I have written my own input data format (part of the zip I sent a few days ago). In split one, I have ids mod 3 = 0, then ids mod 3 = 1, etc.
I added some extra debug output for vertex id 875600:
- I checked that the vertex 875600 is read only once, with 8 edges, by adding a System.out.println debug:
::: READ: 875600 ; 8 : [81066, 271870, 272882, 483962, 621946, 723717, 834555, 845506]
- in the vertex.compute I write the hostname of the computer and how many messages and edges there are. From here I see that this vertex appears on two different hosts because I get
two types of outputs:
hostA.ml.cmu.edu 875600* => 0.0 / 0.0 msgs=0/6813839/8
hostB.ml.cmu.edu 875600* => -3.4657359027997265 / -3.4657359027997265 msgs=5/6813839/0
Note that the last string in the debug output is num-of-messages/num-edges/num-out-edges.
On hostB, this vertex has no edges, but on hostA, it has the correct 8 edges.
--

Does it matter how I split the vertex-ids?



ps. For next report I will make an Apache account. Too busy now..


Aapo Kyrola
Ph.D. student, http://www.cs.cmu.edu/~akyrola
