Re: Vertex exists error when processing input splits for Sequence file

Avery Ching Mon, 30 Jan 2012 00:37:35 -0800

In your implementation of VertexReader#getCurrentVertex(), are you providing a new BasicVertex object each time (after nextVertex() is called)? If you are reusing the same BasicVertex object, you could get problems like the ones you describe.
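
To illustrate the question above, here is a minimal sketch of how handing out one shared object can make every stored entry look like the last record read. DemoVertex, DemoReader and ReusedVertexDemo are hypothetical stand-ins written for this example; they are not Giraph classes, and the real BasicVertex and VertexReader APIs are more involved.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a vertex; NOT Giraph's BasicVertex.
class DemoVertex {
    String id;

    DemoVertex(String id) {
        this.id = id;
    }
}

// Hypothetical reader over a fixed list of ids. It only mimics the
// nextVertex()/getCurrentVertex() call pattern from the thread.
class DemoReader {
    private final String[] ids;
    private final boolean reuse;
    private final DemoVertex shared = new DemoVertex(null);
    private int pos = -1;

    DemoReader(boolean reuse, String... ids) {
        this.reuse = reuse;
        this.ids = ids;
    }

    boolean nextVertex() {
        pos++;
        if (pos >= ids.length) {
            return false;
        }
        if (reuse) {
            // Mutates the one object that callers may already have stored.
            shared.id = ids[pos];
        }
        return true;
    }

    DemoVertex getCurrentVertex() {
        // Either hand out the shared instance or a fresh one per record.
        return reuse ? shared : new DemoVertex(ids[pos]);
    }
}

class ReusedVertexDemo {
    // Reads all vertices, stores the returned references, then lists their ids.
    static List<String> load(boolean reuse) {
        DemoReader reader = new DemoReader(reuse, "00kK4", "mM424");
        List<DemoVertex> stored = new ArrayList<>();
        while (reader.nextVertex()) {
            stored.add(reader.getCurrentVertex());
        }
        List<String> ids = new ArrayList<>();
        for (DemoVertex v : stored) {
            ids.add(v.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println("reused object: " + load(true));  // [mM424, mM424]
        System.out.println("fresh object:  " + load(false)); // [00kK4, mM424]
    }
}
```

With the shared instance, both stored references point at the same object, so the earlier entry appears to change as soon as the reader advances. Returning a new object per record keeps the stored entries independent.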


Avery


On 1/30/12 12:24 AM, David Garcia wrote:

Thx for the response Avery. . .unfortunately, I can confirm that I do not have duplicates in my data. I have narrowed the problem to the following method:
private VertexEdgeCount readVerticesFromInputSplit(
InputSplit inputSplit) throws IOException, InterruptedException {
.
.
.
while (vertexReader.nextVertex()) {
            BasicVertex<I, V, E, M> readerVertex =
                vertexReader.getCurrentVertex();
.
.
.
When the .nextVertex() method is called, it automatically mutates every HashMap in a Partition in the InputSplitCache. The nature of the mutation is to convert every Vertex (in the respective partition) to the next vertex resulting from .nextVertex(). (Again, note that the underlying RecordReader is a SequenceFileRecordReader.) For example, if I have the following inputSplitCache:
inputSplitCache
[0]
Key -> BasicPartitionOwner. . .
Value - > Partition
Conf -> Configuration . . .
partitionID = 0
vertexMap
[0] -> 00kK4. . .
I have one vertex in my partition. . .assuming that the next vertex ID is mM424, after vertexReader.nextVertex() is called, the data structure changes to this. . .
inputSplitCache
[0]
Key -> BasicPartitionOwner. . .
Value - > Partition
Conf -> Configuration . . .
partitionID = 0
vertexMap
[0] -> mM424. . .
After partition.putVertex(. . .) is called, another identical vertex is added.
inputSplitCache
[0]
Key -> BasicPartitionOwner. . .
Value - > Partition
Conf -> Configuration . . .
partitionID = 0
vertexMap
[0] -> mM424. . .
[1] -> mM424. . .
This leads to the error in my previous email. . .All the vertices in my graph end up with the data of my final Vertex, as this pattern suggests. It's almost as if some weird AspectJ is intercepting the call to .nextVertex(). I'm happy to brandish my code. I feel it's fairly simple. It's just a sequenceFile input format and some trivial vertex class.
-Dave

From: Avery Ching <[email protected] <mailto:[email protected]>>
Reply-To: "[email protected]<mailto:[email protected]>"<[email protected]<mailto:[email protected]>>
Date: Mon, 30 Jan 2012 01:28:12 -0600
To: "[email protected]<mailto:[email protected]>"<[email protected]<mailto:[email protected]>>
Subject: Re: Vertex exists error when processing input splits for Sequence file
Hi David,
So from the errors, it appears that your input has multiple vertices with the same vertex id. Currently we throw an exception to prevent this from happening as it is typically not what you want. You probably want to watch the vertices being processed from the vertex input format and see why you are getting duplicates. It's likely to be either an error with the data actually having vertices with the same vertex id or an error with your custom vertex input format.
To help debug, you might want to add some logging to your record reader and print the vertex ids, or you can add some logging to where that code is called in BspServiceWorker#readVerticesFromInputSplit().
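
As a rough sketch of the kind of logging suggested above, a small helper could be called once per vertex inside the read loop and report whether the id was already seen or whether the reader returned the very same object instance as on the previous call. VertexReadAudit and its check() method are hypothetical names made up for this example; they are not part of Giraph, and the vertex id is assumed to be available as a String.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical debugging helper; NOT part of Giraph.
class VertexReadAudit {
    private final Set<String> seenIds = new HashSet<>();
    private Object previous;

    // Call once per vertex returned by the reader; returns a line to log.
    String check(String vertexId, Object vertexObject) {
        // Set.add returns false when the id was already present.
        boolean duplicateId = !seenIds.add(vertexId);
        // Reference comparison: true only for the exact same instance.
        boolean sameInstance = previous != null && previous == vertexObject;
        previous = vertexObject;

        if (sameInstance) {
            return "REUSED_OBJECT id=" + vertexId;
        }
        if (duplicateId) {
            return "DUPLICATE_ID id=" + vertexId;
        }
        return "OK id=" + vertexId;
    }

    public static void main(String[] args) {
        VertexReadAudit audit = new VertexReadAudit();
        Object shared = new Object();
        System.out.println(audit.check("00kK4", shared));       // OK id=00kK4
        System.out.println(audit.check("mM424", shared));       // REUSED_OBJECT id=mM424
        System.out.println(audit.check("00kK4", new Object())); // DUPLICATE_ID id=00kK4
    }
}
```

A REUSED_OBJECT line would point at the input format handing out one mutable object, while a DUPLICATE_ID line with distinct instances would point at the data itself.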
Hope that helps,

Avery

On 1/29/12 8:13 PM, David Garcia wrote:
Hello, I get this error when I try to run my job:
2012-01-29 21:50:18,494 INFO org.apache.giraph.graph.BspServiceWorker: reserveInputSplit: reservedPath = null, 
1 of 1 InputSplits are finished.
2012-01-29 21:50:18,494 INFO org.apache.giraph.graph.BspServiceWorker: setup: 
Finally loaded a total of (v=0, e=0)
2012-01-29 21:50:18,764 INFO org.apache.giraph.graph.BspService: process: 
inputSplitsAllDoneChanged (all vertices sent from input splits)
2012-01-29 21:50:18,766 ERROR org.apache.giraph.graph.GraphMapper: setup: 
Caught exception just before end of setup
java.lang.IllegalStateException: moveVerticesToWorker: Vertex 
Vertex(id=zzYNBgKt2LF6ClLA2eMBzuN7SkA.,value=org.apache.hadoop.io.MapWritable@5ce8787a,#edges=0)
 already exists!
        at 
org.apache.giraph.graph.BspServiceWorker.movePartitionsToWorker(BspServiceWorker.java:1389)
        at 
org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:624)
        at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:458)
.
.
.
I'm not sure where to start debugging. . .BspServiceWorker is hella big.  All 
input is welcome.  As I mentioned, I'm processing a sequenceFile that has Text 
keys and MapWritable Values.  I would like the vertices to have Text indices 
and MapWritable values.  (I'm not inserting any edges for the time being. . .I 
just want to see the file get split properly).  I have implemented custom input 
formats and record readers.  Thx
-Dave
