Thanks for the response, Avery. Unfortunately, I can confirm that I do not have 
duplicates in my data.  I have narrowed the problem down to the following method:

private VertexEdgeCount readVerticesFromInputSplit(
            InputSplit inputSplit) throws IOException, InterruptedException {
.
.
.
while (vertexReader.nextVertex()) {
            BasicVertex<I, V, E, M> readerVertex =
                vertexReader.getCurrentVertex();
.
.
.
When the .nextVertex() method is called, it automatically mutates every HashMap 
in a Partition in the inputSplitCache.  The nature of the mutation is that 
every Vertex already stored in the respective partition is converted to the 
next vertex produced by .nextVertex().  (Again, note that the underlying 
RecordReader is a SequenceFileRecordReader.)  For example, if I have the 
following inputSplitCache:

inputSplitCache
[0]
Key -> BasicPartitionOwner. . .
Value - > Partition
Conf -> Configuration . . .
partitionID = 0
vertexMap
[0] -> 00kK4. . .

I have one vertex in my partition.  Assuming that the next vertex ID is 
mM424, after vertexReader.nextVertex() is called the data structure changes to 
this:

inputSplitCache
[0]
Key -> BasicPartitionOwner. . .
Value - > Partition
Conf -> Configuration . . .
partitionID = 0
vertexMap
[0] -> mM424. . .

After partition.putVertex(. . .) is called, another identical vertex is added.

inputSplitCache
[0]
Key -> BasicPartitionOwner. . .
Value - > Partition
Conf -> Configuration . . .
partitionID = 0
vertexMap
[0] -> mM424. . .
[1] -> mM424. . .

This leads to the error in my previous email.  All the vertices in my graph 
end up with the data of my final Vertex, as this pattern suggests.  It's almost 
as if some weird AspectJ advice is intercepting the call to .nextVertex().  I'm 
happy to share my code; I feel it's fairly simple.  It's just a SequenceFile 
input format and a trivial vertex class.
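
The pattern described above (every stored vertex turning into the most 
recently read one) is consistent with object reuse: Hadoop record readers 
commonly hand back the same key and value instances on every call and 
overwrite them in place.  This is only a guess and nothing in this thread 
confirms it.  The sketch below reproduces the effect in plain Java, with no 
Hadoop or Giraph dependency.  MutableId, ReusingReader and readAll are 
hypothetical names made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class ReusedObjectDemo {
    /** Hypothetical stand-in for a mutable Writable such as Text. */
    static final class MutableId {
        private String value = "";
        void set(String v) { value = v; }
        String get() { return value; }
        MutableId copy() {
            MutableId c = new MutableId();
            c.set(value);
            return c;
        }
    }

    /** Hypothetical reader that returns the SAME instance on every call. */
    static final class ReusingReader {
        private final String[] ids;
        private int pos = 0;
        private final MutableId current = new MutableId();
        ReusingReader(String... ids) { this.ids = ids; }
        boolean next() {
            if (pos >= ids.length) {
                return false;
            }
            current.set(ids[pos++]); // overwrites the shared instance in place
            return true;
        }
        MutableId getCurrent() { return current; }
    }

    /** Stores each id as read; with copy == false the stored entries alias. */
    public static List<String> readAll(boolean copy, String... ids) {
        ReusingReader reader = new ReusingReader(ids);
        List<MutableId> stored = new ArrayList<>();
        while (reader.next()) {
            MutableId id = reader.getCurrent();
            stored.add(copy ? id.copy() : id);
        }
        List<String> out = new ArrayList<>();
        for (MutableId id : stored) {
            out.add(id.get());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(readAll(false, "00kK4", "mM424")); // [mM424, mM424]
        System.out.println(readAll(true, "00kK4", "mM424"));  // [00kK4, mM424]
    }
}
```

If this is what is happening, copying the key and value into fresh objects 
before building each vertex would be the thing to try.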

-Dave

From: Avery Ching <ach...@apache.org>
Reply-To: giraph-user@incubator.apache.org
Date: Mon, 30 Jan 2012 01:28:12 -0600
To: giraph-user@incubator.apache.org
Subject: Re: Vertex exists error when processing input splits for Sequence file

Hi David,

So from the errors, it appears that your input has multiple vertices with the 
same vertex id.  Currently we throw an exception to prevent this from happening, 
as it is typically not what you want.  You probably want to watch the vertices 
being processed from the vertex input format and see why you are getting 
duplicates.  It's likely to be either an error in the data, which actually has 
vertices with the same vertex id, or an error in your custom vertex input 
format.

To help debug, you might want to add some logging to your record reader and 
print the vertex ids or you can add some logging to where that code is called 
in BspServiceWorker#readVerticesFromInputSplit().
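
A minimal sketch of the kind of logging suggested above, kept independent of 
Giraph so it can be dropped into any reader loop.  DuplicateIdLogger is a 
hypothetical helper and not part of Giraph or Hadoop.  It assumes the vertex 
id has a meaningful toString(), and it snapshots each id as a String so that 
later changes to a mutable id object cannot affect what was recorded.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.logging.Logger;

public class DuplicateIdLogger {
    private static final Logger LOG =
        Logger.getLogger(DuplicateIdLogger.class.getName());

    private final Set<String> seen = new HashSet<>();
    private int duplicates = 0;

    /**
     * Logs the id and returns true if it has not been seen before,
     * false (with a warning) if it is a duplicate.
     */
    public boolean record(Object vertexId) {
        String key = String.valueOf(vertexId); // immutable snapshot of the id
        boolean first = seen.add(key);
        if (first) {
            LOG.info("read vertex id=" + key);
        } else {
            duplicates++;
            LOG.warning("duplicate vertex id=" + key);
        }
        return first;
    }

    /** Number of ids that were recorded more than once. */
    public int duplicateCount() {
        return duplicates;
    }
}
```

Calling record(...) once per vertex, right after the reader produces it, 
would show whether the duplicates come from the input itself or appear later.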

Hope that helps,

Avery

On 1/29/12 8:13 PM, David Garcia wrote:


Hello, I get this error when I try run my job:

2012-01-29 21:50:18,494 INFO org.apache.giraph.graph.BspServiceWorker: 
reserveInputSplit: reservedPath = null, 1 of 1 InputSplits are finished.
2012-01-29 21:50:18,494 INFO org.apache.giraph.graph.BspServiceWorker: setup: 
Finally loaded a total of (v=0, e=0)
2012-01-29 21:50:18,764 INFO org.apache.giraph.graph.BspService: process: 
inputSplitsAllDoneChanged (all vertices sent from input splits)
2012-01-29 21:50:18,766 ERROR org.apache.giraph.graph.GraphMapper: setup: 
Caught exception just before end of setup

java.lang.IllegalStateException: moveVerticesToWorker: Vertex 
Vertex(id=zzYNBgKt2LF6ClLA2eMBzuN7SkA.,value=org.apache.hadoop.io.MapWritable@5ce8787a,#edges=0)
 already exists!
        at org.apache.giraph.graph.BspServiceWorker.movePartitionsToWorker(BspServiceWorker.java:1389)
        at org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:624)
        at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:458)

.

.

.

I'm not sure where to start debugging. . .BspServiceWorker is hella big.  All 
input is welcome.  As I mentioned, I'm processing a SequenceFile that has Text 
keys and MapWritable values.  I would like the vertices to have Text indices 
and MapWritable values.  (I'm not inserting any edges for the time being. . .I 
just want to see the file get split properly.)  I have implemented custom input 
formats and record readers.  Thanks.

-Dave
