Glad to hear you figured it out! Keep us informed on how your
experiments are going and what we can do to help.
Avery
On 1/30/12 7:50 AM, David Garcia wrote:
Thx again Avery for your prompt responses. The problem you suggested
didn't turn out to be the actual problem, but you led me in the
right direction. It turns out that all my Vertex instances were
unique (i.e., new vertices were being created in getCurrentVertex()).
However, SequenceFileRecordReader keeps a single key object and a
single value object (singletons) behind its getCurrentKey() and
getCurrentValue() methods. So every time you call nextKeyValue() on
the record reader, these singletons get updated in place, and every
vertex that still holds a reference to them changes too.
This was a real pain to figure out. Thx again for all your help!!
-David
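A minimal sketch of the pitfall described above, in plain Java with no Hadoop or Giraph dependency. Holder and ReusingReader are hypothetical stand-ins (not real Hadoop classes) for a mutable Writable and a record reader that hands back the same object for every record. Keeping the returned reference makes every stored id end up as the last one read; copying it first keeps them distinct. With real Hadoop types the copy would presumably be made with something like new Text(key) or WritableUtils.clone(value, conf), but that is an assumption about the fix, not something stated in the thread.

```java
import java.util.ArrayList;
import java.util.List;

final class ReusedHolderDemo {
    /** Hypothetical mutable holder, standing in for a Writable such as Text. */
    static final class Holder {
        private String value;
        Holder() {}
        Holder(Holder other) { this.value = other.value; } // copy constructor
        void set(String v) { value = v; }
        String get() { return value; }
    }

    /** Hypothetical reader that returns the SAME key object for every record. */
    static final class ReusingReader {
        private final String[] records;
        private final Holder current = new Holder();
        private int next = 0;
        ReusingReader(String... records) { this.records = records; }
        boolean nextKeyValue() {
            if (next >= records.length) {
                return false;
            }
            current.set(records[next++]); // overwrites the shared object in place
            return true;
        }
        Holder getCurrentKey() { return current; }
    }

    /** Reads all ids, either keeping the reader's object or a copy of it. */
    static List<String> readIds(boolean copy, String... records) {
        ReusingReader reader = new ReusingReader(records);
        List<Holder> kept = new ArrayList<>();
        while (reader.nextKeyValue()) {
            Holder key = reader.getCurrentKey();
            kept.add(copy ? new Holder(key) : key);
        }
        List<String> ids = new ArrayList<>();
        for (Holder h : kept) {
            ids.add(h.get());
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(readIds(false, "a", "b", "c")); // [c, c, c]
        System.out.println(readIds(true, "a", "b", "c"));  // [a, b, c]
    }
}
```

Creating a new vertex per record is not enough on its own if the id and value passed to it are still the reader's shared objects; the copy has to happen before they are handed to the vertex.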
From: David Garcia <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Mon, 30 Jan 2012 08:48:21 -0600
To: "[email protected]" <[email protected]>
Subject: Re: Vertex exists error when processing input splits for Sequence file
Ok, that's a good point. My getCurrentVertex() method looks like this:
    @Override
    public BasicVertex<I, V, E, M> getCurrentVertex()
            throws IOException, InterruptedException {
        BasicVertex<I, V, E, M> vertex =
            BspUtils.createVertex(getContext().getConfiguration());
        I vertexID = (I) getRecordReader().getCurrentKey();
        V vertexValue = (V) getRecordReader().getCurrentValue();
        try {
            vertex.initialize(vertexID, vertexValue, null, null);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return vertex;
    }
Perhaps BspUtils is reusing it?
From: Avery Ching <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Mon, 30 Jan 2012 02:37:09 -0600
To: "[email protected]" <[email protected]>
Subject: Re: Vertex exists error when processing input splits for Sequence file
In your implementation of VertexReader#getCurrentVertex(), are you
providing a new BasicVertex object each time (after nextVertex() is
called)? If you are reusing the same BasicVertex object, you could get
problems like the ones you describe.
Avery
On 1/30/12 12:24 AM, David Garcia wrote:
Thx for the response Avery. Unfortunately, I can confirm that I do
not have duplicates in my data. I have narrowed the problem down to
the following method:
    private VertexEdgeCount readVerticesFromInputSplit(
            InputSplit inputSplit) throws IOException,
            InterruptedException {
        .
        .
        .
        while (vertexReader.nextVertex()) {
            BasicVertex<I, V, E, M> readerVertex =
                vertexReader.getCurrentVertex();
            .
            .
            .
When the .nextVertex() method is called, it automatically mutates
every HashMap in a Partition in the InputSplitCache. The nature of
the mutation is to convert every Vertex (in the respective partition)
to the next vertex returned by .nextVertex(). (Again, note that the
underlying RecordReader is a SequenceFileRecordReader.) For example,
say I have the following inputSplitCache:
    inputSplitCache
      [0]
        Key -> BasicPartitionOwner. . .
        Value -> Partition
          Conf -> Configuration . . .
          partitionID = 0
          vertexMap
            [0] -> 00kK4. . .
I have one vertex in my partition. . .assuming that the next vertex
ID is mM424, after vertexReader.nextVertex() is called, the data
structure changes to this. . .
    inputSplitCache
      [0]
        Key -> BasicPartitionOwner. . .
        Value -> Partition
          Conf -> Configuration . . .
          partitionID = 0
          vertexMap
            [0] -> mM424. . .
After partition.putVertex(. . .) is called, another identical vertex
is added.
    inputSplitCache
      [0]
        Key -> BasicPartitionOwner. . .
        Value -> Partition
          Conf -> Configuration . . .
          partitionID = 0
          vertexMap
            [0] -> mM424. . .
            [1] -> mM424. . .
This leads to the error in my previous email. All the vertices in
my graph end up with the data of my final Vertex, as this pattern
suggests. It's almost as if some weird AspectJ advice is intercepting
the call to .nextVertex(). I'm happy to share my code. I feel it's
fairly simple. It's just a SequenceFile input format and a
trivial vertex class.
-Dave
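The pattern described above (an existing map entry appears to change, then a second identical entry shows up) can be reproduced in plain Java whenever a map key is a mutable object that gets modified after insertion. The sketch below is only an illustration of that general behavior, not Giraph's actual code: MutableId is a hypothetical stand-in for a mutable id type, and "id-1" and "id-2" are made-up ids.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class MutatedKeyDemo {
    /** Hypothetical mutable id whose equals/hashCode depend on its content. */
    static final class MutableId {
        private String value;
        MutableId(String value) { this.value = value; }
        void set(String value) { this.value = value; }
        @Override public boolean equals(Object o) {
            return o instanceof MutableId
                && Objects.equals(value, ((MutableId) o).value);
        }
        @Override public int hashCode() { return Objects.hashCode(value); }
        @Override public String toString() { return value; }
    }

    /** Inserts the same key object twice, mutating it in between. */
    static Map<MutableId, String> loadWithSharedKey() {
        Map<MutableId, String> vertexMap = new HashMap<>();
        MutableId shared = new MutableId("id-1");
        vertexMap.put(shared, "first vertex");

        // Simulates the reader advancing and overwriting the shared object.
        // The entry already in the map now displays the new id.
        shared.set("id-2");

        // The hash stored for the first entry no longer matches, so this
        // put() adds a second entry instead of replacing the first one.
        vertexMap.put(shared, "second vertex");
        return vertexMap;
    }

    public static void main(String[] args) {
        Map<MutableId, String> map = loadWithSharedKey();
        System.out.println(map.size());   // 2
        System.out.println(map.keySet()); // both keys print as id-2
    }
}
```

Nothing intercepts the call in this sketch; the map simply holds a reference to an object that something else keeps rewriting.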
From: Avery Ching <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Mon, 30 Jan 2012 01:28:12 -0600
To: "[email protected]" <[email protected]>
Subject: Re: Vertex exists error when processing input splits for Sequence file
Hi David,
So from the errors, it appears that your input has multiple vertices
with the same vertex id. Currently we throw an exception to prevent
this from happening as it is typically not what you want. You
probably want to watch the vertices being processed from the vertex
input format and see why you are getting duplicates. The likely cause
is either that the data actually has vertices with the same
vertex id or that there is an error in your custom vertex input format.
To help debug, you might want to add some logging to your record
reader and print the vertex ids or you can add some logging to where
that code is called in BspServiceWorker#readVerticesFromInputSplit().
Hope that helps,
Avery
On 1/29/12 8:13 PM, David Garcia wrote:
Hello, I get this error when I try to run my job:
2012-01-29 21:50:18,494 INFO org.apache.giraph.graph.BspServiceWorker: reserveInputSplit: reservedPath = null, 1 of 1 InputSplits are finished.
2012-01-29 21:50:18,494 INFO org.apache.giraph.graph.BspServiceWorker: setup: Finally loaded a total of (v=0, e=0)
2012-01-29 21:50:18,764 INFO org.apache.giraph.graph.BspService: process: inputSplitsAllDoneChanged (all vertices sent from input splits)
2012-01-29 21:50:18,766 ERROR org.apache.giraph.graph.GraphMapper: setup: Caught exception just before end of setup
java.lang.IllegalStateException: moveVerticesToWorker: Vertex Vertex(id=zzYNBgKt2LF6ClLA2eMBzuN7SkA.,value=org.apache.hadoop.io.MapWritable@5ce8787a,#edges=0) already exists!
    at org.apache.giraph.graph.BspServiceWorker.movePartitionsToWorker(BspServiceWorker.java:1389)
    at org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:624)
    at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:458)
.
.
.
I'm not sure where to start debugging. BspServiceWorker is hella big. All
input is welcome. As I mentioned, I'm processing a SequenceFile that has Text
keys and MapWritable values. I would like the vertices to have Text indices
and MapWritable values. (I'm not inserting any edges for the time being; I
just want to see the file get split properly.) I have implemented custom input
formats and record readers. Thx
-Dave