So I managed to resolve the issue myself.

On 21 Mar 2012, at 20:30, Benjamin Heitmann wrote:

> A few questions which come to my mind as a sort of checklist: 
> * are my assembly instructions in pom.xml and in hadoop-job.xml correct ? 

This was the deciding issue. My job jar contained its dependencies as jar 
files in the lib directory inside the job jar. 
Although (almost) all Google search results on assembling a Hadoop job jar 
suggest that this is the right way to do it, 
it seems that Giraph or one of its dependencies changes the way in which 
the job jar is loaded. 

After listing the contents of the giraph-*-jar-with-dependencies.jar (with 
jar -tf), I saw that all dependency jars are unpacked in there.
I copied the relevant invocation of the maven assembly plugin into my 
project's pom.xml, adapted it, and built that jar (with mvn clean assembly:assembly). 
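
For anyone who wants to run the same check from code instead of reading the 
jar -tf output by eye, here is a minimal sketch using only the JDK's 
java.util.jar classes. The class name JarLayoutCheck and the method 
nestedJars are made up for illustration; they are not part of Giraph or 
Hadoop. It only reports whether a jar still contains other jar files (the 
lib-style packaging that did not work for me); an empty result suggests, but 
does not prove, that the dependencies were unpacked.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper: lists jar files nested inside a job jar.
class JarLayoutCheck {

    /** Returns the names of all entries that are themselves jar files, e.g. "lib/foo.jar". */
    static List<String> nestedJars(Path jarPath) throws IOException {
        List<String> nested = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath.toFile())) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                // A non-directory entry ending in ".jar" is a packed dependency.
                if (!entry.isDirectory() && entry.getName().endsWith(".jar")) {
                    nested.add(entry.getName());
                }
            }
        }
        return nested;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java JarLayoutCheck <job.jar>");
            System.exit(2);
        }
        List<String> nested = nestedJars(Paths.get(args[0]));
        if (nested.isEmpty()) {
            System.out.println("No nested jars found.");
        } else {
            System.out.println("Nested jars found (lib-style packaging):");
            nested.forEach(name -> System.out.println("  " + name));
        }
    }
}
```

Run it with the path to the job jar as the only argument.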

Then I submitted that jar to hadoop. Using bin/giraph failed (with an error 
about not being able to write using the output format). 

However, bypassing bin/giraph and telling hadoop to run my subclass of Tool via 
ToolRunner worked. 
I submitted the changes to pom.xml to the github repo if anybody wants to have 
a look. 

So my problem of not being able to run my giraph job on a hadoop cluster *at 
all* is solved for now. 

The error which I had when trying bin/giraph was reproducible in the same 
environment for the PageRankBenchmark. 
I can file an issue for that later, if somebody else can reproduce it. 

In addition, I would strongly suggest making a maven archetype for a simple 
giraph job. 

I will start a new email thread for that. 

cheers, Benjamin. 
