So I managed to resolve the issue myself.
On 21 Mar 2012, at 20:30, Benjamin Heitmann wrote:
> A few questions which come to my mind as a sort of checklist:
> * are my assembly instructions in pom.xml and in hadoop-job.xml correct ?
This was the deciding issue. My jar file contained the dependencies as jar
files in the lib dir inside of the job jar.
While (almost) all Google search results for assembling a hadoop job as a jar
suggest that this is the right way to do it,
it seems that Giraph or one of its dependencies changes the way in which the
job jar is loaded.
After inspecting the giraph-*-jar-with-dependencies.jar (with jar -tf), I saw
that all dependency jars are unpacked into it rather than nested in a lib dir.
I copied the relevant maven assembly plugin configuration into my project's
pom.xml, modified it, and built that jar (with mvn clean assembly:assembly).
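
For anyone who wants a starting point, here is a minimal sketch of such a
plugin section. It is not the exact configuration from Giraph's pom.xml or
from my project; it assumes that Maven's built-in jar-with-dependencies
descriptor is sufficient, which unpacks all dependencies into the root of the
resulting jar instead of nesting them under lib/.

```xml
<!-- Sketch only: goes inside <build><plugins> of the project's pom.xml. -->
<!-- Assumption: the built-in descriptor is used, not a custom assembly file. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-assembly-plugin</artifactId>
  <configuration>
    <descriptorRefs>
      <!-- Unpacks every dependency and merges the classes into one jar. -->
      <descriptorRef>jar-with-dependencies</descriptorRef>
    </descriptorRefs>
  </configuration>
</plugin>
```

Running jar -tf on the resulting *-jar-with-dependencies.jar should then list
the dependency classes directly, with no nested jar files.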
Then I submitted that jar to hadoop. Using bin/giraph failed with an error
about not being able to write using the output format.
However, bypassing bin/giraph and telling hadoop to run my subclass of Tool via
I submitted the changes to pom.xml to the github repo if anybody wants to have
So my problem of not being able to run my giraph job on a hadoop cluster *at
all* is solved for now.
The error which I got when trying bin/giraph was reproducible in the same
environment for the PageRankBenchmark.
I can file an issue for that later, if somebody else can reproduce that.
In addition, I would strongly suggest making a maven archetype for a simple
I will start a new email thread for that.