I ran the job without the caching trick in the input format and still ran
into the freeze.
On 02/15/2014 08:52 PM, Claudio Martella wrote:
I don't know, maybe I'm missing something, or there's a bug there as well.
I do agree that this is spooky. Armando has also tested it with the
WattsStrogatzInputFormat, which generates a different type of graph. From
what I understand, the topology should not cause this. I think we should
just try to replicate this behavior, hopefully without a very large graph
that makes debugging difficult.
On Sat, Feb 15, 2014 at 8:42 PM, Sebastian Schelter <s...@apache.org> wrote:
I copied the caching from o.a.g.io.formats.IntIntNullTextInputFormat, and
it worked well during my tests (it did not happen that all vertices had the
same id).
I'm happy to remove this and rerun the tests. It's strange that
out-of-core works with PageRank on a generated graph, but not with
HyperBall on the twitter graph. The generated graph has a uniform degree
distribution, while the twitter graph's degree distribution is heavily
skewed. Could that influence the behavior of out-of-core?
Best,
Sebastian
On 02/15/2014 08:32 PM, Claudio Martella wrote:
Sebastian, I had a look at your VertexInputFormat. I think there might be a
bug. Why are you caching/reusing the id? This way every vertex parsed by
the VertexReader will share the same ID object, and hence have the same
ID.
I think this is broken; you should instantiate a new ID object in
preprocessLine.
Can you try it like that?
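To make the failure mode concrete, here is a minimal, self-contained sketch. MutableId is a stand-in for a mutable Writable id such as Hadoop's LongWritable; this is not the actual reader code, just an illustration of why reusing one cached id object makes every parsed vertex report the id of the last line read, while instantiating a new object per line behaves correctly:

```java
import java.util.ArrayList;
import java.util.List;

// MutableId mimics a mutable id holder like Hadoop's LongWritable
// (hypothetical stand-in, not a Giraph class).
public class SharedIdDemo {
    static final class MutableId {
        long value;
        MutableId(long v) { value = v; }
        void set(long v) { value = v; }
    }

    public static void main(String[] args) {
        String[] lines = {"1", "2", "3"};

        // Broken pattern: one cached id object is reused for every line,
        // so every stored vertex ends up pointing at the same object.
        MutableId cached = new MutableId(0);
        List<MutableId> broken = new ArrayList<>();
        for (String line : lines) {
            cached.set(Long.parseLong(line));
            broken.add(cached); // all entries reference the same object
        }
        System.out.println("broken ids: " + broken.get(0).value + " "
                + broken.get(1).value + " " + broken.get(2).value);

        // Fixed pattern: instantiate a fresh id object per parsed line.
        List<MutableId> fixed = new ArrayList<>();
        for (String line : lines) {
            fixed.add(new MutableId(Long.parseLong(line)));
        }
        System.out.println("fixed ids: " + fixed.get(0).value + " "
                + fixed.get(1).value + " " + fixed.get(2).value);
    }
}
```

The broken variant prints the last parsed id three times; the fixed variant preserves each vertex's own id.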
On Thu, Feb 13, 2014 at 9:50 PM, Sebastian Schelter <s...@apache.org>
wrote:
Hi Armando,
I uploaded my test code to github at:
https://github.com/sscdotopen/giraph/tree/hyperball64-ooc
I'm working on an algorithm to estimate the neighborhood function of the
graph (similar to [1]). I'm running this on the transposed adjacency matrix
of a snapshot of the twitter follower graph [2]. For this graph,
out-of-core is not necessary, but I would like to run my algorithm on
another, larger graph that no longer fits into the aggregated main memory
of the cluster.
For testing purposes, I think you can run it on any large graph in
adjacency form.
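For illustration, here is a hedged sketch of parsing one adjacency-form line. The token layout (a vertex id followed by its out-neighbors, whitespace-separated) is an assumption; the exact format HyperBallTextInputFormat expects is not shown in this thread:

```java
import java.util.Arrays;

// Hypothetical adjacency-form line: "vertexId neighbor1 neighbor2 ..."
// (assumed layout, not necessarily what HyperBallTextInputFormat reads).
public class AdjacencyLineDemo {
    public static void main(String[] args) {
        String line = "42 7 13 99"; // vertex 42 with out-edges to 7, 13, 99
        String[] tokens = line.split("\\s+");
        long vertexId = Long.parseLong(tokens[0]);
        long[] targets = Arrays.stream(tokens, 1, tokens.length)
                               .mapToLong(Long::parseLong)
                               .toArray();
        System.out.println(vertexId + " -> " + Arrays.toString(targets));
    }
}
```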
Our cluster consists of 25 machines, each with 32 GB RAM, 8 cores, and 4
disks. I use the following options to run the algorithm:
hadoop jar giraph-examples-1.1.0-SNAPSHOT-for-hadoop-1.2.1-jar-with-dependencies.jar \
  org.apache.giraph.GiraphRunner \
  org.apache.giraph.examples.hyperball.HyperBall \
  --vertexInputFormat org.apache.giraph.examples.hyperball.HyperBallTextInputFormat \
  --vertexInputPath hdfs:///ssc/twitter-negative/ \
  --vertexOutputFormat org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
  --outputPath hdfs:///ssc/tmp-123/ \
  --combiner org.apache.giraph.comm.messages.HyperLogLogCombiner \
  --outEdges org.apache.giraph.edge.LongNullArrayEdges \
  --workers 24 \
  --customArguments giraph.oneToAllMsgSending=true,giraph.isStaticGraph=true,giraph.numComputeThreads=15,giraph.numInputThreads=15,giraph.numOutputThreads=15,giraph.maxNumberOfSupersteps=30,giraph.useOutOfCoreGraph=true,giraph.maxPartitionsInMemory=20
Best,
Sebastian
[1] http://arxiv.org/abs/1308.2144
[2] http://konect.uni-koblenz.de/networks/twitter_mpi
On 02/12/2014 04:21 PM, Armando Miraglia wrote:
Hi Sebastian,
On Wed, Feb 12, 2014 at 02:59:20PM +0100, Sebastian Schelter wrote:
No. Should I have done that?
could you please provide me with the test you ran, together with the
variables you set for the computation? That would help me a lot.
Cheers,
Armando