below? Can you advise on how to
solve this issue?
Thanks,
Alex
From: Ankur Dave [mailto:ankurd...@gmail.com]
Sent: Thursday, May 22, 2014 6:59 PM
To: user@spark.apache.org
Subject: Re: GraphX partition problem
I've been trying to reproduce this, but I haven't succeeded so far. For
example, on the web-Google graph
(https://snap.stanford.edu/data/web-Google.html), I get the
expected results both on v0.9.1-handle-empty-partitions
and on master:
// Load web-Google and run connected components
import
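For anyone following along without a Spark cluster, here is a minimal local sketch in plain Scala of the min-label idea behind GraphX's connected components. Every vertex starts labeled with its own id, and each pass makes one sequential scan over the edges, pushing the smaller label across every edge until nothing changes, so each vertex ends up labeled with the smallest id in its component. This is not GraphX's code (GraphX runs the propagation with Pregel over partitioned RDDs). The object name `LabelPropagationCC`, the use of `Int` ids instead of `VertexId` (`Long`), and the toy edge list are illustrative assumptions.

```scala
// Hypothetical local sketch (no Spark): connected components by min-label propagation.
object LabelPropagationCC {
  /** For each vertex 0 until numVertices, returns the smallest vertex id in its component.
    * Edge i goes from src(i) to dst(i); edges are treated as undirected. */
  def connectedComponents(numVertices: Int, src: Array[Int], dst: Array[Int]): Array[Int] = {
    require(src.length == dst.length, "src and dst must have the same length")
    // Every vertex starts labeled with its own id.
    val label = Array.tabulate(numVertices)(v => v)
    var changed = true
    while (changed) {
      changed = false
      // One sequential scan over the edges per iteration.
      var i = 0
      while (i < src.length) {
        val s = src(i)
        val d = dst(i)
        val m = math.min(label(s), label(d))
        if (label(s) != m) { label(s) = m; changed = true }
        if (label(d) != m) { label(d) = m; changed = true }
        i += 1
      }
    }
    label
  }

  def main(args: Array[String]): Unit = {
    // Toy graph: {0, 1, 2} form one component, {3, 4} another, and 5 is isolated.
    val src = Array(1, 2, 4)
    val dst = Array(0, 1, 3)
    println(connectedComponents(6, src, dst).mkString(",")) // prints 0,0,0,3,3,5
  }
}
```

In GraphX itself, the corresponding call is `graph.connectedComponents()`, which returns a graph whose vertex attribute is the lowest vertex id in each vertex's component.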
I’m not sure about 1.2TB, but I can give it a shot.
Is there some way to persist intermediate results to disk? Does the whole graph
have to be in memory?
Alex
From: Ankur Dave [mailto:ankurd...@gmail.com]
Sent: Monday, May 26, 2014 12:23 AM
To: user@spark.apache.org
Subject: Re: GraphX partition
Can we do better with Bagel somehow? Could we control how the graph is stored?
From: Ankur Dave [mailto:ankurd...@gmail.com]
Sent: Monday, May 26, 2014 12:13 PM
To: user@spark.apache.org
Subject: Re: GraphX partition problem
GraphX only performs sequential scans over the edges, so we could in theory
To: user@spark.apache.org
Subject: Re: GraphX partition problem
The fix will be included in Spark 1.0, but if you just want to apply the fix to
0.9.1, here's a hotfixed version of 0.9.1 that only includes PR #367:
https://github.com/ankurdave/spark/tree/v0.9.1-handle-empty-partitions. You can
clone and build
Once the graph is built, edges are stored in parallel primitive arrays, so
each edge should only take 20 bytes to store (srcId: Long, 8 bytes; dstId: Long,
8 bytes; attr: Int, 4 bytes). Unfortunately, the current implementation in
EdgePartitionBuilder uses an array of Edge objects as an intermediate
representation for