RE: GraphX partition problem

2014-05-28 Thread Zhicharevich, Alex
below? Can you advise on how to solve this issue? Thanks, Alex From: Ankur Dave [mailto:ankurd...@gmail.com] Sent: Thursday, May 22, 2014 6:59 PM To: user@spark.apache.org Subject: Re: GraphX partition problem The fix will be included in Spark 1.0, but if you just want

Re: GraphX partition problem

2014-05-28 Thread Ankur Dave
I've been trying to reproduce this but I haven't succeeded so far. For example, on the web-Google graph (https://snap.stanford.edu/data/web-Google.html), I get the expected results both on v0.9.1-handle-empty-partitions and on master: // Load web-Google and run connected components import
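For context on what "expected results" means here: GraphX's connected components algorithm labels each connected component with the ID of its lowest-numbered vertex. The snippet below is a single-machine, plain-Scala stand-in for that min-label propagation (it does not use Spark or Pregel; `ConnectedComponentsSketch` is a hypothetical name), handy for checking labels on a tiny edge list.

```scala
// Hypothetical, non-distributed sketch of min-label propagation.
// Each vertex ends up labeled with the smallest vertex ID in its component.
object ConnectedComponentsSketch {
  def run(edges: Seq[(Long, Long)]): Map[Long, Long] = {
    val vertices = edges.flatMap { case (s, d) => Seq(s, d) }.distinct
    // Every vertex starts out labeled with its own ID.
    var labels: Map[Long, Long] = vertices.map(v => v -> v).toMap
    var changed = true
    while (changed) {
      changed = false
      // Edges are treated as undirected: both endpoints adopt the smaller label.
      for ((s, d) <- edges) {
        val m = math.min(labels(s), labels(d))
        if (labels(s) != m) { labels += (s -> m); changed = true }
        if (labels(d) != m) { labels += (d -> m); changed = true }
      }
    }
    labels
  }
}
```

For example, `ConnectedComponentsSketch.run(Seq((1L, 2L), (2L, 3L), (5L, 4L)))` labels vertices 1, 2 and 3 with 1, and vertices 4 and 5 with 4. Labels only ever decrease, so the loop terminates.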

RE: GraphX partition problem

2014-05-26 Thread Zhicharevich, Alex
I’m not sure about 1.2TB, but I can give it a shot. Is there some way to persist intermediate results to disk? Does the whole graph have to be in memory? Alex From: Ankur Dave [mailto:ankurd...@gmail.com] Sent: Monday, May 26, 2014 12:23 AM To: user@spark.apache.org Subject: Re: GraphX partition

RE: GraphX partition problem

2014-05-26 Thread Zhicharevich, Alex
Can we do better with Bagel somehow? Control how we store the graph? From: Ankur Dave [mailto:ankurd...@gmail.com] Sent: Monday, May 26, 2014 12:13 PM To: user@spark.apache.org Subject: Re: GraphX partition problem GraphX only performs sequential scans over the edges, so we could in theory
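To illustrate why sequential-only access matters: if every pass reads edges front to back, the edges could in principle live in a file and be streamed, keeping only the running result in memory. The sketch below is hypothetical and is not how GraphX stores edges. It assumes a made-up fixed-size record layout of (srcId: Long, dstId: Long, attr: Int) per edge and computes out-degrees in one sequential pass using plain `java.io` streams.

```scala
import java.io._

// Hypothetical on-disk edge layout: fixed 20-byte records (Long, Long, Int).
object DiskEdgeScanSketch {
  val RecordBytes = 8 + 8 + 4

  def writeEdges(file: File, edges: Seq[(Long, Long, Int)]): Unit = {
    val out = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(file)))
    try edges.foreach { case (s, d, a) => out.writeLong(s); out.writeLong(d); out.writeInt(a) }
    finally out.close()
  }

  // A single sequential scan; only the degree map is held in memory.
  def outDegrees(file: File): Map[Long, Int] = {
    val in = new DataInputStream(new BufferedInputStream(new FileInputStream(file)))
    val numRecords = file.length() / RecordBytes
    val degrees = scala.collection.mutable.HashMap.empty[Long, Int]
    try {
      var i = 0L
      while (i < numRecords) {
        val src = in.readLong()
        in.readLong() // dstId, not needed for out-degree
        in.readInt()  // attr, not needed for out-degree
        degrees(src) = degrees.getOrElse(src, 0) + 1
        i += 1
      }
    } finally in.close()
    degrees.toMap
  }
}
```

The trade-off is that each pass pays disk I/O instead of a memory scan, which is why this only works when the access pattern really is sequential.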

RE: GraphX partition problem

2014-05-25 Thread Zhicharevich, Alex
@spark.apache.org Subject: Re: GraphX partition problem The fix will be included in Spark 1.0, but if you just want to apply the fix to 0.9.1, here's a hotfixed version of 0.9.1 that only includes PR #367: https://github.com/ankurdave/spark/tree/v0.9.1-handle-empty-partitions. You can clone and build

Re: GraphX partition problem

2014-05-25 Thread Ankur Dave
Once the graph is built, edges are stored in parallel primitive arrays, so each edge should only take 20 bytes to store (srcId: Long, dstId: Long, attr: Int). Unfortunately, the current implementation in EdgePartitionBuilder uses an array of Edge objects as an intermediate representation for
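The 20-byte figure is just the primitive payload: 8 bytes (srcId: Long) + 8 bytes (dstId: Long) + 4 bytes (attr: Int). A rough sketch of the difference between an array of per-edge objects and parallel primitive arrays follows. `EdgeObj` and `ColumnarEdges` are hypothetical stand-ins, not GraphX's `Edge` or `EdgePartitionBuilder`, and the byte count ignores JVM array headers.

```scala
// Hypothetical comparison of per-object vs. columnar edge storage.
object EdgeStorageSketch {
  // Intermediate form: one heap object per edge (each with its own header
  // plus a reference slot in the enclosing array; exact cost is JVM-dependent).
  final case class EdgeObj(srcId: Long, dstId: Long, attr: Int)

  // Final form: three parallel primitive arrays indexed by edge position.
  final class ColumnarEdges(val srcIds: Array[Long], val dstIds: Array[Long], val attrs: Array[Int]) {
    require(srcIds.length == dstIds.length && dstIds.length == attrs.length)
    def size: Int = srcIds.length
    // Payload only: 8 + 8 + 4 = 20 bytes per edge.
    def payloadBytes: Long = size.toLong * (8 + 8 + 4)
  }

  def fromObjects(edges: Array[EdgeObj]): ColumnarEdges = {
    val n = edges.length
    val src = new Array[Long](n)
    val dst = new Array[Long](n)
    val attr = new Array[Int](n)
    var i = 0
    while (i < n) {
      src(i) = edges(i).srcId; dst(i) = edges(i).dstId; attr(i) = edges(i).attr
      i += 1
    }
    new ColumnarEdges(src, dst, attr)
  }
}
```

At 20 bytes per edge, one million edges need about 20,000,000 bytes of payload in the columnar form. The object-array intermediate needs noticeably more while both representations coexist during the build.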