Re: Pregel runs slower and slower when each Pregel has data dependency

2015-06-05 Thread dash
Hi Heather, please check this issue: https://issues.apache.org/jira/browse/SPARK-4672 I think you can solve this problem by checkpointing your data every few iterations. Hope that helps. Best regards, Baoxu (Dash) Shi, Computer Science and Engineering Department, University of Notre Dame
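
For readers who want to see what "checkpointing every few iterations" can look like, here is a minimal sketch on a plain pair RDD rather than a Pregel/GraphX job. The names `PeriodicCheckpoint`, `step`, `run`, and `interval` are made up for illustration, and the per-iteration update is a placeholder; it only assumes the standard `SparkContext.setCheckpointDir`, `RDD.checkpoint`, `RDD.cache`, and `RDD.unpersist` calls.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object PeriodicCheckpoint {
  // Hypothetical per-iteration update; stands in for the real computation.
  def step(rdd: RDD[(Int, Double)]): RDD[(Int, Double)] =
    rdd.mapValues(_ + 1.0)

  // Runs `iterations` dependent steps and checkpoints every `interval` steps.
  // The caller must have set a checkpoint directory on `sc` beforehand.
  def run(sc: SparkContext, iterations: Int, interval: Int): RDD[(Int, Double)] = {
    var current: RDD[(Int, Double)] =
      sc.parallelize(Seq((1, 0.0), (2, 0.0))).cache()
    for (i <- 1 to iterations) {
      val next = step(current).cache()
      // checkpoint() only marks the RDD; it must be called before the first action on it.
      if (i % interval == 0) next.checkpoint()
      // The action materializes `next` and writes the checkpoint if one was requested.
      next.count()
      // The previous iteration's cached data is no longer needed.
      current.unpersist(blocking = false)
      current = next
    }
    current
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("periodic-checkpoint")
    val sc = new SparkContext(conf)
    // Temporary local directory, for illustration only; a cluster would use shared storage.
    sc.setCheckpointDir(java.nio.file.Files.createTempDirectory("ckpt").toString)
    try println(run(sc, iterations = 10, interval = 3).collect().toList)
    finally sc.stop()
  }
}
```

Once a checkpointed RDD has been materialized, its lineage is replaced by a read from the checkpoint files, so the dependency chain stops growing with the iteration count. The right interval depends on the job and is not something this thread specifies.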

New combination-like RDD based on two RDDs

2015-02-04 Thread dash
Hey Spark gurus! Sorry for the confusing title. I do not know the exact description of my problem; if you know it, please tell me so I can change the title :-) Say I have two RDDs right now: val rdd1 = sc.parallelize(List((1,(3)), (2,(5)), (3,(6 val rdd2 = sc.parallelize(List((2,(1)),

Re: New combination-like RDD based on two RDDs

2015-02-04 Thread dash
Problem solved. A simple join will do the job: val prefix = new PairRDDFunctions[Int, Set[Int]](sc.parallelize(List((9, Set(4)), (1,Set(3)), (2,Set(5)), (2,Set(4) val suffix = sc.parallelize(List((1, Set(1)), (2, Set(6)), (2, Set(5)), (2, Set(7
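
Since the preview above is cut off, here is a small sketch of what a key-based join on two pair RDDs produces. The data and the names `JoinSketch`, `left`, and `right` are hypothetical and are not the poster's full example; the only API assumed is the standard `join` on pair RDDs.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object JoinSketch {
  // Joins two small pair RDDs by key; every matching (left, right) pair yields one output row.
  def joinByKey(sc: SparkContext): RDD[(Int, (Set[Int], Set[Int]))] = {
    // Hypothetical data, chosen only to show one-to-one and one-to-many matches.
    val left = sc.parallelize(Seq((1, Set(3)), (2, Set(5))))
    val right = sc.parallelize(Seq((1, Set(1)), (2, Set(6)), (2, Set(7))))
    left.join(right)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("join-sketch")
    val sc = new SparkContext(conf)
    try joinByKey(sc).collect().foreach(println)
    finally sc.stop()
  }
}
```

Key 2 appears twice on the right side, so it produces two rows, and keys present on only one side are dropped. On Spark releases before 1.3, the pair-RDD methods also need `import org.apache.spark.SparkContext._` in scope; later releases pick up the conversion automatically, so wrapping the RDD in `PairRDDFunctions` by hand is not required there.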

Worker can not find custom KryoRegistrator

2014-07-02 Thread dash
Hi, I'm using Spark 1.1.0 standalone with 5 workers and 1 driver, and Kryo settings are When I submit this job, the driver works fine, but the workers throw a ClassNotFoundException saying they cannot find HDTMKryoRegistrator. Any idea about this problem? I googled this but there is only one

Re: Worker can not find custom KryoRegistrator

2014-07-02 Thread Baoxu Shi(Dash)
I don’t know why the settings did not appear in the last mail: .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") .set("spark.kryo.registrator", new HDTMKryoRegistrator().getClass.getName) On Jul 2, 2014, at 1:03 PM, dash b...@nd.edu wrote: Hi, I'm using Spark 1.1.0
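
For context, a minimal sketch of a custom registrator plus the two settings quoted above. `Record`, `ExampleKryoRegistrator`, `KryoConfExample`, and the jar path are hypothetical stand-ins (the thread does not show HDTMKryoRegistrator itself). One common cause of a ClassNotFoundException on workers is that the jar containing the registrator is never shipped to the executors; the thread does not say whether that was the cause here, so the `setJars` call is an assumption, not the confirmed fix.

```scala
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoRegistrator

// Hypothetical payload class to register with Kryo.
case class Record(id: Int, label: String)

// Hypothetical registrator; it must be on the executors' classpath as well as the driver's.
class ExampleKryoRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    kryo.register(classOf[Record])
  }
}

object KryoConfExample {
  // Builds a SparkConf that enables Kryo and names the registrator class.
  // `appJar` is the path to the application jar that contains the registrator (assumed).
  def conf(appJar: String): SparkConf =
    new SparkConf()
      .setAppName("kryo-example")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrator", classOf[ExampleKryoRegistrator].getName)
      // Ships the jar to the executors so the registrator class can be loaded there.
      .setJars(Seq(appJar))
}
```

`classOf[ExampleKryoRegistrator].getName` yields the same string as `new ExampleKryoRegistrator().getClass.getName` without constructing an instance.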

Re: Question about VD and ED

2014-07-01 Thread Baoxu Shi(Dash)
Hi Bin, VD and ED are type parameters with ClassTag context bounds; you can treat them as placeholders, like a template parameter T in C++ (I'm not 100% sure). You do not need to convert Graph[String, Double] to Graph[VD, ED]. Checking ClassTag’s definition in Scala could help. Best, On Jul 1, 2014, at 4:49 AM, Bin WU bw...@connect.ust.hk wrote: Hi
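
To make the "placeholder" idea concrete, here is a small plain-Scala sketch with no Spark dependency. `TinyGraph` and `ClassTagExample` are hypothetical stand-ins for a graph type parameterized the same way; the point is only that the caller passes a concrete `TinyGraph[String, Double]` and the compiler fills in `VD` and `ED`.

```scala
import scala.reflect.ClassTag

// Hypothetical stand-in for a graph type with vertex attribute VD and edge attribute ED.
final case class TinyGraph[VD, ED](
  vertices: Seq[(Long, VD)],
  edges: Seq[(Long, Long, ED)]
)

object ClassTagExample {
  // The ClassTag context bounds make the runtime classes of VD and ED available,
  // which generic code needs for things like creating arrays of those types.
  def typeNames[VD: ClassTag, ED: ClassTag](g: TinyGraph[VD, ED]): (String, String) =
    (implicitly[ClassTag[VD]].runtimeClass.getName,
     implicitly[ClassTag[ED]].runtimeClass.getName)

  def main(args: Array[String]): Unit = {
    val g = TinyGraph[String, Double](
      vertices = Seq((1L, "a"), (2L, "b")),
      edges = Seq((1L, 2L, 0.5))
    )
    // VD is inferred as String and ED as Double; no conversion is needed.
    println(typeNames(g))
  }
}
```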

Re: Alternative to checkpointing and materialization for truncating lineage in high iteration jobs

2014-06-28 Thread Baoxu Shi(Dash)
I’m facing the same situation. It would be great if someone could provide a code snippet as an example. On Jun 28, 2014, at 12:36 PM, Nilesh Chakraborty nil...@nileshc.com wrote: Hello, In a thread about java.lang.StackOverflowError when calling count() [1] I saw Tathagata Das share an

Can not checkpoint Graph object's vertices but could checkpoint edges

2014-06-20 Thread dash
I'm trying to work around the StackOverflowError that occurs when an object has a long dependency chain. Someone said I should use checkpoint to cut off the dependencies. I wrote some sample code to test it, but I can only checkpoint edges, not vertices. I think I do materialize vertices and edges after calling

Re: Best practices for removing lineage of a RDD or Graph object?

2014-06-18 Thread dash
println("iter " + i + " finished") } } Baoxu Shi(Dash) Computer Science and Engineering Department University of Notre Dame b...@nd.edu On Jun 19, 2014, at 1:47 AM, roy20021 [via Apache Spark User List] ml-node+s1001560n7892...@n3.nabble.com wrote: Not sure if it can help, btw: Checkpoint

Best practices for removing lineage of a RDD or Graph object?

2014-06-17 Thread dash
If an RDD object has non-empty .dependencies, does that mean it has lineage? How can I remove it? I'm doing iterative computing, and each iteration depends on the result computed in the previous iteration. After several iterations, it throws a StackOverflowError. At first I'm trying to use