Re: Parallelize on spark context

2014-11-07 Thread _soumya_

Parallelize on spark context

2014-11-06 Thread Naveen Kumar Pokala
Hi, JavaRDD<Integer> distData = sc.parallelize(data); On what basis does parallelize split the data into multiple datasets? How can we control how many of these datasets are executed per executor? For example, my data is a list of 1,000 integers and I have a 2-node YARN cluster. It is dividing into
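
For context, a minimal self-contained sketch of the call being asked about (the class name and the local master URL are illustrative, not from the thread; on the 2-node YARN cluster the master would come from spark-submit). It parallelizes a list of 1,000 integers and then prints how many partitions Spark actually created.

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class ParallelizeExample {
        public static void main(String[] args) {
            // local[2] is only for trying this out locally; on YARN the
            // master is supplied by spark-submit instead.
            SparkConf conf = new SparkConf()
                    .setAppName("parallelize-example")
                    .setMaster("local[2]");
            JavaSparkContext sc = new JavaSparkContext(conf);

            // Build the list of 1,000 integers from the question.
            List<Integer> data = new ArrayList<>();
            for (int i = 0; i < 1000; i++) {
                data.add(i);
            }

            // With no explicit slice count, Spark decides the number of
            // partitions from spark.default.parallelism.
            JavaRDD<Integer> distData = sc.parallelize(data);

            // Inspect how the data was actually divided.
            System.out.println("Number of partitions: " + distData.partitions().size());

            sc.stop();
        }
    }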

RE: Parallelize on spark context

2014-11-06 Thread Naveen Kumar Pokala
Hi Naveen, by default, when we call parallelize, the data is split into the default number of partitions (which we can control with the property spark.default.parallelism); if we want a specific call to parallelize to use a different number of partitions, we can pass that count as the second argument.
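
To illustrate the two options described in the reply, here is a hedged sketch (the class name, master URL, and the particular partition counts are arbitrary choices for demonstration): spark.default.parallelism is set on the SparkConf, and one specific call overrides it by passing the numSlices argument to parallelize.

    import java.util.Arrays;
    import java.util.List;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class ParallelismControl {
        public static void main(String[] args) {
            // spark.default.parallelism sets the partition count used when
            // parallelize() is called without an explicit slice count.
            SparkConf conf = new SparkConf()
                    .setAppName("parallelism-control")
                    .setMaster("local[2]")                  // local master for testing only
                    .set("spark.default.parallelism", "4");
            JavaSparkContext sc = new JavaSparkContext(conf);

            List<Integer> data = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

            // Uses the default parallelism configured above (4 partitions here).
            JavaRDD<Integer> byDefault = sc.parallelize(data);

            // A specific call can override the default by passing numSlices.
            JavaRDD<Integer> byExplicitSlices = sc.parallelize(data, 8);

            System.out.println("default    : " + byDefault.partitions().size());
            System.out.println("explicit 8 : " + byExplicitSlices.partitions().size());

            sc.stop();
        }
    }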