Why different numbers of partitions give different results for the same computation on the same dataset?

2015-03-03 Thread Saiph Kappa

Hi, I have a Spark Streaming application, running on a single node, consisting mainly of map operations. I repartition to control the number of CPU cores that I want to use. The code goes like this:
val ssc = new StreamingContext(sparkConf, Seconds(5))
val distFile =

Re: Why different numbers of partitions give different results for the same computation on the same dataset?

2015-03-03 Thread Tathagata Das

You can use DStream.transform() to apply arbitrary RDD transformations to the RDDs generated by a DStream.
val coalescedDStream = myDStream.transform { _.coalesce(...) }
On Tue, Mar 3, 2015 at 1:47 PM, Saiph Kappa saiph.ka...@gmail.com wrote:
Sorry I made a mistake in my code. Please ignore
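The DStream.transform() suggestion in the reply above could look roughly like the following. This is only a sketch under assumptions, not code from the thread: the object name, the `coalesced` helper, the socket source on localhost:9999, the `local[4]` master, and the target of 2 partitions are all hypothetical placeholders.

```scala
import scala.reflect.ClassTag

import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream

object CoalesceSketch {
  // Hypothetical helper: shrink every batch RDD of the stream to at most
  // numPartitions partitions. coalesce() avoids a full shuffle by default.
  def coalesced[T: ClassTag](stream: DStream[T], numPartitions: Int): DStream[T] =
    stream.transform((rdd: RDD[T]) => rdd.coalesce(numPartitions))

  def main(args: Array[String]): Unit = {
    // Assumed configuration: local mode with 4 threads, 5-second batches.
    val sparkConf = new SparkConf().setAppName("CoalesceSketch").setMaster("local[4]")
    val ssc = new StreamingContext(sparkConf, Seconds(5))

    // Assumed input source; any DStream would work here.
    val lines = ssc.socketTextStream("localhost", 9999)

    // Map operations after coalescing run on at most 2 partitions per batch.
    val lengths = coalesced(lines, 2).map(_.length)
    lengths.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Because coalesce() only merges existing partitions, it can lower the partition count but not raise it; DStream.repartition() would be the option when more partitions are needed, at the cost of a shuffle.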

Re: Why different numbers of partitions give different results for the same computation on the same dataset?

2015-03-03 Thread Saiph Kappa

Sorry, I made a mistake in my code. Please ignore my question number 2. Different numbers of partitions give *the same* results!
On Tue, Mar 3, 2015 at 7:32 PM, Saiph Kappa saiph.ka...@gmail.com wrote:
Hi, I have a spark streaming application, running on a single node, consisting mainly of