Re: How to increase the parallelism of Spark Streaming application?

2018-11-08 Thread JF Chen
Yes, now I have allocated 100 cores and 8 Kafka partitions, and I then repartition the data to 100 to feed the 100 cores. In the following stage I have a map transformation; will it also cause a slowdown? Regards, Junfeng Chen On Thu, Nov 8, 2018 at 12:34 AM Shahbaz wrote: > Hi , > >- Do you have adequate CPU cores
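[To the question above: map is a narrow transformation, so it runs inside the same 100 tasks created by repartition and adds no shuffle of its own. A tiny self-contained sketch, with a plain RDD standing in for the Kafka micro-batches:

    import org.apache.spark.sql.SparkSession

    object NarrowMapDemo {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("NarrowMapDemo").getOrCreate()
        // Stand-in for a micro-batch read from 8 Kafka partitions.
        val rdd = spark.sparkContext.parallelize(1 to 1000000, numSlices = 8)

        val wide   = rdd.repartition(100)  // one shuffle: 8 -> 100 partitions
        val mapped = wide.map(_ * 2)       // narrow: reuses the same 100 tasks, no extra shuffle

        println(mapped.getNumPartitions)   // prints 100
        spark.stop()
      }
    }

The cost is the one repartition shuffle per batch; the map itself adds nothing.]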

Re: How to increase the parallelism of Spark Streaming application?

2018-11-08 Thread JF Chen
Hi, I have tested it in my production environment, and I found a strange thing. After I set the Kafka partition count to 100, some tasks execute very fast, but some are slow. The slow ones take roughly twice as long as the fast ones (from the event timeline). However, I have checked the consumer offsets, the data
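[One way to check whether the slow tasks are simply reading more data is to log the per-partition offset ranges of each batch. A sketch, assuming the direct stream from the original post at the bottom of this thread:

    import org.apache.spark.streaming.kafka010.HasOffsetRanges

    stream.foreachRDD { rdd =>
      // Each OffsetRange covers one Kafka partition of this micro-batch;
      // a large spread in record counts here would indicate data skew.
      val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      ranges.sortBy(r => -(r.untilOffset - r.fromOffset)).take(10).foreach { r =>
        println(s"${r.topic}-${r.partition}: ${r.untilOffset - r.fromOffset} records")
      }
    }

Note the cast must be done on the raw stream's RDDs, before any transformation, since only the Kafka source RDD carries offset ranges.]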

Re: How to increase the parallelism of Spark Streaming application?

2018-11-08 Thread JF Chen
Memory is not a big problem for me... So is there any other bad effect? Regards, Junfeng Chen On Wed, Nov 7, 2018 at 4:51 PM Michael Shtelma wrote: > If you configure too many Kafka partitions, you can run into memory issues. > This will increase the memory requirements for the Spark job a lot. > > Best, >

Re: How to increase the parallelism of Spark Streaming application?

2018-11-07 Thread Shahbaz
Hi, - Do you have adequate CPU cores allocated to handle the increased partitions? Generally, if the number of Kafka partitions is >= (greater than or equal to) the total CPU cores (number of executor instances * cores per executor), you get increased task parallelism in the reader phase. - However, if
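[As an illustration of that sizing rule, with assumed example numbers (25 executors * 4 cores each = 100 total cores, matching the 100 Kafka partitions so all reader tasks can run in a single wave):

    import org.apache.spark.SparkConf

    // Assumed example values; spark.executor.instances applies to
    // YARN-style cluster managers. 25 * 4 = 100 cores >= 100 partitions.
    val conf = new SparkConf()
      .setAppName("ParallelReader")
      .set("spark.executor.instances", "25")
      .set("spark.executor.cores", "4")

With fewer total cores than partitions, the reader tasks are processed in multiple waves and the extra partitions buy no additional parallelism.]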

Re: How to increase the parallelism of Spark Streaming application?

2018-11-07 Thread vincent gromakowski
On the other side, increasing parallelism via Kafka partitions avoids the shuffle Spark performs to repartition. On Wed, Nov 7, 2018 at 09:51, Michael Shtelma wrote: > If you configure too many Kafka partitions, you can run into memory issues. > This will increase the memory requirements for the Spark job a
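[The shape of that trade-off: if the topic itself has 100 partitions, each batch RDD already arrives with 100 partitions, so the repartition (and its per-batch shuffle) can be dropped entirely. A sketch, again assuming the stream from the original post below:

    stream.foreachRDD { rdd =>
      // With 100 Kafka partitions, rdd already has 100 partitions:
      // map is narrow and the write needs no shuffle at all.
      rdd.map(_.value)
        .saveAsTextFile(s"hdfs:///data/out/batch-${System.currentTimeMillis}") // placeholder path
    }
]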

Re: How to increase the parallelism of Spark Streaming application?

2018-11-07 Thread Michael Shtelma
If you configure too many Kafka partitions, you can run into memory issues. This will increase the memory requirements for the Spark job a lot. Best, Michael On Wed, Nov 7, 2018 at 8:28 AM JF Chen wrote: > I have a Spark Streaming application which reads data from Kafka and saves > the

How to increase the parallelism of Spark Streaming application?

2018-11-06 Thread JF Chen
I have a Spark Streaming application which reads data from Kafka and saves the transformation result to HDFS. The original partition count of my Kafka topic is 8, and I repartition the data to 100 to increase the parallelism of the Spark job. Now I am wondering, if I increase the Kafka partition number
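[For reference, a minimal sketch of the setup being described, using the spark-streaming-kafka-0-10 integration; the broker address, group id, topic name, output path, and batch interval are placeholder assumptions, not values from the thread:

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    object KafkaToHdfs {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(new SparkConf().setAppName("KafkaToHdfs"), Seconds(30))

        val kafkaParams = Map[String, Object](
          "bootstrap.servers"  -> "broker1:9092",            // placeholder
          "key.deserializer"   -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id"           -> "jf-streaming-group",      // placeholder
          "auto.offset.reset"  -> "latest"
        )

        // The topic has 8 partitions, so each micro-batch RDD starts with 8 tasks.
        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("my-topic"), kafkaParams))

        stream.foreachRDD { rdd =>
          // repartition(100) shuffles the 8 input partitions into 100,
          // so downstream work can use all 100 cores.
          rdd.map(_.value)
            .repartition(100)
            .saveAsTextFile(s"hdfs:///data/out/batch-${System.currentTimeMillis}")
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }

The question the rest of the thread debates is whether to keep this repartition (one shuffle per batch) or raise the Kafka partition count so each batch starts wide.]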