Re: The Processing loading of Spark streaming on YARN is not in balance

2015-04-30 Thread Saisai Shao
From the chart you pasted, I guess you only have one receiver with a storage level of two copies, so most of your tasks are located on two executors. You could use repartition to redistribute the data more evenly across the executors. Adding more receivers is another solution. 2015-04-30 14:38
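
A rough sketch of the two suggestions in this reply (several receivers, then a repartition), written against the receiver-based Kafka API in Spark 1.x. This is not the original poster's code. The ZooKeeper address, consumer group name, batch interval, receiver count, and partition count are all assumptions for illustration; only the topic name is taken from this thread.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object BalancedKafkaStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("BalancedKafkaStream")
    // Batch interval of 10 seconds is an assumption.
    val ssc = new StreamingContext(conf, Seconds(10))

    // Hypothetical connection settings; replace with real values.
    val zkQuorum = "zk-host:2181"
    val groupId = "example-consumer-group"
    // Topic name from this thread; one consumer thread per receiver.
    val topics = Map("kyle_test_topic" -> 1)

    // Suggestion 1: create several receivers instead of one.
    // Each receiver occupies one core on some executor.
    val numReceivers = 3 // assumption
    val streams = (1 to numReceivers).map { _ =>
      KafkaUtils.createStream(
        ssc, zkQuorum, groupId, topics,
        StorageLevel.MEMORY_AND_DISK_SER_2) // "_2" = two copies of each block
    }
    val unified = ssc.union(streams)

    // Suggestion 2: shuffle the received data into more partitions
    // so that processing tasks can run on more executors.
    val balanced = unified.repartition(6) // partition count is an assumption

    // Placeholder processing: count the message values in each batch.
    balanced.map(_._2).count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Each receiver permanently holds one core, so the application needs more cores than receivers for any processing to happen. The repartition step adds a shuffle, which is usually a fair trade when the input is small and otherwise sits on only one or two executors.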

Re: The Processing loading of Spark streaming on YARN is not in balance

2015-04-30 Thread Lin Hao Xu
It seems that the data size is only 2.9MB, far less than the default RDD size. How about putting more data into Kafka? And what is the number of partitions of the Kafka topic? Best regards, Lin Hao XU IBM Research China Email: xulin...@cn.ibm.com My Flickr:

The Processing loading of Spark streaming on YARN is not in balance

2015-04-30 Thread Kyle Lin
Hi all, My environment info: Hadoop release version: HDP 2.1 Kafka: 0.8.1.2.1.4.0 Spark: 1.1.0 My question: I ran a Spark Streaming program on YARN. The program reads data from Kafka and does some processing. But I found that there is always only ONE executor doing the processing. As

Re: The Processing loading of Spark streaming on YARN is not in balance

2015-04-30 Thread Kyle Lin
Hello Lin Hao, Thanks for your reply. I will try to produce more data into Kafka. I run three Kafka brokers. The following is my topic info. Topic:kyle_test_topic PartitionCount:3 ReplicationFactor:2 Configs: Topic: kyle_test_topic Partition: 0 Leader: 3 Replicas: 3,4 Isr: 3,4 Topic: kyle_test_topic

Re: The Processing loading of Spark streaming on YARN is not in balance

2015-04-30 Thread Kyle Lin
Hi all, Producing more data into Kafka is not effective in my situation, because the speed of reading from Kafka is consistent. I will adopt Saisai's suggestion and add more receivers. Kyle 2015-04-30 14:49 GMT+08:00 Saisai Shao sai.sai.s...@gmail.com: From the chart you pasted, I guess you only