Re: spark streaming kafka connector questions

2016-09-10 Thread Cheng Yi
After some investigation, the problem I see is likely caused by a filter and union of the DStream. If I just do kafka-stream -- process -- output operator, then there is no problem; one event will be fetched once. If I do kafka-stream -- process(1) -- filter a stream A for later union --|
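A minimal sketch of the two topologies being compared, assuming the Spark 1.6 direct Kafka API (`KafkaUtils.createDirectStream`); the broker address, topic name, filter predicate, and processing step are hypothetical placeholders, since the message does not show the actual code:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object TopologySketch {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("sketch"), Seconds(10))
    val kafkaParams = Map("metadata.broker.list" -> "localhost:9092") // placeholder broker
    val topics = Set("events")                                        // hypothetical topic

    val kafkaStream = KafkaUtils.createDirectStream[
      String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topics)

    // Simple topology: kafka-stream -- process -- output.
    // Each event flows through a single lineage and is fetched once per batch.
    // kafkaStream.map { case (_, v) => v }.print()

    // Topology described above: the processed stream is filtered into a
    // sub-stream A, which is later unioned back with the main stream.
    val processed = kafkaStream.map { case (_, v) => v } // stand-in for process(1)
    val streamA   = processed.filter(_.startsWith("A"))  // hypothetical predicate
    val combined  = processed.union(streamA)
    combined.foreachRDD(rdd => rdd.foreach(println))

    ssc.start()
    ssc.awaitTermination()
  }
}
```

One plausible explanation for the double fetch, consistent with the symptom: without persisting `processed`, both branches of the union can recompute their lineage back to the Kafka RDD, so each batch may be read from Kafka twice. Calling `processed.cache()` before the filter/union is the usual mitigation; this is a suggestion, not something confirmed in the thread.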

spark streaming kafka connector questions

2016-09-08 Thread Cheng Yi
I am using the latest streaming Kafka connector, org.apache.spark spark-streaming-kafka_2.11 1.6.2. I am facing the problem that a message is delivered two times to my consumers. These two deliveries are 10+ seconds apart; it looks like this is caused by my lengthy message processing (took about 60
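The dependency named above, written out as the Maven coordinates it corresponds to (group, artifact, and version exactly as stated in the message):

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming-kafka_2.11</artifactId>
  <version>1.6.2</version>
</dependency>
```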