SaintBacchus created SPARK-8163:
-----------------------------------
Summary: Checkpoint mechanism does not recover correctly when an error
occurs under heavy streaming load
Key: SPARK-8163
URL: https://issues.apache.org/jira/browse/SPARK-8163
Project: Spark
Issue Type: Bug
Components: Streaming
Affects Versions: 1.4.0
Reporter: SaintBacchus
Fix For: 1.5.0
I tested this with a Kafka DStream.
Sometimes the Kafka producer had already pushed a lot of data to the Kafka
brokers, and the streaming receiver then pulled this data without any rate
limit. In this case the first batch could take 10 or more seconds to consume
the data (the batch interval was 2 seconds).
To describe in more detail what Spark Streaming was doing at that moment:
the SparkContext was busy running the first job; the JobGenerator was still
sending new batches to the StreamingContext, which wrote them to the
checkpoint files; and the receiver was still busy receiving data from Kafka,
also tracking these events in the checkpoint.
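For reference, a minimal sketch of the kind of application this was observed
with (the checkpoint directory, ZooKeeper quorum, group id, and topic name
below are hypothetical placeholders, not the actual test setup):

{code:scala}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object CheckpointRepro {
  val checkpointDir = "hdfs:///tmp/checkpoint-repro" // hypothetical path

  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("CheckpointRepro")
    // 2-second batch interval, as in the scenario above
    val ssc = new StreamingContext(conf, Seconds(2))
    ssc.checkpoint(checkpointDir)

    // Receiver-based Kafka stream with no rate limit configured, so the
    // first batch pulls the whole backlog at once
    val stream = KafkaUtils.createStream(ssc,
      "zk1:2181", "repro-group", Map("repro-topic" -> 1)) // hypothetical
    stream.count().print()
    ssc
  }

  def main(args: Array[String]): Unit = {
    // On restart this rebuilds the context from the checkpoint files;
    // with this bug, recovery resumes from the last generated batch and
    // skips the batch that was still running when the application died
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}
{code}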
Then an unexpected error occurred and shut down the streaming application.
We then wanted to recover the application from the checkpoint files. But
since the StreamingContext had already recorded the next few batches,
recovery resumed from the last recorded batch. The streaming application had
therefore missed the first batch and did not know which data had actually
been consumed by the receiver.
Setting spark.streaming.concurrentJobs=2 could avoid this problem, but some
applications cannot do this.
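A sketch of how that workaround is set, on the SparkConf before the
StreamingContext is created. Since the report mentions pulling without a
rate limit, spark.streaming.receiver.maxRate is shown as an alternative
mitigation; the value used is illustrative only:

{code:scala}
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("CheckpointRepro")
  // Workaround: run up to two jobs concurrently so that batch generation
  // and checkpointing do not run ahead of an unfinished first batch
  .set("spark.streaming.concurrentJobs", "2")
  // Alternative mitigation: cap the receiver ingest rate (records per
  // second per receiver) so one batch cannot swallow the whole backlog;
  // the value here is illustrative only
  .set("spark.streaming.receiver.maxRate", "10000")
{code}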