[
https://issues.apache.org/jira/browse/SPARK-8163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
SaintBacchus reopened SPARK-8163:
---------------------------------
> Checkpoint mechanism did not work well when an error occurred in big streaming
> ---------------------------------------------------------------------------
>
> Key: SPARK-8163
> URL: https://issues.apache.org/jira/browse/SPARK-8163
> Project: Spark
> Issue Type: Bug
> Components: Streaming
> Affects Versions: 1.4.0
> Reporter: SaintBacchus
>
> I tested it with a Kafka DStream.
> Sometimes the Kafka producer had pushed a lot of data to the Kafka brokers,
> and the Streaming receiver then pulled this data without any rate limit.
> For this first batch, Streaming could take 10 or more seconds to consume the
> data (the batch interval was 2 seconds).
> To describe in more detail what Streaming does at this moment:
> the SparkContext was doing its job; the JobGenerator was still sending new
> batches to the StreamingContext, and the StreamingContext wrote them to the
> checkpoint files; meanwhile the receiver was still busy receiving data from
> Kafka and also tracked these events in the checkpoint.
> Then an unexpected error occurred, shutting down the streaming application.
> We then wanted to recover the application from the checkpoint files. But since
> the StreamingContext had already recorded the next few batches, recovery
> started from the last recorded batch. So Streaming had already missed the
> first batch and did not know what data the receiver had actually consumed.
> Setting spark.streaming.concurrentJobs=2 can avoid this problem, but some
> applications cannot do this.
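For context (not part of the original report): the "without any rate limit" behavior described above can be bounded with Spark Streaming's per-receiver rate limit, and the report's own workaround is the concurrentJobs setting. A minimal spark-defaults.conf sketch; the maxRate value is illustrative, not from the report:

```
# Workaround named in the report: allow two batch jobs to run concurrently
spark.streaming.concurrentJobs      2

# Cap records consumed per second per receiver, so the first batch cannot
# grow unboundedly while the producer is far ahead (value is an example)
spark.streaming.receiver.maxRate    10000
```

With the receiver rate capped, the first batch stays close to the 2-second batch interval instead of taking 10+ seconds, which narrows the window in which checkpointed batch metadata can run ahead of what was actually processed.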
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]