Hi Mark,

For file-based input streams like the text file input stream, only the generated 
RDDs can be recovered from the checkpoint, not missed files; if a file is missing 
at recovery time, an exception will be raised. If you use HDFS, its replication 
(3 copies by default) guarantees no data loss; otherwise your own logic has to 
guarantee that no file is deleted before recovery.
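
For reference, a minimal Scala sketch of what I mean (the checkpoint and input 
paths below are just placeholders, not anything specific to your setup):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object TextFileStreamRecovery {
  // Placeholder paths -- substitute your own.
  val checkpointDir = "hdfs:///tmp/streaming-checkpoint"
  val inputDir = "hdfs:///tmp/streaming-input"

  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("TextFileStreamRecovery")
    val ssc = new StreamingContext(conf, Seconds(30))
    ssc.checkpoint(checkpointDir)

    // Only the DStream lineage and generated RDDs go into the checkpoint.
    // On recovery those RDDs are recomputed from the files they reference,
    // so the files must still be present, otherwise an exception is raised.
    val lines = ssc.textFileStream(inputDir)
    lines.count().print()
    ssc
  }

  def main(args: Array[String]): Unit = {
    // Rebuilds the context from the checkpoint after a driver restart,
    // or creates a fresh one on the first run.
    val ssc = StreamingContext.getOrCreate(checkpointDir, () => createContext())
    ssc.start()
    ssc.awaitTermination()
  }
}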

For receiver-based input streams, like the Kafka input stream or socket input 
stream, a WAL (write ahead log) mechanism can be enabled to store the received 
data as well as its metadata, so the data can be recovered after a failure.
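
For example (a minimal Scala sketch, assuming a socket source on a placeholder 
host/port; the same WAL setting applies to other receiver-based streams):

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("ReceiverWithWAL")
  // Persist received blocks to the write ahead log before they are acknowledged.
  .set("spark.streaming.receiver.writeAheadLog.enable", "true")

val ssc = new StreamingContext(conf, Seconds(10))
// The WAL is written under the checkpoint directory, so checkpointing must be set.
ssc.checkpoint("hdfs:///tmp/streaming-checkpoint")

// With the WAL enabled, in-memory replication is redundant, so a non-replicated
// serialized storage level is sufficient.
val lines = ssc.socketTextStream("localhost", 9999, StorageLevel.MEMORY_AND_DISK_SER)
lines.count().print()

ssc.start()
ssc.awaitTermination()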

Thanks
Jerry

-----Original Message-----
From: mkhaitman [mailto:[email protected]] 
Sent: Monday, February 23, 2015 10:54 AM
To: [email protected]
Subject: StreamingContext textFileStream question

Hello,

I was interested in creating a StreamingContext textFileStream based job that 
runs for long durations and can also recover from a prolonged driver failure. 
It seems like StreamingContext checkpointing is mainly used for the case where 
the driver dies while processing an RDD, in order to recover that one RDD, but 
my question specifically relates to whether there is also a way to recover the 
files that were missed between the time the driver died and the time it was 
started back up (whether manually or automatically).

Any assistance/suggestions with this one would be greatly appreciated!

Thanks,
Mark.





---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
