[
https://issues.apache.org/jira/browse/SPARK-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tathagata Das updated SPARK-3129:
---------------------------------
Summary: Prevent data loss in Spark Streaming on driver failure using Write
Ahead Logs (was: Prevent data loss in Spark Streaming on driver failure)
> Prevent data loss in Spark Streaming on driver failure using Write Ahead Logs
> -----------------------------------------------------------------------------
>
> Key: SPARK-3129
> URL: https://issues.apache.org/jira/browse/SPARK-3129
> Project: Spark
> Issue Type: New Feature
> Components: Streaming
> Affects Versions: 1.0.0, 1.0.1, 1.0.2, 1.1.0, 1.0.3
> Reporter: Hari Shreedharan
> Assignee: Tathagata Das
> Priority: Critical
> Fix For: 1.2.0
>
> Attachments: SecurityFix.diff
>
>
> Spark Streaming can small amounts of data when the driver goes down - and the
> sending system cannot re-send the data (or the data has already expired on
> the sender side). This currently affects all receivers.
> The solution we propose is to reliably store all the received data into HDFS.
> This will allow the data to persist through driver failures, and therefore
> can be processed when the driver gets restarted.
> The high level design doc for this feature is given here.
> https://docs.google.com/document/d/1vTCB5qVfyxQPlHuv8rit9-zjdttlgaSrMgfCDQlCJIM/edit?usp=sharing
> This major task has been divided in sub-tasks
> - Implementing a write ahead log management system that can manage rolling
> write ahead logs - write to log, recover on failure and clean up old logs
> - Implementing a HDFS backed block RDD that can read data either from Spark's
> BlockManager or from HDFS files
> - Implementing a ReceivedBlockHandler interface that abstracts out the
> functionality of saving received blocks
> - Implementing a ReceivedBlockTracker and other associated changes in the
> driver that allows metadata of received blocks and block-to-batch allocations
> to be recovered upon driver retart
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]