[ 
https://issues.apache.org/jira/browse/SPARK-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das resolved SPARK-3129.
----------------------------------
       Resolution: Fixed
    Fix Version/s: 1.2.0

I am marking this as fixed, as all non-test related issues have been merged. 
The one sub-task left is related to unit-tests that uses the WAL to do 
end-to-end tests and verify no data loss.

> Prevent data loss in Spark Streaming on driver failure
> ------------------------------------------------------
>
>                 Key: SPARK-3129
>                 URL: https://issues.apache.org/jira/browse/SPARK-3129
>             Project: Spark
>          Issue Type: New Feature
>          Components: Streaming
>    Affects Versions: 1.0.0, 1.0.1, 1.0.2, 1.1.0, 1.0.3
>            Reporter: Hari Shreedharan
>            Assignee: Tathagata Das
>            Priority: Critical
>             Fix For: 1.2.0
>
>         Attachments: SecurityFix.diff
>
>
> Spark Streaming can small amounts of data when the driver goes down - and the 
> sending system cannot re-send the data (or the data has already expired on 
> the sender side). This currently affects all receivers. 
> The solution we propose is to reliably store all the received data into HDFS. 
> This will allow the data to persist through driver failures, and therefore 
> can be processed when the driver gets restarted. 
> The high level design doc for this feature is given here. 
> https://docs.google.com/document/d/1vTCB5qVfyxQPlHuv8rit9-zjdttlgaSrMgfCDQlCJIM/edit?usp=sharing
> This major task has been divided in sub-tasks
> - Implementing a write ahead log management system that can manage rolling 
> write ahead logs - write to log, recover on failure and clean up old logs
> - Implementing a HDFS backed block RDD that can read data either from Spark's 
> BlockManager or from HDFS files
> - Implementing a ReceivedBlockHandler interface that abstracts out the 
> functionality of saving received blocks
> - Implementing a ReceivedBlockTracker and other associated changes in the 
> driver that allows metadata of received blocks and block-to-batch allocations 
> to be recovered upon driver retart



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to