[ https://issues.apache.org/jira/browse/SPARK-7139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tathagata Das resolved SPARK-7139.
----------------------------------
Resolution: Fixed
Fix Version/s: 1.4.0
> Allow received block metadata to be saved to WAL and recovered on driver
> failure
> --------------------------------------------------------------------------------
>
> Key: SPARK-7139
> URL: https://issues.apache.org/jira/browse/SPARK-7139
> Project: Spark
> Issue Type: Improvement
> Components: Streaming
> Reporter: Tathagata Das
> Assignee: Tathagata Das
> Priority: Blocker
> Fix For: 1.4.0
>
>
> The receiver API allows arbitrary metadata to be attached to each block.
> However, that metadata is not saved in the WAL as part of the block
> information kept in the driver.
> To fix this, the following needs to be done.
> 1. Forward the metadata to the ReceivedBlockTracker in the driver.
> 2. Have the ReceivedBlockTracker save the metadata and recover it on restart.
> However, there is one complication. The ReceivedBlockTracker WAL is enabled
> only when `spark.streaming.receiver.writeAheadLog.enable = true`, so the
> driver-side WAL is enabled only when the receiver WAL is. This is
> undesirable, because one may want to save and recover block metadata
> (especially information like Kafka offsets or Kinesis sequence numbers) that
> can be used to recover data without actually saving the data to the
> receiver WAL. So the tracker WAL must always be enabled.
> 3. Always enable the ReceivedBlockTracker WAL. However, make sure that the
> WriteAheadLogBackedBlockRDD skips the block lookup after a restart, since
> those blocks are necessarily gone.
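A minimal Python sketch of the three steps above, assuming a toy JSON-lines file in place of Spark's real WAL. `BlockInfo`, `TrackerSketch`, `read_block`, and the `block_ids_valid`/`fallback` parameters are hypothetical names, not Spark's API; only the roles of ReceivedBlockTracker and WriteAheadLogBackedBlockRDD come from the issue. The tracker log is always written, each record carries the block's metadata, and replaying the log on construction stands in for driver recovery. The read path skips the block-store lookup when block ids are no longer valid and uses a caller-supplied fallback instead (for example, the receiver WAL or a re-fetch keyed by recovered offsets).

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class BlockInfo:
    # Hypothetical driver-side record of one received block.
    stream_id: int
    block_id: str
    num_records: int
    metadata: Optional[Dict[str, Any]] = None  # e.g. Kafka offsets, Kinesis sequence numbers


class TrackerSketch:
    """Toy stand-in for a block tracker whose log is always enabled (not Spark code)."""

    def __init__(self, log_path: str) -> None:
        self.log_path = log_path
        self.blocks: List[BlockInfo] = []
        self._recover()  # replay the log on (re)start, metadata included

    def add_block(self, info: BlockInfo) -> None:
        # Persist first, then update in-memory state (write-ahead ordering).
        with open(self.log_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(asdict(info)) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.blocks.append(info)

    def _recover(self) -> None:
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    self.blocks.append(BlockInfo(**json.loads(line)))


def read_block(
    info: BlockInfo,
    block_store: Dict[str, List[Any]],
    block_ids_valid: bool,
    fallback: Callable[[BlockInfo], List[Any]],
) -> List[Any]:
    # After a driver restart the stored blocks are gone, so skip the
    # block-store lookup entirely and go straight to the fallback.
    if block_ids_valid and info.block_id in block_store:
        return block_store[info.block_id]
    return fallback(info)


# Demo: record one block, then "restart" by building a new tracker on the same log.
with tempfile.TemporaryDirectory() as tmp:
    log = os.path.join(tmp, "tracker.log")
    TrackerSketch(log).add_block(BlockInfo(0, "input-0-1", 3, {"kafka_offset": 42}))
    restarted = TrackerSketch(log)

recovered = restarted.blocks[0]
print(recovered.metadata)  # {'kafka_offset': 42}

# Even if a store still holds an entry, block_ids_valid=False bypasses it.
stale_store = {"input-0-1": ["stale"]}
data = read_block(
    recovered,
    stale_store,
    block_ids_valid=False,
    fallback=lambda info: [f"refetched from offset {info.metadata['kafka_offset']}"],
)
print(data)  # ['refetched from offset 42']
```

Syncing the log before updating in-memory state is the usual write-ahead ordering: a crash between the two leaves the block record recoverable rather than lost.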
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)