Tathagata Das created SPARK-7139:
------------------------------------
Summary: Allow received block metadata to be saved to WAL and
recovered on driver failure
Key: SPARK-7139
URL: https://issues.apache.org/jira/browse/SPARK-7139
Project: Spark
Issue Type: Improvement
Components: Streaming
Reporter: Tathagata Das
Assignee: Tathagata Das
Priority: Critical
The received API allows arbitrary metadata to be added for each block. However
that information is not saved in the WAL as part of the block information in
the driver.
To fix this, the following needs to be done.
1. Forward the metadata to the ReceivedBlockTracker in the driver.
2. ReceivedBlockTracker saves the metadata and recovers it on restart.
However there is one tricky thing. The ReceivedBlockTracker WAL is enabled only
when `spark.streaming.receiver.writeAheadLog.enable = true`. This means that
only when receiver WAL is enabled is the driver WAL enabled. This is not
desired as the one may want to save and recovered block metadata information
(especially information like Kafka offsets or Kinesis sequence numbers) that
can be used to recover data without actually saving the data to the receiver
WAL. So we have to always enable the tracker WAL.
3. Always enable the ReceivedBlockTracker WAL. However, make sure that the
blockIds are not recovered as they will not be useful after driver restart (the
blocks are gone!).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]