[
https://issues.apache.org/jira/browse/SPARK-26425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dongjoon Hyun updated SPARK-26425:
----------------------------------
Affects Version/s: 3.0.0 (was: 2.4.0)
> Add more constraint checks in file streaming source to avoid checkpoint
> corruption
> ----------------------------------------------------------------------------------
>
> Key: SPARK-26425
> URL: https://issues.apache.org/jira/browse/SPARK-26425
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Affects Versions: 3.0.0
> Reporter: Tathagata Das
> Assignee: Tathagata Das
> Priority: Major
>
> Two issues were observed in production:
> - HDFSMetadataLog.getLatest() tries to read older versions when it cannot
> read the latest listed version file. It is unclear why this fallback was
> added, but it should be removed. If the latest listed file is not readable,
> something is seriously wrong, and we should fail rather than report an older
> version, as that can completely corrupt the checkpoint directory.
> - FileStreamSource should check whether adding a new batch to the
> FileStreamSourceLog succeeded (similar to how StreamExecution checks
> the OffsetSeqLog).
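
The two checks described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Spark's implementation: ToyMetadataLog, commit_batch, and the one-file-per-batch layout are hypothetical names and simplifications introduced here for illustration. It shows (1) a get_latest() that fails when the latest listed batch file cannot be read, instead of falling back to an older batch, and (2) a caller that checks the result of add() and fails if the batch was not written.

```python
import os


class ToyMetadataLog:
    """Hypothetical stand-in for a versioned metadata log: one file per batch id."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, batch_id):
        return os.path.join(self.directory, str(batch_id))

    def add(self, batch_id, metadata):
        """Write the batch file; return False if that batch already exists."""
        try:
            # Mode "x" refuses to overwrite, so an existing batch is left untouched.
            with open(self._path(batch_id), "x") as f:
                f.write(metadata)
            return True
        except FileExistsError:
            return False

    def get_latest(self):
        """Return (batch_id, metadata) for the highest listed batch, or None if empty."""
        batch_ids = sorted(int(n) for n in os.listdir(self.directory) if n.isdigit())
        if not batch_ids:
            return None
        latest = batch_ids[-1]
        try:
            with open(self._path(latest)) as f:
                return latest, f.read()
        except OSError as e:
            # Fail loudly rather than silently reporting an older batch as the latest.
            raise RuntimeError(f"latest listed batch {latest} is unreadable") from e


def commit_batch(log, batch_id, metadata):
    """Fail if the log entry was not written, instead of ignoring add()'s result."""
    if not log.add(batch_id, metadata):
        raise RuntimeError(f"batch {batch_id} already exists in the log")
```

In this sketch, a second commit_batch call for the same batch id raises instead of continuing with a log that does not contain what the caller believes it wrote, and an unreadable latest entry stops the reader before it can resume from a stale batch.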
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)