[ https://issues.apache.org/jira/browse/SPARK-55058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jerry Zheng updated SPARK-55058:
--------------------------------
    Description: The {{metadata}} file holds the streaming query ID and should 
exist whenever the commit or offset directories are non-empty. If it is 
missing, exactly-once sinks such as DeltaSink, which use the streaming query ID 
to deduplicate commits for the same batch, can produce duplicates and incorrect 
results downstream. If the metadata file is absent but commit or offset files 
are present, the checkpoint is in an inconsistent state and we should throw an 
error.  (was: The {{metadata}} file holds the streaming query ID, 
and should be existent if the commit and offset files are non-empty. This file 
not existing will result in duplicates and incorrectness downstream if using 
DeltaSink which uses the streaming query ID to dedup commits for the same 
batch. If the metadata file isn’t there, but the commit and offset files are 
there, we should throw an error as the checkpoint is in an inconsistent state.)

> Throw an error if the /metadata file is not present, but offset or commit 
> directories are non-empty
> ---------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-55058
>                 URL: https://issues.apache.org/jira/browse/SPARK-55058
>             Project: Spark
>          Issue Type: Task
>          Components: Structured Streaming
>    Affects Versions: 4.2.0
>            Reporter: Jerry Zheng
>            Priority: Major
>
> The {{metadata}} file holds the streaming query ID and should exist whenever 
> the commit or offset directories are non-empty. If it is missing, exactly-once 
> sinks such as DeltaSink, which use the streaming query ID to deduplicate 
> commits for the same batch, can produce duplicates and incorrect results 
> downstream. If the metadata file is absent but commit or offset files are 
> present, the checkpoint is in an inconsistent state and we should throw an 
> error.
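
The proposed check can be illustrated with a minimal sketch. This is not the
Spark implementation: it assumes a checkpoint on the local filesystem whose
root contains a metadata file plus directories named offsets and commits, and
the names validate_checkpoint and InconsistentCheckpointError are hypothetical,
introduced only for illustration.

```python
from pathlib import Path


class InconsistentCheckpointError(Exception):
    """Hypothetical error type for this sketch; not a Spark class."""


def _has_entries(directory: Path) -> bool:
    # A missing directory is treated the same as an empty one.
    return directory.is_dir() and next(directory.iterdir(), None) is not None


def validate_checkpoint(checkpoint_root: str) -> None:
    """Raise if offsets/commits exist without the metadata file.

    Assumed layout (local filesystem only):
        <root>/metadata   file holding the streaming query ID
        <root>/offsets/   one file per planned batch
        <root>/commits/   one file per committed batch
    """
    root = Path(checkpoint_root)
    metadata_present = (root / "metadata").is_file()
    non_empty = [
        name for name in ("offsets", "commits") if _has_entries(root / name)
    ]
    if not metadata_present and non_empty:
        raise InconsistentCheckpointError(
            f"Checkpoint {root} has no metadata file, but these directories "
            f"are non-empty: {', '.join(non_empty)}"
        )
```

A brand-new checkpoint (no metadata, no offsets, no commits) passes the check,
so a first run is unaffected; only the combination of existing progress and a
missing metadata file is rejected.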


