[
https://issues.apache.org/jira/browse/SPARK-51717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated SPARK-51717:
-----------------------------------
Labels: pull-request-available (was: )
> Possible SST mismatch error for the second snapshot created for a new query
> ---------------------------------------------------------------------------
>
> Key: SPARK-51717
> URL: https://issues.apache.org/jira/browse/SPARK-51717
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming
> Affects Versions: 4.0.0, 4.1.0
> Reporter: B. Micheal Okutubo
> Priority: Major
> Labels: pull-request-available
>
> Fix this error: Sst file size mismatch ... MANIFEST-000005 may be corrupted
> This is an edge case in SST file reuse that can only happen for the first-ever
> RocksDB checkpoint, under the following sequence of events:
> # The first-ever RocksDB checkpoint (e.g. for version 10) is created with
> x.sst, but has not yet been uploaded by maintenance.
> # The next batch using RocksDB at v10 fails and rolls the store back to -1
> (which invalidates RocksDB).
> # A new request to load RocksDB at v10 comes in, but the v10 checkpoint is
> still not uploaded, so we have to replay the changelog starting from
> checkpoint v0.
> # We create a new v11 and a new checkpoint with a new x*.sst. v10 is now
> uploaded by maintenance. Then, during the upload of x*.sst for v11, we reuse
> the x.sst DFS file, thinking it is the same as x*.sst.
> The problem originates in step 3: the file manager loads v0 differently
> from other versions. When loading any other version, deleting an existing
> local file also removes it from the file mapping. For v0, however, the file
> manager just deletes the local directory, and the file mapping is not
> cleared. Hence the old x.sst still appears in the file mapping at step 4.
> We need to clear the mapping in this case and also add an additional size
> check.
>
> This only happens when changelog checkpointing is used.
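
The stale-mapping scenario described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Spark's actual file manager: `ToyFileManager`, `replay_scenario`, the two constructor flags, and the file sizes are all hypothetical names and values invented for illustration, and the point at which the mapping entry for x.sst gets recorded is simplified to a single `upload` call.

```python
class ToyFileManager:
    """Hypothetical stand-in for a local-file -> DFS-file mapping (not Spark code)."""

    def __init__(self, clear_mapping_on_v0, check_size):
        self.clear_mapping_on_v0 = clear_mapping_on_v0
        self.check_size = check_size
        self.local_files = {}   # local file name -> size in bytes
        self.file_mapping = {}  # local file name -> (DFS file name, size in bytes)
        self._next_id = 0

    def write_local(self, name, size):
        self.local_files[name] = size

    def upload(self, name):
        """Reuse the mapped DFS file if one is known, otherwise upload a new one."""
        size = self.local_files[name]
        known = self.file_mapping.get(name)
        if known is not None:
            dfs_name, dfs_size = known
            # Optional safety net: only reuse when the sizes agree.
            if not self.check_size or dfs_size == size:
                return ("reused", dfs_name, dfs_size)
        self._next_id += 1
        dfs_name = f"{name}.{self._next_id}"
        self.file_mapping[name] = (dfs_name, size)
        return ("uploaded", dfs_name, size)

    def load_v0(self):
        """Loading v0 wipes the local directory; the mapping is cleared only if asked."""
        self.local_files.clear()
        if self.clear_mapping_on_v0:
            self.file_mapping.clear()


def replay_scenario(manager):
    """Replay the four steps; return (action, whether the DFS size matches the local size)."""
    manager.write_local("x.sst", 100)   # step 1: first checkpoint creates x.sst
    manager.upload("x.sst")             # x.sst becomes tracked in the file mapping
    manager.load_v0()                   # steps 2-3: rollback, then reload from v0
    manager.write_local("x.sst", 120)   # step 4: new file with the same local name
    action, _, dfs_size = manager.upload("x.sst")
    return action, dfs_size == manager.local_files["x.sst"]


print(replay_scenario(ToyFileManager(clear_mapping_on_v0=False, check_size=False)))
print(replay_scenario(ToyFileManager(clear_mapping_on_v0=True, check_size=False)))
print(replay_scenario(ToyFileManager(clear_mapping_on_v0=False, check_size=True)))
```

In this toy model, the first configuration reuses the old DFS file even though its size differs from the new local file, which corresponds to the reported mismatch. Either clearing the mapping on the v0 load or checking the size before reuse avoids it. A size check alone would not catch a different file that happens to have the same size, which is why it is best seen as a safety net on top of clearing the mapping.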