[ 
https://issues.apache.org/jira/browse/SPARK-57658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-57658.
----------------------------------
    Fix Version/s: 4.3.0
       Resolution: Fixed

Issue resolved by pull request 56718
[https://github.com/apache/spark/pull/56718]

> Report the available snapshot versions when a RocksDB state snapshot load 
> fails
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-57658
>                 URL: https://issues.apache.org/jira/browse/SPARK-57658
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 5.0.0
>            Reporter: Hyukjin Kwon
>            Assignee: Hyukjin Kwon
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.3.0
>
>
> When loading state from a snapshot (e.g. reading with the 
> {{snapshotStartBatchId}} option), the snapshot zip for the requested version 
> can be missing, most commonly because the asynchronous maintenance thread has 
> not uploaded it yet. Today this surfaces only as:
> {code}
> [CANNOT_LOAD_STATE_STORE.UNCATEGORIZED] ...
> Caused by: java.io.FileNotFoundException: .../state/0/1/2.zip does not exist
> {code}
> which gives no indication of whether the snapshot was never created or merely 
> not uploaded in time. This has caused hard-to-diagnose intermittent failures 
> in scheduled CI (the {{snapshotStartBatchId ... transformWithState}} tests, 
> across master/branch-4.x/branch-4.2 and both plain and row-checksum variants).
> This change enriches the {{FileNotFoundException}} thrown from 
> {{RocksDBFileManager}} with the snapshot (.zip) and changelog (.changelog) 
> files that are actually present in the DFS checkpoint root, so any future 
> occurrence is self-diagnosing from the logs (e.g. 'asked for 2.zip, only 
> 1.zip present' clearly indicates the async upload race). The listing is 
> best-effort and never throws. A unit test in {{RocksDBSuite}} validates the 
> enriched message.
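
The idea described above can be illustrated with a minimal sketch. This is not the code from the pull request (the real change lives in RocksDBFileManager, on the Scala side); it is a Python illustration that assumes a local directory stands in for the DFS checkpoint root. The function names list_available_versions and load_snapshot, and the exact message wording, are hypothetical.

```python
import os


def list_available_versions(checkpoint_dir):
    """Best-effort listing of snapshot and changelog versions.

    Hypothetical helper: returns (snapshot_versions, changelog_versions)
    and never raises, because the listing is used only for diagnostics.
    """
    snapshots, changelogs = [], []
    try:
        for name in os.listdir(checkpoint_dir):
            stem, ext = os.path.splitext(name)
            if not stem.isdigit():
                continue  # skip files that are not named <version>.<ext>
            if ext == ".zip":
                snapshots.append(int(stem))
            elif ext == ".changelog":
                changelogs.append(int(stem))
    except Exception:
        # Swallow listing errors so they cannot mask the original failure.
        pass
    return sorted(snapshots), sorted(changelogs)


def load_snapshot(checkpoint_dir, version):
    """Return the snapshot path, or raise an enriched FileNotFoundError."""
    path = os.path.join(checkpoint_dir, f"{version}.zip")
    if not os.path.exists(path):
        snapshots, changelogs = list_available_versions(checkpoint_dir)
        raise FileNotFoundError(
            f"{path} does not exist; "
            f"available snapshot versions: {snapshots}, "
            f"available changelog versions: {changelogs}"
        )
    return path
```

With a directory that holds only 1.zip, 1.changelog and 2.changelog, a request for version 2 fails with a message that lists snapshot version 1 and changelog versions 1 and 2, which is the "asked for 2.zip, only 1.zip present" signal mentioned in the description.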



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

