Wei-Chiu Chuang created HDDS-12982:
--------------------------------------

             Summary: [Snapshot] Tone down error "Snapshot validation failed"
                 Key: HDDS-12982
                 URL: https://issues.apache.org/jira/browse/HDDS-12982
             Project: Apache Ozone
          Issue Type: Sub-task
          Components: Snapshot
    Affects Versions: 2.0.0
            Reporter: Wei-Chiu Chuang


We sometimes observe this error:
{noformat}
2024-12-02 18:47:09,542 ERROR [OM StateMachine ApplyTransaction Thread - 0]-org.apache.hadoop.ozone.om.request.key.OMKeyPurgeRequest: Error occurred while performing OmKeyPurge.
INVALID_REQUEST org.apache.hadoop.ozone.om.exceptions.OMException: Snapshot validation failed. Expected previous snapshotId : null but was e639ca0c-73a1-4a60-8da5-c21ed9634210
        at org.apache.hadoop.ozone.om.snapshot.SnapshotUtils.validatePreviousSnapshotId(SnapshotUtils.java:303)
        at org.apache.hadoop.ozone.om.request.key.OMKeyPurgeRequest.validateAndUpdateCache(OMKeyPurgeRequest.java:83)
        at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:378)
        at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:560)
        at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:353)
        at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)
{noformat}

According to the analysis by [~swamirishi] and [~sshenoy], this exception 
is acceptable:

bq. This is kind of a race condition with snapshotCreate and keyPurge happening 
on the background service. Purge operation takes a lesser priority over 
snapshot create. So this kind of error can be ignored.

I am aware that we'll be implementing locking for snapshot operations, so this 
race condition shouldn't happen after that. Until then, I'd suggest reducing 
the log level of this message from ERROR to WARN.
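
To illustrate the suggestion, here is a minimal, self-contained sketch of choosing the log level from the failure's result code. It is not the actual Ozone code: the class PurgeLogLevelSketch, the exception SketchOMException, and the ResultCode enum are hypothetical stand-ins, and it uses java.util.logging (WARNING/SEVERE) in place of the project's logger (WARN/ERROR) to stay dependency-free. It also assumes that INVALID_REQUEST on the purge path only signals the benign race described above, which would need to be confirmed against the real code.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch; none of these types are the real Ozone classes.
final class PurgeLogLevelSketch {
  private static final Logger LOG =
      Logger.getLogger(PurgeLogLevelSketch.class.getName());

  // Stand-in for the result codes carried by the real exception.
  enum ResultCode { INVALID_REQUEST, INTERNAL_ERROR }

  // Stand-in for an exception that carries a result code.
  static final class SketchOMException extends Exception {
    private final ResultCode result;

    SketchOMException(String message, ResultCode result) {
      super(message);
      this.result = result;
    }

    ResultCode getResult() {
      return result;
    }
  }

  // Assumption: INVALID_REQUEST here means the expected snapshot race,
  // so it is logged at WARNING; everything else stays at SEVERE.
  static Level levelFor(Exception e) {
    if (e instanceof SketchOMException
        && ((SketchOMException) e).getResult() == ResultCode.INVALID_REQUEST) {
      return Level.WARNING;
    }
    return Level.SEVERE;
  }

  static void logPurgeFailure(Exception e) {
    LOG.log(levelFor(e), "Error occurred while performing OmKeyPurge.", e);
  }

  public static void main(String[] args) {
    // Expected race: logged at WARNING.
    logPurgeFailure(new SketchOMException(
        "Snapshot validation failed.", ResultCode.INVALID_REQUEST));
    // Anything else: still logged at SEVERE.
    logPurgeFailure(new IllegalStateException("unexpected failure"));
  }
}
```

Keying on the result code rather than on the message text keeps the downgrade narrow: unrelated purge failures would still be logged at the higher level.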
