[ 
https://issues.apache.org/jira/browse/HDDS-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17611793#comment-17611793
 ] 

George Jahad commented on HDDS-7279:
------------------------------------

Normal tests show the problem very infrequently.  You can make it more 
reproducible by preventing the flush thread from waking up when a key commit 
request occurs.

That keeps the thread dormant until the "create snapshot" request comes in, 
which leaves a bunch of key commits in the same batch as the create 
snapshot request.  Something like this:

https://github.com/GeorgeJahad/ozone/compare/05ed641c6...6cdfd890f#diff-15eb45eac3d9d3b7f34caf80bb6eb63be58f004e936626a4c3771ad0fc5f7f4e

Then run the TestOmSnapshotFileSystem class.  It should fail almost every time.

> Snapshot Create requires Double Buffer Flush thread to split the commit batch
> -----------------------------------------------------------------------------
>
>                 Key: HDDS-7279
>                 URL: https://issues.apache.org/jira/browse/HDDS-7279
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: George Jahad
>            Priority: Major
>
> The OmRequest double buffer flush thread flushes the entire buffer as a 
> batch.  Since follower OMs will flush batches with different contents, 
> snapshots can't stay consistent between the leader and the followers.
> This means the flush thread needs to be "snapshot aware" and split the batch 
> so that all operations before the snapshot create are committed to RocksDB 
> before the checkpoint is created.
> Details here:
> https://docs.google.com/document/d/18BRPMol3EX5FioRaHliksx5uIGTw5iYTFc83PAKFLQU/edit#
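
The batch split described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Ozone implementation: the class BatchSplitSketch, the Op type, and the split method are hypothetical names, and the sketch assumes that putting each snapshot create in a batch of its own is enough to guarantee that every earlier operation is committed before the checkpoint is taken.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class BatchSplitSketch {

    // Hypothetical stand-in for a buffered OM request; not an Ozone class.
    public static final class Op {
        final String name;
        final boolean isSnapshotCreate;

        public Op(String name, boolean isSnapshotCreate) {
            this.name = name;
            this.isSnapshotCreate = isSnapshotCreate;
        }

        @Override
        public String toString() {
            return name;
        }
    }

    // Split the ready buffer into sub-batches, preserving order, so that
    // each snapshot create ends up alone in its own batch. Flushing the
    // sub-batches one after another then commits everything that preceded
    // the snapshot create before the snapshot itself is processed.
    public static List<List<Op>> split(List<Op> buffer) {
        List<List<Op>> batches = new ArrayList<>();
        List<Op> current = new ArrayList<>();
        for (Op op : buffer) {
            if (op.isSnapshotCreate) {
                if (!current.isEmpty()) {
                    // Close the batch of operations seen before the snapshot.
                    batches.add(current);
                    current = new ArrayList<>();
                }
                batches.add(Collections.singletonList(op));
            } else {
                current.add(op);
            }
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Op> buffer = Arrays.asList(
            new Op("commitKey1", false),
            new Op("commitKey2", false),
            new Op("createSnapshot", true),
            new Op("commitKey3", false));
        // Prints one sub-batch per line, in flush order.
        for (List<Op> batch : split(buffer)) {
            System.out.println(batch);
        }
    }
}
```

Because the split depends only on the order of requests in the buffer, a leader and a follower that receive the same request sequence would cut around the snapshot create at the same point, even if their buffers otherwise fill and flush at different times.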


