[ https://issues.apache.org/jira/browse/HDDS-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17611793#comment-17611793 ]
George Jahad commented on HDDS-7279:
------------------------------------
Normal test runs hit the problem only rarely. You can make it more
reproducible by preventing the flush thread from waking up when a key commit
request occurs.
That keeps the thread dormant until the "create snapshot" request comes in,
which leaves a bunch of key commits in the same batch as the create
snapshot request. Something like this:
https://github.com/GeorgeJahad/ozone/compare/05ed641c6...6cdfd890f#diff-15eb45eac3d9d3b7f34caf80bb6eb63be58f004e936626a4c3771ad0fc5f7f4e
Then run the TestOmSnapshotFileSystem class. It should fail almost every time.
> Snapshot Create requires Double Buffer Flush thread to split the commit batch
> -----------------------------------------------------------------------------
>
> Key: HDDS-7279
> URL: https://issues.apache.org/jira/browse/HDDS-7279
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: George Jahad
> Priority: Major
>
> The OmRequest double buffer flush thread flushes the entire buffer as a
> batch. Since follower OMs will flush batches with different contents,
> snapshots can't stay consistent between the leader and the followers.
> This means the flush thread needs to be "snapshot aware" and split the batch
> so that all operations before the snapshot create are committed to RocksDB
> before the checkpoint is created.
> Details here:
> https://docs.google.com/document/d/18BRPMol3EX5FioRaHliksx5uIGTw5iYTFc83PAKFLQU/edit#
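The batch split described above could look roughly like the sketch below. It is not the actual Ozone code: the class name BatchSplitSketch, the CREATE_SNAPSHOT marker, and the use of plain strings for buffered requests are all hypothetical, and putting the snapshot create into a batch of its own is an assumption made here for illustration. The only property taken from the description is that every operation queued before the snapshot create ends up in an earlier batch, so it can be committed before the checkpoint is taken.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not the Ozone double buffer implementation.
class BatchSplitSketch {
  // Hypothetical marker for a "create snapshot" request in the buffer.
  static final String CREATE_SNAPSHOT = "CreateSnapshot";

  // Splits the buffered operations, in order, into sub-batches. Each
  // snapshot create gets its own sub-batch (an assumption), so everything
  // queued before it sits in an earlier sub-batch.
  static List<List<String>> split(List<String> buffered) {
    List<List<String>> batches = new ArrayList<>();
    List<String> current = new ArrayList<>();
    for (String op : buffered) {
      if (CREATE_SNAPSHOT.equals(op)) {
        // Close the batch of operations that preceded the snapshot create.
        if (!current.isEmpty()) {
          batches.add(current);
          current = new ArrayList<>();
        }
        // The snapshot create itself is flushed alone.
        List<String> snapshotBatch = new ArrayList<>();
        snapshotBatch.add(op);
        batches.add(snapshotBatch);
      } else {
        current.add(op);
      }
    }
    // Operations after the last snapshot create form the final batch.
    if (!current.isEmpty()) {
      batches.add(current);
    }
    return batches;
  }

  public static void main(String[] args) {
    List<String> buffered = Arrays.asList(
        "CommitKey:k1", "CommitKey:k2", CREATE_SNAPSHOT, "CommitKey:k3");
    // Each printed line is one sub-batch that would be flushed separately.
    for (List<String> batch : split(buffered)) {
      System.out.println(batch);
    }
  }
}
```

Because the split depends only on the order of requests in the buffer and not on when the flush thread happens to wake up, a leader and a follower that buffer the same requests in the same order would flush the same contents ahead of the snapshot create.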