Hemant Kumar created HDDS-10275:
-----------------------------------

             Summary: Double buffer not flushing DB transactions
                 Key: HDDS-10275
                 URL: https://issues.apache.org/jira/browse/HDDS-10275
             Project: Apache Ozone
          Issue Type: Bug
            Reporter: Hemant Kumar


While investigating a snapshot diff failure, I found that the snapshot could 
not be loaded because its checkpoint directory doesn't exist. Snapshot creation 
succeeded, but the checkpoint directory was never created, because that happens 
inside the double buffer flush.
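
One way to confirm the missing directory for a given snapshot is sketched below. It assumes the layout seen in the logs in this report, where the checkpoint for a snapshot lives at <db.snapshots dir>/checkpointState/om.db-<snapshotId>. The class and method names are hypothetical helpers for illustration, not Ozone APIs.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Ozone: derives the checkpoint directory
// that the logs in this report suggest for a given snapshotId.
class CheckpointDirCheck {

  // Assumed layout: <snapshotsRoot>/checkpointState/om.db-<snapshotId>
  static Path expectedCheckpointDir(Path snapshotsRoot, String snapshotId) {
    return snapshotsRoot.resolve("checkpointState").resolve("om.db-" + snapshotId);
  }

  // True only if the expected checkpoint directory exists on disk.
  static boolean checkpointExists(Path snapshotsRoot, String snapshotId) {
    return Files.isDirectory(expectedCheckpointDir(snapshotsRoot, snapshotId));
  }
}
```

Calling checkpointExists with the db.snapshots directory of the OM and the snapshotId from the OMSnapshotCreateRequest log line would return false for a snapshot in the state described here.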

I looked at the logs and there were no double buffer flush logs during that time.

Snapshot creation request:
{code:java}
2023-11-27 00:40:23,345 INFO [OM StateMachine ApplyTransaction Thread - 0]-org.apache.hadoop.ozone.om.request.snapshot.OMSnapshotCreateRequest: Created snapshot: 'snap-ay36z' with snapshotId: 'bf0c6141-4185-4361-b15f-c4aa71c5c6d8' under path 'vol-2xd36/buck-id806'
{code}
Double Buffer flush logs:
{code:java}
...
2023-11-27 00:10:23,826 INFO [OMDoubleBufferFlushThread]-org.apache.hadoop.hdds.utils.db.RDBCheckpointManager: Created checkpoint in rocksDB at /var/lib/hadoop-ozone/om/data035525/db.snapshots/checkpointState/om.db-b2e9acb3-fee2-4190-8272-0649edca8d93 in 30 milliseconds
2023-11-27 00:10:23,827 INFO [OMDoubleBufferFlushThread]-org.apache.hadoop.hdds.utils.db.RDBCheckpointUtils: Waited for 1 milliseconds for checkpoint directory /var/lib/hadoop-ozone/om/data035525/db.snapshots/checkpointState/om.db-b2e9acb3-fee2-4190-8272-0649edca8d93 availability.
2023-11-27 00:10:23,828 INFO [OMDoubleBufferFlushThread]-org.apache.hadoop.ozone.om.OmSnapshotManager: Created checkpoint : /var/lib/hadoop-ozone/om/data035525/db.snapshots/checkpointState/om.db-b2e9acb3-fee2-4190-8272-0649edca8d93 for snapshot snap-mswq9
2023-11-27 00:10:39,586 INFO [OMDoubleBufferFlushThread]-org.apache.hadoop.hdds.utils.db.RDBCheckpointManager: Created checkpoint in rocksDB at /var/lib/hadoop-ozone/om/data035525/db.snapshots/checkpointState/om.db-3369ac3a-61e1-4eca-b3cf-eb2de0b2d688 in 30 milliseconds
2023-11-27 00:10:39,586 INFO [OMDoubleBufferFlushThread]-org.apache.hadoop.hdds.utils.db.RDBCheckpointUtils: Waited for 0 milliseconds for checkpoint directory /var/lib/hadoop-ozone/om/data035525/db.snapshots/checkpointState/om.db-3369ac3a-61e1-4eca-b3cf-eb2de0b2d688 availability.
2023-11-27 00:10:39,587 INFO [OMDoubleBufferFlushThread]-org.apache.hadoop.ozone.om.OmSnapshotManager: Created checkpoint : /var/lib/hadoop-ozone/om/data035525/db.snapshots/checkpointState/om.db-3369ac3a-61e1-4eca-b3cf-eb2de0b2d688 for snapshot snap-f5u3t
2023-11-27 00:10:55,949 INFO [OMDoubleBufferFlushThread]-org.apache.hadoop.hdds.utils.db.RDBCheckpointManager: Created checkpoint in rocksDB at /var/lib/hadoop-ozone/om/data035525/db.snapshots/checkpointState/om.db-3a690c8f-f3ef-415d-b25c-3aaf763c9507 in 22 milliseconds
2023-11-27 00:10:55,950 INFO [OMDoubleBufferFlushThread]-org.apache.hadoop.hdds.utils.db.RDBCheckpointUtils: Waited for 1 milliseconds for checkpoint directory /var/lib/hadoop-ozone/om/data035525/db.snapshots/checkpointState/om.db-3a690c8f-f3ef-415d-b25c-3aaf763c9507 availability.
2023-11-27 00:10:55,950 INFO [OMDoubleBufferFlushThread]-org.apache.hadoop.ozone.om.OmSnapshotManager: Created checkpoint : /var/lib/hadoop-ozone/om/data035525/db.snapshots/checkpointState/om.db-3a690c8f-f3ef-415d-b25c-3aaf763c9507 for snapshot snap-jfktn
2023-11-29 08:52:24,698 INFO [OMDoubleBufferFlushThread]-org.apache.hadoop.hdds.utils.db.RDBCheckpointManager: Created checkpoint in rocksDB at /var/lib/hadoop-ozone/om/data035525/db.snapshots/checkpointState/om.db-c3ba17ef-d947-454e-9c4f-b9063ae65650 in 15 milliseconds
2023-11-29 08:52:24,715 INFO [OMDoubleBufferFlushThread]-org.apache.hadoop.hdds.utils.db.RDBCheckpointUtils: Waited for 16 milliseconds for checkpoint directory /var/lib/hadoop-ozone/om/data035525/db.snapshots/checkpointState/om.db-c3ba17ef-d947-454e-9c4f-b9063ae65650 availability.
2023-11-29 08:52:24,717 WARN [OMDoubleBufferFlushThread]-org.apache.hadoop.ozone.om.OmSnapshotManager: Took 614733 ns to find endKey. Caller is deleteKeysFromDelKeyTableInSnapshotScope
2023-11-29 08:52:24,718 INFO [OMDoubleBufferFlushThread]-org.apache.hadoop.ozone.om.OmSnapshotManager: Created checkpoint : /var/lib/hadoop-ozone/om/data035525/db.snapshots/checkpointState/om.db-c3ba17ef-d947-454e-9c4f-b9063ae65650 for snapshot snap-ay36z
2023-11-29 08:52:24,745 INFO [OMDoubleBufferFlushThread]-org.apache.hadoop.hdds.utils.db.RDBCheckpointManager: Created checkpoint in rocksDB at /var/lib/hadoop-ozone/om/data035525/db.snapshots/checkpointState/om.db-bf0c6141-4185-4361-b15f-c4aa71c5c6d8 in 12 milliseconds
2023-11-29 08:52:24,746 INFO [OMDoubleBufferFlushThread]-org.apache.hadoop.hdds.utils.db.RDBCheckpointUtils: Waited for 0 milliseconds for checkpoint directory /var/lib/hadoop-ozone/om/data035525/db.snapshots/checkpointState/om.db-bf0c6141-4185-4361-b15f-c4aa71c5c6d8 availability.
2023-11-29 08:52:24,747 INFO [OMDoubleBufferFlushThread]-org.apache.hadoop.ozone.om.OmSnapshotManager: Created checkpoint : /var/lib/hadoop-ozone/om/data035525/db.snapshots/checkpointState/om.db-bf0c6141-4185-4361-b15f-c4aa71c5c6d8 for snapshot snap-ay36z
...
{code}
I also checked whether the double buffer thread was terminated or paused, but 
there is no log for that either. I looked at the logs for the whole hour 
between the last double buffer flush and the point where the checkpoint was 
not created, and couldn't find any issue there either.
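
The manual log inspection described above could be automated with something like the following sketch, which reports unusually long silences of one thread in an OM log. It assumes each log entry is on a single line, starts with a timestamp of the form yyyy-MM-dd HH:mm:ss,SSS, and carries the thread name in square brackets, as in the excerpts in this report. FlushGapFinder and its members are hypothetical names, not part of Ozone.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds long gaps between consecutive log entries
// written by one thread (for example OMDoubleBufferFlushThread).
class FlushGapFinder {
  // Timestamp at the start of an entry, e.g. "2023-11-27 00:10:23,826 ".
  private static final Pattern TIMESTAMP =
      Pattern.compile("^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}) ");
  private static final DateTimeFormatter FORMAT =
      DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

  // A period during which the thread logged nothing.
  static final class Gap {
    final LocalDateTime from;
    final LocalDateTime to;

    Gap(LocalDateTime from, LocalDateTime to) {
      this.from = from;
      this.to = to;
    }

    Duration length() {
      return Duration.between(from, to);
    }
  }

  // Returns every gap longer than threshold between consecutive entries
  // whose line contains "[threadName]". Other lines are ignored.
  static List<Gap> findGaps(List<String> lines, String threadName, Duration threshold) {
    String marker = "[" + threadName + "]";
    List<Gap> gaps = new ArrayList<>();
    LocalDateTime previous = null;
    for (String line : lines) {
      Matcher m = TIMESTAMP.matcher(line);
      if (!m.find() || !line.contains(marker)) {
        continue;
      }
      LocalDateTime current = LocalDateTime.parse(m.group(1), FORMAT);
      if (previous != null
          && Duration.between(previous, current).compareTo(threshold) > 0) {
        gaps.add(new Gap(previous, current));
      }
      previous = current;
    }
    return gaps;
  }
}
```

Feeding it the lines of an OM log with thread name "OMDoubleBufferFlushThread" and a threshold such as Duration.ofMinutes(30) would list each period in which the flush thread logged nothing. A gap only shows that the thread was silent; it does not by itself prove the thread was stuck.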

On follower nodes, the double buffer was working properly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
