Hemant Kumar created HDDS-10275:
-----------------------------------
Summary: Double buffer not flushing DB transactions
Key: HDDS-10275
URL: https://issues.apache.org/jira/browse/HDDS-10275
Project: Apache Ozone
Issue Type: Bug
Reporter: Hemant Kumar
Found while investigating a snapshot diff failure: the diff could not load the
snapshot because its checkpoint directory did not exist. Snapshot creation had
succeeded, but the checkpoint directory is created inside the double buffer
flush, and the logs show no double buffer flush activity during that time.
Snapshot creation request:
{code:java}
2023-11-27 00:40:23,345 INFO [OM StateMachine ApplyTransaction Thread -
0]-org.apache.hadoop.ozone.om.request.snapshot.OMSnapshotCreateRequest: Created
snapshot: 'snap-ay36z' with snapshotId: 'bf0c6141-4185-4361-b15f-c4aa71c5c6d8'
under path 'vol-2xd36/buck-id806'
{code}
Double Buffer flush logs:
{code:java}
...
2023-11-27 00:10:23,826 INFO
[OMDoubleBufferFlushThread]-org.apache.hadoop.hdds.utils.db.RDBCheckpointManager:
Created checkpoint in rocksDB at
/var/lib/hadoop-ozone/om/data035525/db.snapshots/checkpointState/om.db-b2e9acb3-fee2-4190-8272-0649edca8d93
in 30 milliseconds
2023-11-27 00:10:23,827 INFO
[OMDoubleBufferFlushThread]-org.apache.hadoop.hdds.utils.db.RDBCheckpointUtils:
Waited for 1 milliseconds for checkpoint directory
/var/lib/hadoop-ozone/om/data035525/db.snapshots/checkpointState/om.db-b2e9acb3-fee2-4190-8272-0649edca8d93
availability.
2023-11-27 00:10:23,828 INFO
[OMDoubleBufferFlushThread]-org.apache.hadoop.ozone.om.OmSnapshotManager:
Created checkpoint :
/var/lib/hadoop-ozone/om/data035525/db.snapshots/checkpointState/om.db-b2e9acb3-fee2-4190-8272-0649edca8d93
for snapshot snap-mswq9
2023-11-27 00:10:39,586 INFO
[OMDoubleBufferFlushThread]-org.apache.hadoop.hdds.utils.db.RDBCheckpointManager:
Created checkpoint in rocksDB at
/var/lib/hadoop-ozone/om/data035525/db.snapshots/checkpointState/om.db-3369ac3a-61e1-4eca-b3cf-eb2de0b2d688
in 30 milliseconds
2023-11-27 00:10:39,586 INFO
[OMDoubleBufferFlushThread]-org.apache.hadoop.hdds.utils.db.RDBCheckpointUtils:
Waited for 0 milliseconds for checkpoint directory
/var/lib/hadoop-ozone/om/data035525/db.snapshots/checkpointState/om.db-3369ac3a-61e1-4eca-b3cf-eb2de0b2d688
availability.
2023-11-27 00:10:39,587 INFO
[OMDoubleBufferFlushThread]-org.apache.hadoop.ozone.om.OmSnapshotManager:
Created checkpoint :
/var/lib/hadoop-ozone/om/data035525/db.snapshots/checkpointState/om.db-3369ac3a-61e1-4eca-b3cf-eb2de0b2d688
for snapshot snap-f5u3t
2023-11-27 00:10:55,949 INFO
[OMDoubleBufferFlushThread]-org.apache.hadoop.hdds.utils.db.RDBCheckpointManager:
Created checkpoint in rocksDB at
/var/lib/hadoop-ozone/om/data035525/db.snapshots/checkpointState/om.db-3a690c8f-f3ef-415d-b25c-3aaf763c9507
in 22 milliseconds
2023-11-27 00:10:55,950 INFO
[OMDoubleBufferFlushThread]-org.apache.hadoop.hdds.utils.db.RDBCheckpointUtils:
Waited for 1 milliseconds for checkpoint directory
/var/lib/hadoop-ozone/om/data035525/db.snapshots/checkpointState/om.db-3a690c8f-f3ef-415d-b25c-3aaf763c9507
availability.
2023-11-27 00:10:55,950 INFO
[OMDoubleBufferFlushThread]-org.apache.hadoop.ozone.om.OmSnapshotManager:
Created checkpoint :
/var/lib/hadoop-ozone/om/data035525/db.snapshots/checkpointState/om.db-3a690c8f-f3ef-415d-b25c-3aaf763c9507
for snapshot snap-jfktn
2023-11-29 08:52:24,698 INFO
[OMDoubleBufferFlushThread]-org.apache.hadoop.hdds.utils.db.RDBCheckpointManager:
Created checkpoint in rocksDB at
/var/lib/hadoop-ozone/om/data035525/db.snapshots/checkpointState/om.db-c3ba17ef-d947-454e-9c4f-b9063ae65650
in 15 milliseconds
2023-11-29 08:52:24,715 INFO
[OMDoubleBufferFlushThread]-org.apache.hadoop.hdds.utils.db.RDBCheckpointUtils:
Waited for 16 milliseconds for checkpoint directory
/var/lib/hadoop-ozone/om/data035525/db.snapshots/checkpointState/om.db-c3ba17ef-d947-454e-9c4f-b9063ae65650
availability.
2023-11-29 08:52:24,717 WARN
[OMDoubleBufferFlushThread]-org.apache.hadoop.ozone.om.OmSnapshotManager: Took
614733 ns to find endKey. Caller is deleteKeysFromDelKeyTableInSnapshotScope
2023-11-29 08:52:24,718 INFO
[OMDoubleBufferFlushThread]-org.apache.hadoop.ozone.om.OmSnapshotManager:
Created checkpoint :
/var/lib/hadoop-ozone/om/data035525/db.snapshots/checkpointState/om.db-c3ba17ef-d947-454e-9c4f-b9063ae65650
for snapshot snap-ay36z
2023-11-29 08:52:24,745 INFO
[OMDoubleBufferFlushThread]-org.apache.hadoop.hdds.utils.db.RDBCheckpointManager:
Created checkpoint in rocksDB at
/var/lib/hadoop-ozone/om/data035525/db.snapshots/checkpointState/om.db-bf0c6141-4185-4361-b15f-c4aa71c5c6d8
in 12 milliseconds
2023-11-29 08:52:24,746 INFO
[OMDoubleBufferFlushThread]-org.apache.hadoop.hdds.utils.db.RDBCheckpointUtils:
Waited for 0 milliseconds for checkpoint directory
/var/lib/hadoop-ozone/om/data035525/db.snapshots/checkpointState/om.db-bf0c6141-4185-4361-b15f-c4aa71c5c6d8
availability.
2023-11-29 08:52:24,747 INFO
[OMDoubleBufferFlushThread]-org.apache.hadoop.ozone.om.OmSnapshotManager:
Created checkpoint :
/var/lib/hadoop-ozone/om/data035525/db.snapshots/checkpointState/om.db-bf0c6141-4185-4361-b15f-c4aa71c5c6d8
for snapshot snap-ay36z
...
{code}
I also checked whether the double buffer thread had been terminated or paused,
but there are no log entries indicating either. I went through the logs for the
whole hour between the last double buffer flush and the point where the
checkpoint should have been created, and could not find any issue there either.
On follower nodes, the double buffer was working properly.
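
The manual check for silent periods in the flush thread could be automated
along the following lines. This is a minimal sketch, not part of Ozone: the
class name FlushGapFinder, the method findGaps, and the 10-minute threshold are
all hypothetical, and it assumes each log entry sits on a single line in the
original OM log file (the excerpts above are wrapped by the mailer).

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reports long silences between consecutive log entries
// written by the OMDoubleBufferFlushThread.
final class FlushGapFinder {
  // Timestamp layout seen in the excerpts, e.g. "2023-11-27 00:10:23,826".
  private static final DateTimeFormatter TS =
      DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

  // Assumed single-line layout: "<timestamp> <LEVEL> [OMDoubleBufferFlushThread]-..."
  private static final Pattern FLUSH_LINE = Pattern.compile(
      "^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3})\\s+\\S+\\s+"
          + "\\[OMDoubleBufferFlushThread\\]");

  // Returns {previous, current} timestamp pairs whose distance exceeds threshold.
  static List<LocalDateTime[]> findGaps(List<String> lines, Duration threshold) {
    List<LocalDateTime[]> gaps = new ArrayList<>();
    LocalDateTime prev = null;
    for (String line : lines) {
      Matcher m = FLUSH_LINE.matcher(line);
      if (!m.find()) {
        continue; // entry from another thread, or a continuation line
      }
      LocalDateTime cur = LocalDateTime.parse(m.group(1), TS);
      if (prev != null && Duration.between(prev, cur).compareTo(threshold) > 0) {
        gaps.add(new LocalDateTime[] {prev, cur});
      }
      prev = cur;
    }
    return gaps;
  }

  public static void main(String[] args) throws Exception {
    // args[0]: path to an OM log file (assumption: one entry per line).
    List<String> lines = Files.readAllLines(Paths.get(args[0]));
    for (LocalDateTime[] gap : findGaps(lines, Duration.ofMinutes(10))) {
      System.out.println("No flush-thread log entries between "
          + gap[0] + " and " + gap[1]);
    }
  }
}
```

A gap reported this way only shows that the flush thread logged nothing in that
window. It does not by itself prove that no flush happened, since flushes
without snapshot checkpoints may not produce INFO entries.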