[
https://issues.apache.org/jira/browse/HDDS-8940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17755668#comment-17755668
]
Hemant Kumar edited comment on HDDS-8940 at 8/17/23 8:48 PM:
-------------------------------------------------------------
We can apply the above theory to the following example:
1. Snapshot: *cm-23-1692026000702-1* was created with snapshotId:
*64858946-aa02-4069-b2ba-29708b38fbfe*
{code}
2023-08-14 15:22:27,648 INFO [OM StateMachine ApplyTransaction Thread -
0]-org.apache.hadoop.ozone.om.request.snapshot.OMSnapshotCreateRequest: Created
snapshot: 'cm-23-1692026000702-1' with snapshotId:
'64858946-aa02-4069-b2ba-29708b38fbfe' under path 's3v/bucket870ifflobs'
2023-08-14 15:22:27,683 INFO
[OMDoubleBufferFlushThread]-org.apache.hadoop.ozone.om.OmSnapshotManager:
Created checkpoint :
/var/lib/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-64858946-aa02-4069-b2ba-29708b38fbfe
for snapshot cm-23-1692026000702-1
{code}
2. *SSTFilteringService* removed 000471 from the checkpoint dir.
{code}
2023-08-14 15:23:22,749 INFO
[SstFilteringService#0]-org.apache.hadoop.hdds.utils.db.RocksDatabase: Deleting
sst file /000471.sst corresponding to column family fileTable from db:
/var/lib/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-64858946-aa02-4069-b2ba-29708b38fbfe
2023-08-14 15:23:22,751 INFO
[SstFilteringService#0]-org.apache.hadoop.hdds.utils.db.managed.ManagedRocksObjectUtils:
Waited for 0 milliseconds for file
/var/lib/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-64858946-aa02-4069-b2ba-29708b38fbfe/000471.sst
deletion.
{code}
3. 000471 was also removed from the SST backup dir by the SST pruning service.
{code}
2023-08-14 15:44:00,141 INFO
[CompactionDagPruningService]-org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer:
Removing SST files: [000256, 000210, 000139, 000456, 000411, 000417, 000263,
000141, 000261, 000322, 000520, 000564, 000442, 000486, 000364, 000242, 000567,
000126, 000203, 000246, 000367, 000521, 000407, 000208, 000405, 000449, 000206,
000569, 000171, 000252, 000296, 000373, 000370, 000311, 000355, 000553, 000475,
000155, 000276, 000110, 000550, 000314, 000158, 000517, 000514, 000361, 000283,
000480, 000421, 000101, 000189, 000464, 000540, 000424, 000467, 000308, 000109,
000108, 000504, 000228, 000305, 000349, 000193, 000391, 000471, 000470, 000150]
as part of SST file pruning.
{code}
4. Snapshot: *cm-tmp-82f80694-7d3b-43d9-9430-393486b250c4* was created with snapshotId:
*0d31acca-64bf-4662-a36c-5ce239023cde*
{code}
2023-08-16 23:36:20,465 INFO [OM StateMachine ApplyTransaction Thread -
0]-org.apache.hadoop.ozone.om.request.snapshot.OMSnapshotCreateRequest: Created
snapshot: 'cm-tmp-82f80694-7d3b-43d9-9430-393486b250c4' with snapshotId:
'0d31acca-64bf-4662-a36c-5ce239023cde' under path 's3v/bucket870ifflobs'
{code}
5. Node 000471 was added to the traversal as a child of node 000543, and
was added to the diff because it was taken before node 000543.
{code}
2023-08-16 23:36:26,948 DEBUG
[snapshot-diff-job-thread-id-1]-org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer:
Processing node: 000543
2023-08-16 23:36:26,948 DEBUG
[snapshot-diff-job-thread-id-1]-org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer:
No further compaction happened to the current file. Src
'/var/lib/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-0d31acca-64bf-4662-a36c-5ce239023cde'
and dest
'/var/lib/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-64858946-aa02-4069-b2ba-29708b38fbfe'
have different file: 000543
2023-08-16 23:36:26,948 DEBUG
[snapshot-diff-job-thread-id-1]-org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer:
Processing node: 000556
2023-08-16 23:36:26,948 DEBUG
[snapshot-diff-job-thread-id-1]-org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer:
No further compaction happened to the current file. Src
'/var/lib/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-0d31acca-64bf-4662-a36c-5ce239023cde'
and dest
'/var/lib/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-64858946-aa02-4069-b2ba-29708b38fbfe'
have different file: 000556
2023-08-16 23:36:26,948 DEBUG
[snapshot-diff-job-thread-id-1]-org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer:
Processing node: 000568
2023-08-16 23:36:26,948 DEBUG
[snapshot-diff-job-thread-id-1]-org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer:
No further compaction happened to the current file. Src
'/var/lib/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-0d31acca-64bf-4662-a36c-5ce239023cde'
and dest
'/var/lib/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-64858946-aa02-4069-b2ba-29708b38fbfe'
have different file: 000568
2023-08-16 23:36:26,948 DEBUG
[snapshot-diff-job-thread-id-1]-org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer:
Traversal level: 35. Current level has 2 nodes.
2023-08-16 23:36:26,948 DEBUG
[snapshot-diff-job-thread-id-1]-org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer:
Processing node: 000516
2023-08-16 23:36:26,948 DEBUG
[snapshot-diff-job-thread-id-1]-org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer:
No further compaction happened to the current file. Src
'/var/lib/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-0d31acca-64bf-4662-a36c-5ce239023cde'
and dest
'/var/lib/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-64858946-aa02-4069-b2ba-29708b38fbfe'
have different file: 000516
2023-08-16 23:36:26,948 DEBUG
[snapshot-diff-job-thread-id-1]-org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer:
Processing node: 000471
2023-08-16 23:36:26,948 DEBUG
[snapshot-diff-job-thread-id-1]-org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer:
Current node's snapshot generation '4968' reached destination snapshot's
'5370'. Src
'/var/lib/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-0d31acca-64bf-4662-a36c-5ce239023cde'
and dest
'/var/lib/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-64858946-aa02-4069-b2ba-29708b38fbfe'
have different SST file: '000471'
{code}
6. When snapDiff looked for this node's file in the active DB dir and the SST
backup dir, it couldn't find it and threw the exception.
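The lookup in step 6 can be sketched roughly as follows. This is only an illustration of the "search each dir in order, then fail" behaviour described above, not the actual RocksDBCheckpointDiffer code: the class SstFileLocator, the method locate and the temp directories are hypothetical, and only the exception message format is taken from the stack trace in the issue description.

```java
import java.io.FileNotFoundException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical helper; NOT the actual RocksDBCheckpointDiffer code. */
final class SstFileLocator {

  /**
   * Returns the first existing "<dir>/<sstName>.sst" among the given
   * directories, searched in order; throws if no directory has the file.
   */
  static Path locate(String sstName, List<Path> searchDirs)
      throws FileNotFoundException {
    String fileName = sstName + ".sst";
    for (Path dir : searchDirs) {
      Path candidate = dir.resolve(fileName);
      if (Files.exists(candidate)) {
        return candidate;
      }
    }
    // Same message format as the exception in the issue description.
    throw new FileNotFoundException("Can't find SST file: " + fileName);
  }

  public static void main(String[] args) throws Exception {
    // Stand-ins for the active DB dir and the SST backup dir (assumption).
    Path activeDb = Files.createTempDirectory("om.db");
    Path sstBackup = Files.createTempDirectory("sst-backup");
    Files.createFile(sstBackup.resolve("000543.sst"));

    // Found in the backup dir.
    System.out.println(locate("000543", List.of(activeDb, sstBackup)));

    // 000471 is in neither dir, as after steps 2 and 3.
    try {
      locate("000471", List.of(activeDb, sstBackup));
    } catch (FileNotFoundException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Once the file has been deleted from every directory that is searched, any diff list that still names it can only end in the exception.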
For reference DAG image:
https://issues.apache.org/jira/secure/attachment/13062229/HDDS-8940_Compaction_Dag_1.png
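To make step 5 concrete, here is a minimal sketch of a level-by-level walk over a compaction DAG. The stopping rules are an assumption inferred from the DEBUG lines above ("No further compaction happened to the current file" and "Current node's snapshot generation ... reached destination snapshot's"); the real differ may differ in detail. CompactionDagSketch, Node and differentFiles are hypothetical names, and PARENT, OLDER and their generations are made up. Only 000471, 4968 and 5370 come from the logs.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch; NOT the actual RocksDBCheckpointDiffer code. */
final class CompactionDagSketch {

  /** An SST file plus the snapshot generation recorded for it. */
  static final class Node {
    final String file;
    final long generation;

    Node(String file, long generation) {
      this.file = file;
      this.generation = generation;
    }
  }

  /** Walks from the source snapshot's files toward the destination. */
  static Set<String> differentFiles(Collection<Node> srcFiles,
      Set<String> destFiles, long destGeneration,
      Map<String, List<Node>> successors) {
    Set<String> different = new TreeSet<>();
    Set<String> visited = new HashSet<>();
    Deque<Node> queue = new ArrayDeque<>(srcFiles);
    while (!queue.isEmpty()) {
      Node current = queue.poll();
      // Skip repeats and files the destination snapshot also has.
      if (!visited.add(current.file) || destFiles.contains(current.file)) {
        continue;
      }
      List<Node> next = successors.getOrDefault(current.file, List.of());
      if (next.isEmpty() || current.generation < destGeneration) {
        // Assumed rule: stop here and report the file as different.
        different.add(current.file);
      } else {
        queue.addAll(next);
      }
    }
    return different;
  }

  public static void main(String[] args) {
    Node parent = new Node("PARENT", 6000); // made-up parent node
    Node pruned = new Node("000471", 4968); // generation from the log
    Node older = new Node("OLDER", 4000);   // made-up, never reached
    Map<String, List<Node>> successors = Map.of(
        "PARENT", List.of(pruned),
        "000471", List.of(older));
    // Destination snapshot no longer lists 000471 (removed in step 2).
    System.out.println(
        differentFiles(List.of(parent), Set.of(), 5370, successors));
    // prints [000471]
  }
}
```

The walk consults only the DAG and the destination snapshot's file list, so a file that was already pruned from disk can still be reported as different.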
> SST files are missing on optimized snapDiff path.
> -------------------------------------------------
>
> Key: HDDS-8940
> URL: https://issues.apache.org/jira/browse/HDDS-8940
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Hemant Kumar
> Assignee: Hemant Kumar
> Priority: Major
> Attachments: HDDS-8940_Compaction_Dag.png,
> HDDS-8940_Compaction_Dag_1.png, HDDS-8940_Compaction_Dag_2.png, example.png
>
>
> While running snapDiff, we are seeing SST files missing on the optimized
> snapDiff path.
> {code}
> 2023-06-23 19:59:16,323 [snapshot-diff-job-thread-id-14] ERROR
> org.apache.hadoop.ozone.om.snapshot.SnapshotDiffManager: Caught checked
> exception during diff report generation for volume: volume1 bucket: bucket1,
> fromSnapshot: alma2 and toSnapshot:
> cm-tmp-0ae3d532-237d-4df2-83f9-4844d153521e
> java.io.FileNotFoundException: Can't find SST file: 010788.sst
> at
> org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer.getAbsoluteSstFilePath(RocksDBCheckpointDiffer.java:654)
> at
> org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer.filterRelevantSstFilesFullPath(RocksDBCheckpointDiffer.java:949)
> at
> org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer.getSSTDiffList(RocksDBCheckpointDiffer.java:933)
> at
> org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer.getSSTDiffListWithFullPath(RocksDBCheckpointDiffer.java:868)
> at
> org.apache.hadoop.ozone.om.snapshot.SnapshotDiffManager.getDeltaFiles(SnapshotDiffManager.java:929)
> at
> org.apache.hadoop.ozone.om.snapshot.SnapshotDiffManager.getDeltaFilesAndDiffKeysToObjectIdToKeyMap(SnapshotDiffManager.java:793)
> at
> org.apache.hadoop.ozone.om.snapshot.SnapshotDiffManager.generateSnapshotDiffReport(SnapshotDiffManager.java:721)
> at
> org.apache.hadoop.ozone.om.snapshot.SnapshotDiffManager.lambda$0(SnapshotDiffManager.java:565)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}