[ 
https://issues.apache.org/jira/browse/HDDS-8940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17755668#comment-17755668
 ] 

Hemant Kumar edited comment on HDDS-8940 at 8/17/23 6:57 PM:
-------------------------------------------------------------

We can apply the above theory to the following example:
1. Snapshot: *cm-23-1692026000702-1* was created with snapshotId: 
*64858946-aa02-4069-b2ba-29708b38fbfe*
{code}
[root@quasar-dbrrnu-2 ~]# grep cm-23-1692026000702-1 
/var/log/hadoop-ozone/ozone-om.log
2023-08-14 15:22:27,648 INFO [OM StateMachine ApplyTransaction Thread - 
0]-org.apache.hadoop.ozone.om.request.snapshot.OMSnapshotCreateRequest: Created 
snapshot: 'cm-23-1692026000702-1' with snapshotId: 
'64858946-aa02-4069-b2ba-29708b38fbfe' under path 's3v/bucket870ifflobs'
2023-08-14 15:22:27,683 INFO 
[OMDoubleBufferFlushThread]-org.apache.hadoop.ozone.om.OmSnapshotManager: 
Created checkpoint : 
/var/lib/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-64858946-aa02-4069-b2ba-29708b38fbfe
 for snapshot cm-23-1692026000702-1
{code}

2. *SSTFilteringService* removed 000471 from the checkpoint dir.
{code}
2023-08-14 15:23:22,749 INFO 
[SstFilteringService#0]-org.apache.hadoop.hdds.utils.db.RocksDatabase: Deleting 
sst file /000471.sst corresponding to column family fileTable from db: 
/var/lib/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-64858946-aa02-4069-b2ba-29708b38fbfe
2023-08-14 15:23:22,751 INFO 
[SstFilteringService#0]-org.apache.hadoop.hdds.utils.db.managed.ManagedRocksObjectUtils:
 Waited for 0 milliseconds for file 
/var/lib/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-64858946-aa02-4069-b2ba-29708b38fbfe/000471.sst
 deletion.
{code}

3. 000471 was also removed from the SST backup dir by the SST pruning service.
{code}
2023-08-14 15:44:00,141 INFO 
[CompactionDagPruningService]-org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer:
 Removing SST files: [000256, 000210, 000139, 000456, 000411, 000417, 000263, 
000141, 000261, 000322, 000520, 000564, 000442, 000486, 000364, 000242, 000567, 
000126, 000203, 000246, 000367, 000521, 000407, 000208, 000405, 000449, 000206, 
000569, 000171, 000252, 000296, 000373, 000370, 000311, 000355, 000553, 000475, 
000155, 000276, 000110, 000550, 000314, 000158, 000517, 000514, 000361, 000283, 
000480, 000421, 000101, 000189, 000464, 000540, 000424, 000467, 000308, 000109, 
000108, 000504, 000228, 000305, 000349, 000193, 000391, 000471, 000470, 000150] 
as part of SST file pruning.
{code}

4. Snapshot: *cm-tmp-82f80694-7d3b-43d9-9430-393486b250c4* was created with snapshotId: 
*0d31acca-64bf-4662-a36c-5ce239023cde*
{code}
2023-08-16 23:36:20,465 INFO [OM StateMachine ApplyTransaction Thread - 
0]-org.apache.hadoop.ozone.om.request.snapshot.OMSnapshotCreateRequest: Created 
snapshot: 'cm-tmp-82f80694-7d3b-43d9-9430-393486b250c4' with snapshotId: 
'0d31acca-64bf-4662-a36c-5ce239023cde' under path 's3v/bucket870ifflobs'
{code}

5. Node 000471 was added to the traversal as a child of node 000543. It was 
then added to the diff because it was taken before node 000543 (the log 
reports its snapshot generation as 4968 against the destination snapshot's 
5370).
{code}
2023-08-16 23:36:26,948 DEBUG 
[snapshot-diff-job-thread-id-1]-org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer:
 Processing node: 000543
2023-08-16 23:36:26,948 DEBUG 
[snapshot-diff-job-thread-id-1]-org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer:
 No further compaction happened to the current file. Src 
'/var/lib/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-0d31acca-64bf-4662-a36c-5ce239023cde'
 and dest 
'/var/lib/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-64858946-aa02-4069-b2ba-29708b38fbfe'
 have different file: 000543
2023-08-16 23:36:26,948 DEBUG 
[snapshot-diff-job-thread-id-1]-org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer:
 Processing node: 000556
2023-08-16 23:36:26,948 DEBUG 
[snapshot-diff-job-thread-id-1]-org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer:
 No further compaction happened to the current file. Src 
'/var/lib/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-0d31acca-64bf-4662-a36c-5ce239023cde'
 and dest 
'/var/lib/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-64858946-aa02-4069-b2ba-29708b38fbfe'
 have different file: 000556
2023-08-16 23:36:26,948 DEBUG 
[snapshot-diff-job-thread-id-1]-org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer:
 Processing node: 000568
2023-08-16 23:36:26,948 DEBUG 
[snapshot-diff-job-thread-id-1]-org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer:
 No further compaction happened to the current file. Src 
'/var/lib/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-0d31acca-64bf-4662-a36c-5ce239023cde'
 and dest 
'/var/lib/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-64858946-aa02-4069-b2ba-29708b38fbfe'
 have different file: 000568
2023-08-16 23:36:26,948 DEBUG 
[snapshot-diff-job-thread-id-1]-org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer:
 Traversal level: 35. Current level has 2 nodes.
2023-08-16 23:36:26,948 DEBUG 
[snapshot-diff-job-thread-id-1]-org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer:
 Processing node: 000516
2023-08-16 23:36:26,948 DEBUG 
[snapshot-diff-job-thread-id-1]-org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer:
 No further compaction happened to the current file. Src 
'/var/lib/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-0d31acca-64bf-4662-a36c-5ce239023cde'
 and dest 
'/var/lib/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-64858946-aa02-4069-b2ba-29708b38fbfe'
 have different file: 000516
2023-08-16 23:36:26,948 DEBUG 
[snapshot-diff-job-thread-id-1]-org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer:
 Processing node: 000471
2023-08-16 23:36:26,948 DEBUG 
[snapshot-diff-job-thread-id-1]-org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer:
 Current node's snapshot generation '4968' reached destination snapshot's 
'5370'. Src 
'/var/lib/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-0d31acca-64bf-4662-a36c-5ce239023cde'
 and dest 
'/var/lib/hadoop-ozone/om/data/db.snapshots/checkpointState/om.db-64858946-aa02-4069-b2ba-29708b38fbfe'
 have different SST file: '000471'
{code}
6. When snapDiff looked for the file for node 000471 in the Active DB dir and 
the SST backup dir, it couldn't find it and threw the exception.
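
The failure in step 6 can be reproduced in miniature. The sketch below is a simplified model and not the RocksDBCheckpointDiffer code: the names resolve_sst_path and simulate_missing_sst, the directory names, and the search order are assumptions made for illustration. Only the file number 000471 and the wording of the exception message come from this issue.

```python
import tempfile
from pathlib import Path


def resolve_sst_path(sst_name, search_dirs):
    """Return the first existing '<dir>/<sst_name>.sst', else raise."""
    file_name = f"{sst_name}.sst"
    for directory in search_dirs:
        candidate = Path(directory) / file_name
        if candidate.exists():
            return candidate
    # Message format mirrors the exception quoted in the issue description.
    raise FileNotFoundError(f"Can't find SST file: {file_name}")


def simulate_missing_sst():
    """Replay steps 3 and 6 on throwaway directories."""
    with tempfile.TemporaryDirectory() as root:
        active_db = Path(root) / "om.db"        # hypothetical layout
        sst_backup = Path(root) / "sst-backup"  # hypothetical layout
        active_db.mkdir()
        sst_backup.mkdir()

        # Before pruning: a copy of 000471 exists in the backup directory.
        (sst_backup / "000471.sst").touch()
        found_before = resolve_sst_path("000471", [sst_backup, active_db]).name

        # Step 3: the pruning service removes it from the backup directory.
        (sst_backup / "000471.sst").unlink()

        # Step 6: the diff still lists 000471, but no directory holds it.
        try:
            resolve_sst_path("000471", [sst_backup, active_db])
            error = None
        except FileNotFoundError as exc:
            error = str(exc)
        return found_before, error
```

Calling simulate_missing_sst() returns the name found before pruning and the error text raised afterwards. The point is that the lookup can only succeed if a file still referenced by the DAG is kept in at least one of the searched directories.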

DAG image for reference: 
https://issues.apache.org/jira/secure/attachment/13062229/HDDS-8940_Compaction_Dag_1.png
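
To trace where a single SST file was touched, the log records quoted in the steps above can be filtered by file number. This is a rough sketch, assuming each record sits on one line in the actual ozone-om.log (the records above are wrapped by the mailer). The function name sst_events and the event labels are hypothetical; the patterns are copied from the messages shown above.

```python
import re

# Patterns taken from the log messages quoted above; the labels are made up.
PATTERNS = [
    ("filtered", re.compile(r"Deleting sst file /(\d+)\.sst")),
    ("pruned", re.compile(r"Removing SST files: \[([^\]]*)\]")),
    ("traversed", re.compile(r"Processing node: (\d+)")),
]


def sst_events(lines, sst_name):
    """Yield (timestamp, label) for every record that mentions sst_name."""
    for line in lines:
        for label, pattern in PATTERNS:
            match = pattern.search(line)
            if not match:
                continue
            # The pruning record carries a comma-separated list of files;
            # the other records carry a single file number.
            names = [n.strip() for n in match.group(1).split(",")]
            if sst_name in names:
                # First 23 characters: 'YYYY-MM-DD HH:MM:SS,mmm'.
                yield line[:23], label
```

Run over the OM log with sst_name set to "000471", this should list the filtering, pruning and traversal records in the order they appear in the file, which is the same timeline as steps 2, 3 and 5.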


> SST files are missing on optimized snapDiff path.
> -------------------------------------------------
>
>                 Key: HDDS-8940
>                 URL: https://issues.apache.org/jira/browse/HDDS-8940
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Hemant Kumar
>            Assignee: Hemant Kumar
>            Priority: Major
>         Attachments: HDDS-8940_Compaction_Dag.png, 
> HDDS-8940_Compaction_Dag_1.png, HDDS-8940_Compaction_Dag_2.png, example.png
>
>
> While running snapDiff, we are seeing SST files missing on optimized snapDiff 
> path.
> {code}
> 2023-06-23 19:59:16,323 [snapshot-diff-job-thread-id-14] ERROR 
> org.apache.hadoop.ozone.om.snapshot.SnapshotDiffManager: Caught checked 
> exception during diff report generation for volume: volume1 bucket: bucket1, 
> fromSnapshot: alma2 and toSnapshot: 
> cm-tmp-0ae3d532-237d-4df2-83f9-4844d153521e
> java.io.FileNotFoundException: Can't find SST file: 010788.sst
> at 
> org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer.getAbsoluteSstFilePath(RocksDBCheckpointDiffer.java:654)
> at 
> org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer.filterRelevantSstFilesFullPath(RocksDBCheckpointDiffer.java:949)
> at 
> org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer.getSSTDiffList(RocksDBCheckpointDiffer.java:933)
> at 
> org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer.getSSTDiffListWithFullPath(RocksDBCheckpointDiffer.java:868)
> at 
> org.apache.hadoop.ozone.om.snapshot.SnapshotDiffManager.getDeltaFiles(SnapshotDiffManager.java:929)
> at 
> org.apache.hadoop.ozone.om.snapshot.SnapshotDiffManager.getDeltaFilesAndDiffKeysToObjectIdToKeyMap(SnapshotDiffManager.java:793)
> at 
> org.apache.hadoop.ozone.om.snapshot.SnapshotDiffManager.generateSnapshotDiffReport(SnapshotDiffManager.java:721)
> at 
> org.apache.hadoop.ozone.om.snapshot.SnapshotDiffManager.lambda$0(SnapshotDiffManager.java:565)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}



