[ https://issues.apache.org/jira/browse/HDDS-8453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763222#comment-17763222 ]

Sadanand Shenoy edited comment on HDDS-8453 at 9/9/23 5:37 AM:
---------------------------------------------------------------

This is what the test does roughly:
 # Create a bunch of keys
 # Take a snapshot
 # Delete some of the keys created in Step 1 (3 keys to be precise)

{code:java}
// Case-1) Delete 3 files directly.
for (int i = 0; i < 3; i++) {
  Path path = new Path(root, "testKey" + i);
  fs.delete(path, true);
} {code}
4. Assert that the deleted keys are present in the Active DB's deletedTable, since keys move to the deletedTable on deletion.
{code:java}
    assertTableRowCount(deletedKeyTable, 3);
{code}
KeyDeletingService shouldn't process the keys in the deletedTable because they are still present in the snapshot's KeyTable, which is true in this case. The code below performs this skip when the key exists in the snapshot.
{code:java}
OmKeyInfo omKeyInfo = prevKeyTable.get(prevKeyTableDBKey);
// When a key is deleted it is no longer in the keyTable, so we also
// have to check the deletedTable of the previous snapshot.
RepeatedOmKeyInfo delOmKeyInfo =
    prevDeletedTable.get(prevDelTableDBKey);
if (versionExistsInPreviousSnapshot(omKeyInfo,
    info, delOmKeyInfo)) {
  // If the infoList size is 1, there is nothing to split.
  // We either delete it or skip it.
  if (!(infoList.getOmKeyInfoList().size() == 1)) {
    notReclaimableKeyInfo.addOmKeyInfo(info);
  }
  continue;
}
{code}
Now to the problem: omKeyInfo is obtained using *prevKeyTableDBKey*, which is constructed as below.
{code:java}
if (prevKeyTableDBKey == null &&
    bucketInfo.getBucketLayout().isFileSystemOptimized()) {
  long volumeId = getVolumeId(info.getVolumeName());
  prevKeyTableDBKey = getOzonePathKey(volumeId,
      bucketInfo.getObjectID(),
      info.getParentObjectID(),
      info.getKeyName());
}
{code}
In the case of an FSO key that is enclosed inside a *directory* (say vol/buck/dir/key), the *keyName* is dir/key, but the *fileName* is the leaf node, i.e. key.
Since prevKeyTableDBKey is constructed using keyName, for such a key
_prevKeyTableDBKey = volID/buckID/parentDirID/dir/key_ instead of
_volID/buckID/parentDirID/key_, which is what is actually stored in the FSO KeyTable
(FileTable). This causes prevKeyTableDBKey to fail the
versionExistsInPreviousSnapshot check, so the KDS processes the key for
further deletion. The simple fix is to use fileName here. [See fix
|https://github.com/apache/ozone/commit/447b6c0633820505795858289e42ca6f924b208e]
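To illustrate the mismatch, here is a minimal, self-contained sketch. The helper and the object IDs are made up for illustration (Ozone's real getOzonePathKey lives in OMMetadataManager and has a different signature); it only shows how using keyName smuggles the intermediate directory into the DB key, while fileName matches the FileTable layout of volID/buckID/parentID/fileName.

```java
public class FsoPathKeySketch {
  // Hypothetical stand-in for getOzonePathKey: FSO FileTable rows are
  // keyed by /volumeId/bucketId/parentObjectId/name.
  static String ozonePathKey(long volumeId, long bucketId,
                             long parentObjectId, String name) {
    return "/" + volumeId + "/" + bucketId + "/" + parentObjectId + "/" + name;
  }

  public static void main(String[] args) {
    long volId = 1L, buckId = 2L, parentDirId = 3L; // made-up object IDs
    String keyName = "dir/key"; // path relative to the bucket
    String fileName = "key";    // leaf node only

    // Buggy lookup key: the intermediate "dir/" ends up inside the DB key.
    String buggy = ozonePathKey(volId, buckId, parentDirId, keyName);
    // Fixed lookup key: matches what the FileTable actually stores.
    String fixed = ozonePathKey(volId, buckId, parentDirId, fileName);

    System.out.println(buggy); // /1/2/3/dir/key -> misses in prevKeyTable
    System.out.println(fixed); // /1/2/3/key     -> found in prevKeyTable
  }
}
```

The buggy key never matches a FileTable row, so the skip branch above is never taken.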

Validated this on a cluster to reproduce the block deletion:

 
{code:bash}
[~]# ozone fs -mkdir ofs://ozone1/vol1/buck3/dir
[~]# ozone fs -copyFromLocal /var/log/test.log ofs://ozone1/vol1/buck3/dir/file
[~]# ozone sh snapshot create vol1/buck3 snap1
[~]# ozone fs -rm -skipTrash ofs://ozone1/vol1/buck3/dir/file
[~]# ozone admin container close 1  # close the container that holds the key, then wait a while for block cleanup
[~]# ozone fs -cat ofs://ozone1/vol1/buck3/.snapshot/snap1/dir/file
 ReplicationConfig: STANDALONE/THREE, State:ALLOCATED, leaderId:, CreationTimestamp2023-09-08T18:24:34.156Z[UTC]].
23/09/08 18:31:08 WARN storage.ContainerProtocolCalls: Failed to get block #111677748019200003 in container #3 from ed6c839e-085a-4b51-bb70-fd2e9fceccfb; will try another datanode.
org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: Unable to find the block. BlockID : conID: 3 locID: 111677748019200003 bcsId: 8
{code}



> Intermittent timeout in TestDirectoryDeletingServiceWithFSO
> -----------------------------------------------------------
>
>                 Key: HDDS-8453
>                 URL: https://issues.apache.org/jira/browse/HDDS-8453
>             Project: Apache Ozone
>          Issue Type: Sub-task
>    Affects Versions: 1.4.0
>            Reporter: Attila Doroszlai
>            Assignee: Devesh Kumar Singh
>            Priority: Major
>
> {code}
> Tests run: 5, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 279.448 s <<< 
> FAILURE! - in org.apache.hadoop.fs.ozone.TestDirectoryDeletingServiceWithFSO
> org.apache.hadoop.fs.ozone.TestDirectoryDeletingServiceWithFSO.testDirDeletedTableCleanUpForSnapshot
>   Time elapsed: 122.491 s  <<< ERROR!
> ...
>   at org.apache.ozone.test.GenericTestUtils.waitFor(GenericTestUtils.java:231)
>   at 
> org.apache.hadoop.fs.ozone.TestDirectoryDeletingServiceWithFSO.assertTableRowCount(TestDirectoryDeletingServiceWithFSO.java:509)
>   at 
> org.apache.hadoop.fs.ozone.TestDirectoryDeletingServiceWithFSO.testDirDeletedTableCleanUpForSnapshot(TestDirectoryDeletingServiceWithFSO.java:470)
> org.apache.hadoop.fs.ozone.TestDirectoryDeletingServiceWithFSO.testDeleteWithLargeSubPathsThanBatchSize
>   Time elapsed: 120.321 s  <<< ERROR!
> ...
>   at org.apache.ozone.test.GenericTestUtils.waitFor(GenericTestUtils.java:231)
>   at 
> org.apache.hadoop.fs.ozone.TestDirectoryDeletingServiceWithFSO.assertTableRowCount(TestDirectoryDeletingServiceWithFSO.java:509)
>   at 
> org.apache.hadoop.fs.ozone.TestDirectoryDeletingServiceWithFSO.testDeleteWithLargeSubPathsThanBatchSize(TestDirectoryDeletingServiceWithFSO.java:218)
> {code}
> * 
> https://github.com/adoroszlai/ozone-build-results/blob/master/2023/04/14/21548/it-filesystem/hadoop-ozone/integration-test/org.apache.hadoop.fs.ozone.TestDirectoryDeletingServiceWithFSO.txt
> * 
> https://github.com/adoroszlai/ozone-build-results/blob/master/2023/04/18/21638/it-filesystem/hadoop-ozone/integration-test/org.apache.hadoop.fs.ozone.TestDirectoryDeletingServiceWithFSO.txt


