[
https://issues.apache.org/jira/browse/HDDS-8453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763222#comment-17763222
]
Sadanand Shenoy edited comment on HDDS-8453 at 9/9/23 5:37 AM:
---------------------------------------------------------------
This is what the test does roughly:
# Create a bunch of keys
# Take a snapshot
# Delete some of the keys created in Step 1 (3 keys to be precise)
{code:java}
// Case-1) Delete 3 Files directly.
for (int i = 0; i < 3; i++) {
  Path path = new Path(root, "testKey" + i);
  fs.delete(path, true);
}
{code}
4. Assert that the deleted keys are present in the active DB's deletedTable, since
keys move to the deletedTable on deletion:
{code:java}
assertTableRowCount(deletedKeyTable, 3);
{code}
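For reference, assertTableRowCount here is a polling assertion (the stack trace quoted in this issue shows it delegating to GenericTestUtils.waitFor). A minimal standalone sketch of that polling pattern, with hypothetical names and a plain Map standing in for the RocksDB table, could look like:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeoutException;
import java.util.function.IntSupplier;

public class TableRowCountAssert {
  // Poll until the supplier reports the expected count, or time out.
  // Mirrors the waitFor-style polling the test relies on (names hypothetical).
  static void assertTableRowCount(IntSupplier rowCount, int expected,
      long checkEveryMillis, long waitForMillis) throws Exception {
    long deadline = System.currentTimeMillis() + waitForMillis;
    while (rowCount.getAsInt() != expected) {
      if (System.currentTimeMillis() > deadline) {
        throw new TimeoutException("expected " + expected
            + " rows, found " + rowCount.getAsInt());
      }
      Thread.sleep(checkEveryMillis);
    }
  }

  public static void main(String[] args) throws Exception {
    ConcurrentHashMap<String, String> deletedTable = new ConcurrentHashMap<>();
    // Simulate three keys landing in the deletedTable asynchronously,
    // the way the OM's delete path populates it in the background.
    new Thread(() -> {
      for (int i = 0; i < 3; i++) {
        deletedTable.put("/vol/buck/testKey" + i, "deleted");
      }
    }).start();
    assertTableRowCount(deletedTable::size, 3, 50, 5000);
    System.out.println("row count reached 3");
  }
}
```

The intermittent timeouts in the description are this poll giving up because the expected row count is never reached.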
KeyDeletingService should not process keys in the deletedTable while they are
still present in the snapshot's KeyTable, which is the case here. The code below
performs this skip when the key exists in the snapshot:
{code:java}
OmKeyInfo omKeyInfo = prevKeyTable.get(prevKeyTableDBKey);
// When a key is deleted it is no longer in the keyTable, so we
// also have to check the deletedTable of the previous snapshot.
RepeatedOmKeyInfo delOmKeyInfo =
    prevDeletedTable.get(prevDelTableDBKey);
if (versionExistsInPreviousSnapshot(omKeyInfo,
    info, delOmKeyInfo)) {
  // If the infoList size is 1, there is nothing to split.
  // We either delete it or skip it.
  if (!(infoList.getOmKeyInfoList().size() == 1)) {
    notReclaimableKeyInfo.addOmKeyInfo(info);
  }
  continue;
}
{code}
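In other words, a deleted key is reclaimable only if no version of it survives in the previous snapshot's keyTable or deletedTable. An oversimplified standalone sketch of that decision (the real versionExistsInPreviousSnapshot also compares specific key versions; the types here are hypothetical placeholders):

```java
public class ReclaimCheck {
  // Simplified: a key survives in the previous snapshot if it is still in
  // that snapshot's keyTable, or was captured in its deletedTable.
  static boolean existsInPreviousSnapshot(Object prevKeyTableEntry,
      Object prevDeletedTableEntry) {
    return prevKeyTableEntry != null || prevDeletedTableEntry != null;
  }

  public static void main(String[] args) {
    // Key still present in the snapshot's keyTable: must NOT be reclaimed.
    System.out.println(!existsInPreviousSnapshot(new Object(), null));
    // Key absent from both tables: safe for KeyDeletingService to reclaim.
    System.out.println(!existsInPreviousSnapshot(null, null));
  }
}
```

The bug below makes the first case look like the second, so blocks still referenced by the snapshot get reclaimed.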
Now, coming to the problem: omKeyInfo is obtained using *prevKeyTableDBKey*,
which is constructed as below:
{code:java}
if (prevKeyTableDBKey == null &&
    bucketInfo.getBucketLayout().isFileSystemOptimized()) {
  long volumeId = getVolumeId(info.getVolumeName());
  prevKeyTableDBKey = getOzonePathKey(volumeId,
      bucketInfo.getObjectID(),
      info.getParentObjectID(),
      info.getKeyName());
}
{code}
For an FSO key that is enclosed inside a *directory* (say
vol/buck/dir/key), the *keyName* is dir/key while the *fileName* is the leaf
node, i.e. key.
Since prevKeyTableDBKey is constructed using keyName, for such a key
_prevKeyTableDBKey = volID/buckID/parentDirID/dir/key_ instead of
_volID/buckID/parentDirID/key_, which is what is actually stored in the FSO
KeyTable (FileTable). As a result the key fails the
versionExistsInPreviousSnapshot check, and the KDS processes it for further
deletion even though the snapshot still references it. The simple fix is to use
fileName here. [See fix
|https://github.com/apache/ozone/commit/447b6c0633820505795858289e42ca6f924b208e]
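To make the mismatch concrete, here is a small standalone sketch (ozonePathKey is a hypothetical stand-in for OM's getOzonePathKey; the FSO FileTable is keyed by volumeId/bucketId/parentObjectId/fileName) showing how building the DB key from keyName instead of fileName produces a key that can never match the FileTable entry:

```java
public class FsoPathKeyDemo {
  // Hypothetical stand-in for getOzonePathKey: the FSO FileTable is
  // keyed by /volumeId/bucketId/parentObjectId/fileName.
  static String ozonePathKey(long volId, long buckId, long parentId,
      String name) {
    return "/" + volId + "/" + buckId + "/" + parentId + "/" + name;
  }

  public static void main(String[] args) {
    long volId = 1, buckId = 2, dirObjectId = 100;
    // For vol/buck/dir/key the OmKeyInfo carries both:
    String keyName = "dir/key";  // full path relative to the bucket
    String fileName = "key";     // leaf node only

    String wrong = ozonePathKey(volId, buckId, dirObjectId, keyName);
    String right = ozonePathKey(volId, buckId, dirObjectId, fileName);
    System.out.println(wrong);  // /1/2/100/dir/key -> lookup misses
    System.out.println(right);  // /1/2/100/key     -> matches FileTable
  }
}
```

Because the lookup misses, versionExistsInPreviousSnapshot returns false and the key is treated as reclaimable.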
Validated this on a cluster to reproduce the erroneous block deletion:
{code:java}
[~]# ozone fs -mkdir ofs://ozone1/vol1/buck3/dir
[~]# ozone fs -copyFromLocal /var/log/test.log ofs://ozone1/vol1/buck3/dir/file
[~]# ozone sh snapshot create vol1/buck3 snap1
[~]# ozone fs -rm -skipTrash ofs://ozone1/vol1/buck3/dir/file
[~]# ozone admin container close 1  // close the container that pertains to the
key and wait a while for block cleanup
[~]# ozone fs -cat ofs://ozone1/vol1/buck3/.snapshot/snap1/dir/file
ReplicationConfig: STANDALONE/THREE, State:ALLOCATED, leaderId:,
CreationTimestamp2023-09-08T18:24:34.156Z[UTC]].
23/09/08 18:31:08 WARN storage.ContainerProtocolCalls: Failed to get block
#111677748019200003 in container #3 from ed6c839e-085a-4b51-bb70-fd2e9fceccfb;
will try another datanode.
org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
Unable to find the block. BlockID : conID: 3 locID: 111677748019200003 bcsId: 8
{code}
> Intermittent timeout in TestDirectoryDeletingServiceWithFSO
> -----------------------------------------------------------
>
> Key: HDDS-8453
> URL: https://issues.apache.org/jira/browse/HDDS-8453
> Project: Apache Ozone
> Issue Type: Sub-task
> Affects Versions: 1.4.0
> Reporter: Attila Doroszlai
> Assignee: Devesh Kumar Singh
> Priority: Major
>
> {code}
> Tests run: 5, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 279.448 s <<<
> FAILURE! - in org.apache.hadoop.fs.ozone.TestDirectoryDeletingServiceWithFSO
> org.apache.hadoop.fs.ozone.TestDirectoryDeletingServiceWithFSO.testDirDeletedTableCleanUpForSnapshot
> Time elapsed: 122.491 s <<< ERROR!
> ...
> at org.apache.ozone.test.GenericTestUtils.waitFor(GenericTestUtils.java:231)
> at
> org.apache.hadoop.fs.ozone.TestDirectoryDeletingServiceWithFSO.assertTableRowCount(TestDirectoryDeletingServiceWithFSO.java:509)
> at
> org.apache.hadoop.fs.ozone.TestDirectoryDeletingServiceWithFSO.testDirDeletedTableCleanUpForSnapshot(TestDirectoryDeletingServiceWithFSO.java:470)
> org.apache.hadoop.fs.ozone.TestDirectoryDeletingServiceWithFSO.testDeleteWithLargeSubPathsThanBatchSize
> Time elapsed: 120.321 s <<< ERROR!
> ...
> at org.apache.ozone.test.GenericTestUtils.waitFor(GenericTestUtils.java:231)
> at
> org.apache.hadoop.fs.ozone.TestDirectoryDeletingServiceWithFSO.assertTableRowCount(TestDirectoryDeletingServiceWithFSO.java:509)
> at
> org.apache.hadoop.fs.ozone.TestDirectoryDeletingServiceWithFSO.testDeleteWithLargeSubPathsThanBatchSize(TestDirectoryDeletingServiceWithFSO.java:218)
> {code}
> *
> https://github.com/adoroszlai/ozone-build-results/blob/master/2023/04/14/21548/it-filesystem/hadoop-ozone/integration-test/org.apache.hadoop.fs.ozone.TestDirectoryDeletingServiceWithFSO.txt
> *
> https://github.com/adoroszlai/ozone-build-results/blob/master/2023/04/18/21638/it-filesystem/hadoop-ozone/integration-test/org.apache.hadoop.fs.ozone.TestDirectoryDeletingServiceWithFSO.txt
--
This message was sent by Atlassian Jira
(v8.20.10#820010)