[
https://issues.apache.org/jira/browse/HDDS-5819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526092#comment-17526092
]
Ethan Rose commented on HDDS-5819:
----------------------------------
It looks like we need to account for the trash checkpoint running while the
test is in progress. What seems to be happening is:
1. The test creates <bucket>/.Trash/<username>/Current/key
2. The trash checkpoint interval goes off, so
<bucket>/.Trash/<username>/Current/key is moved to
<bucket>/.Trash/<username>/<checkpoint time>/key
3. The test checks that <bucket>/.Trash/<username>/Current/ and/or
<bucket>/.Trash/<username>/Current/key exist, and fails.
We could try to change the trash checkpoint interval configs for that test
only, but that requires a cluster restart. This may be a good option if we want
to use a mini ozone cluster provider for each test instead of reusing the same
cluster, but given the number of tests multiplied by the parameterization, I’m
thinking that might be too many clusters at once. Another, easier option could
be to just check the checkpoints that may exist to find the key. Something
like this:
{code}
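// Look under Current first, then fall back to any checkpoint directory.
boolean trashKeyFound = ofs.exists(trashPath);
if (!trashKeyFound) {
  for (FileStatus checkpoint : ofs.listStatus(userTrashRoot)) {
    // checkpoint.getPath() is <bucket>/.Trash/<username>/<checkpoint time>
    Path checkpointTrashPath = new Path(checkpoint.getPath(), key);
    if (ofs.exists(checkpointTrashPath)) {
      trashKeyFound = true;
      break;
    }
  }
}
Assert.assertTrue(trashKeyFound);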
{code}
The time to live for a checkpoint is set to 3 seconds (fs.trash.interval, see
TrashPolicyOzone#deleteCheckpoint). This means the test has 3 seconds between
calling moveToTrash on the key and finding which checkpoint it is currently in
before it is gone forever. This should be plenty of time to find the key in
one of the checkpoints.
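The fallback lookup can also be tried out without a cluster. The following is
only a sketch under stated assumptions: it uses java.nio.file on a local temp
directory as a stand-in for the Hadoop FileSystem calls (exists, listStatus),
and the class name TrashLookupSketch, the method trashKeyFound, and the
checkpoint directory name are made up for illustration. It is not the actual
test code.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class TrashLookupSketch {

  // Returns true if `key` exists under <userTrashRoot>/Current or under any
  // other directory directly below userTrashRoot (i.e. a checkpoint).
  static boolean trashKeyFound(Path userTrashRoot, String key)
      throws IOException {
    if (Files.exists(userTrashRoot.resolve("Current").resolve(key))) {
      return true;
    }
    if (!Files.isDirectory(userTrashRoot)) {
      return false;
    }
    try (DirectoryStream<Path> children =
        Files.newDirectoryStream(userTrashRoot)) {
      for (Path checkpoint : children) {
        if (Files.exists(checkpoint.resolve(key))) {
          return true;
        }
      }
    }
    return false;
  }

  public static void main(String[] args) throws IOException {
    // Local stand-in for <bucket>/.Trash/<username>.
    Path root = Files.createTempDirectory("userTrash");
    Path current = Files.createDirectories(root.resolve("Current"));
    Files.createFile(current.resolve("key"));
    System.out.println(trashKeyFound(root, "key"));

    // Simulate the checkpoint firing: Current is renamed to a timestamp.
    // The directory name here is a placeholder, not a real checkpoint.
    Files.move(current, root.resolve("220421103000"));
    System.out.println(trashKeyFound(root, "key"));
  }
}
```

Both calls should report that the key was found: the first via Current, the
second via the renamed checkpoint directory. That is the behavior the test
needs in order to tolerate a checkpoint firing mid-run.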
> Intermittent failure in TestRootedOzoneFileSystem#testRenameToTrashEnabled
> --------------------------------------------------------------------------
>
> Key: HDDS-5819
> URL: https://issues.apache.org/jira/browse/HDDS-5819
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Ethan Rose
> Assignee: Keyi Song
> Priority: Major
> Attachments: it-filesystem-hdds.zip
>
>
> TestRootedOzoneFileSystem reuses the same MiniOzoneCluster for all runs. A
> cascading series of failures was observed in this CI run:
> [https://github.com/apache/ozone/runs/3792440274]
> It looks like testRenameToTrashEnabled was the original failure that caused
> the others. Looking in the logs (attached in the zip file) there are numerous
> volume and bucket request errors that may have resulted in this.