[jira] [Commented] (HDDS-5819) Intermittent failure in TestRootedOzoneFileSystem#testRenameToTrashEnabled

Ethan Rose (Jira) Thu, 21 Apr 2022 14:34:07 -0700


    [ https://issues.apache.org/jira/browse/HDDS-5819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526092#comment-17526092 ]


Ethan Rose commented on HDDS-5819:
----------------------------------

It looks like we need to account for the possibility of the trash checkpointing
operation running while the test runs. What seems to be happening is:
1. The test creates <bucket>/.Trash/<username>/Current/key.
2. The trash checkpoint interval fires, so
<bucket>/.Trash/<username>/Current/key is moved to
<bucket>/.Trash/<username>/<checkpoint time>/key.
3. The test checks that <bucket>/.Trash/<username>/Current/ and/or
<bucket>/.Trash/<username>/Current/key exist, and fails because the key has
already been moved out of Current/.
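To make the race concrete, here is a minimal sketch that reproduces the three
steps on a local filesystem with java.nio.file. It is only a stand-in: it does
not use Ozone or TrashPolicyOzone, the helper names (simulateCheckpoint,
naiveCheckAfterCheckpoint) are hypothetical, and the checkpoint directory name
is a made-up timestamp-style string. The checkpoint is modeled as renaming
Current/ to a sibling directory, which matches step 2 above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class TrashCheckpointRace {
  // Hypothetical stand-in for the checkpoint step: rename Current/ to a
  // sibling directory so that Current/key becomes <checkpoint time>/key.
  static Path simulateCheckpoint(Path userTrashRoot, String checkpointName)
      throws IOException {
    Path current = userTrashRoot.resolve("Current");
    Path checkpoint = userTrashRoot.resolve(checkpointName);
    // A rename within the same file store moves a non-empty directory.
    return Files.move(current, checkpoint);
  }

  // Runs steps 1-3 and returns the result of the test's naive existence check.
  static boolean naiveCheckAfterCheckpoint(Path bucket, String user, String key)
      throws IOException {
    Path userTrashRoot = bucket.resolve(".Trash").resolve(user);
    // Step 1: the trashed key lands under Current/.
    Path trashPath = userTrashRoot.resolve("Current").resolve(key);
    Files.createDirectories(trashPath.getParent());
    Files.createFile(trashPath);
    // Step 2: the checkpoint fires before the test looks (hypothetical name).
    simulateCheckpoint(userTrashRoot, "220421213407");
    // Step 3: the check on Current/key no longer finds the key.
    return Files.exists(trashPath);
  }

  public static void main(String[] args) throws IOException {
    Path bucket = Files.createTempDirectory("bucket");
    System.out.println(naiveCheckAfterCheckpoint(bucket, "user", "key")); // prints false
  }
}
```

The key is not lost; it has only moved one level sideways, which is why
searching the checkpoint directories is enough to make the assertion robust.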
We could change the trash checkpoint interval configs for that test only, but
that requires a cluster restart. This may be a good option if we want to use a
mini ozone cluster provider for each test instead of reusing the same cluster,
but given the number of tests multiplied by the number of parameterizations,
I'm thinking that might be too many clusters at once. An easier option could be
to also search any checkpoints that may exist for the key. Something like this:
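{code}
// Assumes ofs, trashPath, userTrashRoot and key come from the surrounding test.
// Uses org.apache.hadoop.fs.FileStatus and org.apache.hadoop.fs.Path.
boolean trashKeyFound = ofs.exists(trashPath);
if (!trashKeyFound) {
  // A checkpoint may already have moved Current/key to <checkpoint time>/key.
  for (FileStatus checkpoint : ofs.listStatus(userTrashRoot)) {
    Path checkpointTrashPath = new Path(
        new Path(userTrashRoot, checkpoint.getPath().getName()), key);
    if (ofs.exists(checkpointTrashPath)) {
      trashKeyFound = true;
      break;
    }
  }
}
Assert.assertTrue(trashKeyFound);
{code}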
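For anyone who wants to try the lookup outside a MiniOzoneCluster, here is a
self-contained sketch of the same fallback search written against
java.nio.file instead of the Hadoop FileSystem API. The class and method names
(TrashKeyLookup, findTrashedKey) and the checkpoint directory name are
hypothetical; it checks Current/ first and then every directory under the
user's trash root, and returns where the key was found.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

class TrashKeyLookup {
  // Looks for key under Current/ first, then under every other directory
  // (checkpoint) in the user's trash root.
  static Optional<Path> findTrashedKey(Path userTrashRoot, String key)
      throws IOException {
    Path trashPath = userTrashRoot.resolve("Current").resolve(key);
    if (Files.exists(trashPath)) {
      return Optional.of(trashPath);
    }
    // Listing after the Current/ check means a checkpoint taken in between
    // shows up here as a new directory.
    try (DirectoryStream<Path> checkpoints =
        Files.newDirectoryStream(userTrashRoot, Files::isDirectory)) {
      for (Path checkpoint : checkpoints) {
        Path candidate = checkpoint.resolve(key);
        if (Files.exists(candidate)) {
          return Optional.of(candidate);
        }
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) throws IOException {
    Path userTrashRoot = Files.createTempDirectory("trash").resolve("user");
    // Simulate a key that a checkpoint has already moved out of Current/.
    Path moved = userTrashRoot.resolve("220421213407").resolve("key");
    Files.createDirectories(moved.getParent());
    Files.createFile(moved);
    Files.createDirectories(userTrashRoot.resolve("Current"));
    System.out.println(findTrashedKey(userTrashRoot, "key").isPresent()); // prints true
  }
}
```

Returning the path rather than a boolean makes it easy to log which checkpoint
held the key if the assertion ever fails again.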
The time to live for a checkpoint is set to 3 seconds (fs.trash.interval, see
TrashPolicyOzone#deleteCheckpoint). This means the test has 3 seconds between
calling moveToTrash on the key and finding which checkpoint it is currently in
before it is gone forever. This should be plenty of time to find the key in one
of the checkpoints.

> Intermittent failure in TestRootedOzoneFileSystem#testRenameToTrashEnabled
> --------------------------------------------------------------------------
>
>                 Key: HDDS-5819
>                 URL: https://issues.apache.org/jira/browse/HDDS-5819
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Ethan Rose
>            Assignee: Keyi Song
>            Priority: Major
>         Attachments: it-filesystem-hdds.zip
>
>
> TestRootedOzoneFileSystem reuses the same MiniOzoneCluster for all runs. A 
> cascading series of failures was observed in this CI run: 
> [https://github.com/apache/ozone/runs/3792440274]
> It looks like testRenameToTrashEnabled was the original failure that caused 
> the others. Looking in the logs (attached in the zip file) there are numerous 
> volume and bucket request errors that may have resulted in this.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
