[ 
https://issues.apache.org/jira/browse/HADOOP-14277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15956942#comment-15956942
 ] 

Eric Badger commented on HADOOP-14277:
--------------------------------------

bq. NN launches a new trash emptier thread when it gets started in 
startActiveServices, so I used similar way in this test to verify 
TrashPolicy#createCheckpoint and TrashPolicy#deleteCheckpoint can be called as 
designed, and fs.trash.interval can be set correctly during a restart.
My issue with the test is that it creates {{AuditableTrashPolicy}}, which 
extends {{TrashPolicy}}. {{TrashPolicy}} doesn't implement most of its 
important functions, so overriding them in {{AuditableTrashPolicy}} just means 
that we're testing whether the implementation of {{AuditableTrashPolicy}} is 
valid. However, {{AuditableTrashPolicy}} is a test class that isn't used in 
production, so this test is really only testing a test implementation, not a 
production implementation.

bq. Do you have the log of the jenkins job where this test failed?
Here is the output of the test
{noformat}
Create a checkpoint, current number of checkpoints 1
Create a checkpoint, current number of checkpoints 2
Create a checkpoint, current number of checkpoints 3
Create a checkpoint, current number of checkpoints 4
Create a checkpoint, current number of checkpoints 5
Delete a checkpoint, current number of checkpoints 4
Delete a checkpoint, current number of checkpoints 3
{noformat}

bq. The test code doesn't really do anything when deleting a checkpoint other 
than adding a count in AuditableCheckpoints, so I am wondering why this could 
be flaky.
It doesn't do much, but it depends on things running smoothly. We're 
multithreaded here with the {{emptierThread}}, so we can't assume that things 
will be scheduled immediately, nor that they will run in some arbitrarily 
small amount of time. On a heavily loaded machine, a thread might not get 
scheduled quickly or might get interrupted frequently. If the 
{{emptierThread}} waits long enough or is slowed down for whatever reason, it 
might take longer than 20ms to instantiate the {{AuditableTrashPolicy}} object 
and delete the checkpoint. Realistically this should never take that long, 
which is why the failure is infrequent. However, it is still a race: the test 
requires the entire {{emptierThread}} to finish executing in less than 120ms, 
and it fails if that doesn't happen.
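
To illustrate the kind of race described above, here is a minimal sketch. It 
is not the actual {{TestTrash}} code: {{EmptierRaceSketch}}, 
{{CountingPolicy}}, and {{runAndCount}} are hypothetical names, and the 
background thread only stands in for the {{emptierThread}}. The counts mirror 
the output above (5 checkpoints, and assuming 3 deletes are expected). Instead 
of sleeping for a fixed time and then asserting, the test thread waits on a 
latch with a generous timeout, so a slow machine makes the test slower rather 
than making it fail.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class EmptierRaceSketch {

  /** Hypothetical stand-in for the auditable test policy: it only counts checkpoints. */
  static final class CountingPolicy {
    private final AtomicInteger checkpoints;
    private final CountDownLatch deletes;

    CountingPolicy(int initialCheckpoints, int expectedDeletes) {
      this.checkpoints = new AtomicInteger(initialCheckpoints);
      this.deletes = new CountDownLatch(expectedDeletes);
    }

    void deleteCheckpoint() {
      checkpoints.decrementAndGet();
      deletes.countDown(); // signal the waiting test thread after the count is updated
    }

    int count() {
      return checkpoints.get();
    }

    boolean awaitDeletes(long timeoutMs) throws InterruptedException {
      return deletes.await(timeoutMs, TimeUnit.MILLISECONDS);
    }
  }

  /**
   * Starts a fake emptier that is delayed by startDelayMs (simulating a loaded
   * machine), waits for the expected deletes, and returns the remaining count.
   */
  static int runAndCount(int initial, int expectedDeletes,
                         long startDelayMs, long timeoutMs) {
    CountingPolicy policy = new CountingPolicy(initial, expectedDeletes);
    Thread emptier = new Thread(() -> {
      try {
        Thread.sleep(startDelayMs);
        for (int i = 0; i < expectedDeletes; i++) {
          policy.deleteCheckpoint();
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // shut down quietly when interrupted
      }
    });
    emptier.start();
    try {
      // Wait for the event itself, not for a fixed amount of wall-clock time.
      boolean done = policy.awaitDeletes(timeoutMs);
      emptier.interrupt();
      emptier.join();
      if (!done) {
        throw new IllegalStateException(
            "emptier did not finish within " + timeoutMs + " ms");
      }
      return policy.count();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting", e);
    }
  }

  public static void main(String[] args) {
    // Even with a 200 ms scheduling delay, the wait is bounded by the timeout.
    System.out.println(runAndCount(5, 3, 200, 10_000));
  }
}
```

The timeout only bounds how long a genuinely broken run can hang; it is not 
what makes the test pass, so it can be made much larger than any realistic 
scheduling delay.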

> TestTrash.testTrashRestarts is flaky
> ------------------------------------
>
>                 Key: HADOOP-14277
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14277
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Eric Badger
>
> {noformat}
> junit.framework.AssertionFailedError: Expected num of checkpoints is 2, but actual is 3 expected:<2> but was:<3>
>       at junit.framework.Assert.fail(Assert.java:57)
>       at junit.framework.Assert.failNotEquals(Assert.java:329)
>       at junit.framework.Assert.assertEquals(Assert.java:78)
>       at junit.framework.Assert.assertEquals(Assert.java:234)
>       at junit.framework.TestCase.assertEquals(TestCase.java:401)
>       at org.apache.hadoop.fs.TestTrash.verifyAuditableTrashEmptier(TestTrash.java:892)
>       at org.apache.hadoop.fs.TestTrash.testTrashRestarts(TestTrash.java:593)
> {noformat}



