[
https://issues.apache.org/jira/browse/HADOOP-14277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15956942#comment-15956942
]
Eric Badger commented on HADOOP-14277:
--------------------------------------
bq. NN launches a new trash emptier thread when it gets started in
startActiveServices, so I used similar way in this test to verify
TrashPolicy#createCheckpoint and TrashPolicy#deleteCheckpoint can be called as
designed, and fs.trash.interval can be set correctly during a restart.
My issue with the test is that it creates {{AuditableTrashPolicy}}, which
extends {{TrashPolicy}}. {{TrashPolicy}} doesn't implement most of its
important functions, so overriding them in {{AuditableTrashPolicy}} just means
that we're testing whether the implementation of {{AuditableTrashPolicy}} is
valid. However, {{AuditableTrashPolicy}} is a test class that isn't used in
production, so this test is really only testing a test implementation, not a
production implementation.
bq. Do you have the log of the jenkins job where this test failed?
Here is the output of the test
{noformat}
Create a checkpoint, current number of checkpoints 1
Create a checkpoint, current number of checkpoints 2
Create a checkpoint, current number of checkpoints 3
Create a checkpoint, current number of checkpoints 4
Create a checkpoint, current number of checkpoints 5
Delete a checkpoint, current number of checkpoints 4
Delete a checkpoint, current number of checkpoints 3
{noformat}
bq. The test code doesn't really do anything when deleting a checkpoint other
than adding a count in AuditableCheckpoints, so I am wondering why this could
be flaky.
It doesn't do much, but it depends on things running smoothly. We're
multithreaded here with the {{emptierThread}} and can't assume that things will
be scheduled immediately, nor can we assume that they will run in some
arbitrarily small amount of time. If a machine is heavily loaded, a thread
might not get scheduled quickly or might get interrupted frequently. If the
{{emptierThread}} waits long enough or is slowed down for whatever reason, it
might take longer than 20ms to instantiate the {{AuditableTrashPolicy}} object
and delete the checkpoint. Realistically this should never take that long,
which is why the failure is infrequent. However, it is still a race: the test
requires the entire {{emptierThread}} execution to take less than 120ms, and if
it doesn't, the test fails.
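One common way to take this kind of scheduling race out of a test is to poll
for the expected state with a generous upper bound instead of checking after a
fixed sleep. The following is only a minimal, self-contained sketch of that
idea, not the actual TestTrash code or a proposed patch: {{PollingWaitSketch}},
{{waitFor}}, the counter, and all of the timings are hypothetical stand-ins.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

public class PollingWaitSketch {

  /**
   * Polls the condition every intervalMs until it holds or timeoutMs elapses.
   * Returns true if the condition held before the deadline, false on timeout
   * or if the calling thread was interrupted while waiting.
   */
  public static boolean waitFor(BooleanSupplier condition, long intervalMs, long timeoutMs) {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        return false; // gave up: the condition never became true in time
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) throws Exception {
    // Hypothetical stand-in for the checkpoint counter kept by the test policy.
    AtomicInteger checkpoints = new AtomicInteger(3);

    // Stand-in for the emptier thread: it deletes one checkpoint after a delay
    // that the test cannot predict (assumed to be 50ms here).
    Thread emptier = new Thread(() -> {
      try {
        Thread.sleep(50);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      checkpoints.decrementAndGet();
    });
    emptier.start();

    // Wait up to 10 seconds for the expected state rather than asserting
    // after a fixed, short sleep.
    boolean reached = waitFor(() -> checkpoints.get() == 2, 10, 10_000);
    emptier.join();

    if (!reached) {
      throw new AssertionError("Expected 2 checkpoints, but actual is " + checkpoints.get());
    }
    System.out.println("Checkpoints after wait: " + checkpoints.get());
  }
}
```

The timeout only bounds the worst case. On an unloaded machine the wait returns
as soon as the counter reaches the expected value, so the test is not slowed
down in the common case.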
> TestTrash.testTrashRestarts is flaky
> ------------------------------------
>
> Key: HADOOP-14277
> URL: https://issues.apache.org/jira/browse/HADOOP-14277
> Project: Hadoop Common
> Issue Type: Bug
> Reporter: Eric Badger
>
> {noformat}
> junit.framework.AssertionFailedError: Expected num of checkpoints is 2, but actual is 3 expected:<2> but was:<3>
> at junit.framework.Assert.fail(Assert.java:57)
> at junit.framework.Assert.failNotEquals(Assert.java:329)
> at junit.framework.Assert.assertEquals(Assert.java:78)
> at junit.framework.Assert.assertEquals(Assert.java:234)
> at junit.framework.TestCase.assertEquals(TestCase.java:401)
> at org.apache.hadoop.fs.TestTrash.verifyAuditableTrashEmptier(TestTrash.java:892)
> at org.apache.hadoop.fs.TestTrash.testTrashRestarts(TestTrash.java:593)
> {noformat}