Following Ivan’s reply, I checked the build history. The recent
failures seem to share this stack trace:
java.io.IOException: Unable to delete directory /tmp/bkTest3561939033223584760.dir/current/0.
    at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1337)
    at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:1910)
    at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1399)
    at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1331)
    at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:1910)
    at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1399)
    at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1331)
    at org.apache.bookkeeper.test.BookKeeperClusterTestCase.cleanupTempDirs(BookKeeperClusterTestCase.java:186)
    at org.apache.bookkeeper.test.BookKeeperClusterTestCase.tearDown(BookKeeperClusterTestCase.java:114)
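This may be caused by a bug in ForceWriteThread::run(), which can skip
“logFile.close()” when the thread is interrupted, so the log file may be
left open. I have opened a ticket in JIRA. The numbered markers below
(1, 2, 3) follow the order of events.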
private class ForceWriteThread {
    public void run() {
        LOG.info("ForceWrite Thread started");
        boolean shouldForceWrite = true;
        int numReqInLastForceWrite = 0;
        while(running) {
            ForceWriteRequest req = null;
            try {
                …
            } catch (IOException ioe) {
                LOG.error("I/O exception in ForceWrite thread", ioe);
                running = false;
            } catch (InterruptedException e) {
                LOG.error("ForceWrite thread interrupted", e);
                if (null != req) {
                    req.closeFileIfNecessary(); // <==== 2: on interrupt, “shouldClose” is not set properly, so the file may not be closed
                }
                running = false;
            }
        }
        // Regardless of what caused us to exit, we should notify the
        // parent thread as it should either exit or be in the process
        // of exiting else we will have write requests hang
        threadToNotifyOnEx.interrupt();
    }
    // shutdown sync thread
    void shutdown() throws InterruptedException {
        running = false;
        this.interrupt(); // <==== 1: call interrupt
        this.join();
    }
}
public void closeFileIfNecessary() {
    // Close if shouldClose is set
    if (shouldClose) { // <==== 3: “shouldClose” is false here
        // We should guard against exceptions so it's
        // safe to call in catch blocks
        try {
            logFile.close();
            // Call close only once
            shouldClose = false;
        }
        catch (IOException ioe) {
            LOG.error("I/O exception while closing file", ioe);
        }
    }
}
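
To make the suspected sequence easier to see outside BookKeeper, here is a
minimal, self-contained sketch. It is not BookKeeper code: the names
InterruptCloseSketch, FakeLogFile, Request, Worker and closeOnExit are all
made up for illustration, and the sleep stands in for the elided work in the
try block. The "close in finally" variant is only one possible way to avoid
the leak, shown under those assumptions, and not necessarily what the
eventual patch should do.

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicBoolean;

public class InterruptCloseSketch {

    /** Hypothetical stand-in for the journal file; records whether close() ran. */
    static final class FakeLogFile implements Closeable {
        private final AtomicBoolean closed = new AtomicBoolean(false);

        @Override
        public void close() {
            closed.set(true);
        }

        boolean isClosed() {
            return closed.get();
        }
    }

    /** Hypothetical request that mirrors the shouldClose guard quoted above. */
    static final class Request {
        private final FakeLogFile logFile;
        private boolean shouldClose;

        Request(FakeLogFile logFile, boolean shouldClose) {
            this.logFile = logFile;
            this.shouldClose = shouldClose;
        }

        void closeFileIfNecessary() {
            if (shouldClose) {
                logFile.close();
                shouldClose = false; // close only once
            }
        }
    }

    /** Hypothetical worker: blocks until interrupted, then runs its cleanup. */
    static final class Worker extends Thread {
        private final Request req;
        private final FakeLogFile logFile;
        private final boolean closeOnExit;

        Worker(Request req, FakeLogFile logFile, boolean closeOnExit) {
            this.req = req;
            this.logFile = logFile;
            this.closeOnExit = closeOnExit;
        }

        @Override
        public void run() {
            try {
                Thread.sleep(Long.MAX_VALUE); // stands in for the elided work
            } catch (InterruptedException e) {
                req.closeFileIfNecessary(); // guarded close, as in the quoted code
            } finally {
                if (closeOnExit) {
                    logFile.close(); // assumed fix: always close on exit
                }
            }
        }

        void shutdown() throws InterruptedException {
            interrupt();
            join();
        }
    }

    /** Starts a worker whose request has shouldClose == false, then shuts it down. */
    static boolean closedAfterShutdown(boolean closeOnExit) {
        FakeLogFile file = new FakeLogFile();
        Worker worker = new Worker(new Request(file, false), file, closeOnExit);
        worker.start();
        try {
            worker.shutdown();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return file.isClosed();
    }

    public static void main(String[] args) {
        System.out.println("guard only:       closed=" + closedAfterShutdown(false));
        System.out.println("close in finally: closed=" + closedAfterShutdown(true));
    }
}
```

With the guard alone the fake file stays open after shutdown(); with the
unconditional close in the finally block it is closed. Whether the real
journal file can safely be closed that way depends on who else holds it,
which this sketch does not model.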
Thanks.
-Jia
On Sat, Feb 21, 2015 at 3:07 AM, Ivan Kelly <[email protected]> wrote:
> there does seem to be some flakiness in master in general. Jenkins is
> failing every couple of builds.
>
> https://builds.apache.org/job/bookkeeper-master/
>
> On Fri, Feb 20, 2015 at 7:23 PM, Sijie Guo <[email protected]> wrote:
> > I didn't encounter this. Does it work if you run master? just to isolate if
> > it is the branch-only issue.
> >
> > - Sijie
> >
> > On Thu, Feb 19, 2015 at 1:56 PM, Flavio Junqueira <
> > [email protected]> wrote:
> >
> >> Right now I can't get this test to pass in any of my settings, is it known
> >> to be flaky?
> >>
> >> Tests in error:
> >> testPeriodicCheckWhenLedgerDeleted(org.apache.bookkeeper.replication.AuditorPeriodicCheckTest):
> >> test timed out after 60000 milliseconds
> >>
> >> -Flavio
> >>
> >> > On 18 Feb 2015, at 22:55, Sijie Guo <[email protected]> wrote:
> >> >
> >> > How about the master? Are u able to get a clean build on it?
> >> >
> >> > On Wed, Feb 18, 2015 at 7:12 AM, Flavio Junqueira <
> >> > [email protected]> wrote:
> >> >
> >> >> The disk isn't getting full while running the tests, I checked multiple
> >> >> times. I'm having a hard time to get a clean build with this computer, it
> >> >> sounds like there are some flaky tests.
> >> >>
> >> >> -Flavio
> >> >>
> >> >>> On 17 Feb 2015, at 19:22, Sijie Guo <[email protected]> wrote:
> >> >>>
> >> >>> Hi Flavio:
> >> >>>
> >> >>> What is your disk space usage when you run the tests?
> >> >>>
> >> >>> - Sijie
> >> >>>
> >> >>> On Sat, Feb 14, 2015 at 8:02 AM, Flavio Junqueira <
> >> >>> [email protected]> wrote:
> >> >>>
> >> >>>> I'm getting a lot of test errors, am I the only one to observe this?
> >> >>>>
> >> >>>> Results :
> >> >>>>
> >> >>>> Failed tests:
> >> >>>> testCloseDuringOp[0](org.apache.bookkeeper.client.BookKeeperTest): Close never completed
> >> >>>> testShutdown(org.apache.bookkeeper.replication.AuditorBookieTest): Auditor re-election is not happened for auditor failure! expected not same
> >> >>>> testIndexCorruption(org.apache.bookkeeper.replication.AuditorPeriodicCheckTest): Ledger should be under replicated expected:<4> but was:<-1>
> >> >>>> testPeriodicCheckWhenDisabled(org.apache.bookkeeper.replication.AuditorPeriodicCheckTest): All should be underreplicated
> >> >>>> testShutdown(org.apache.bookkeeper.replication.AutoRecoveryMainTest): AuditorElector should not be running
> >> >>>>
> >> >>>> Tests in error:
> >> >>>> testBookieRestartContinuously(org.apache.bookkeeper.bookie.BookieShutdownTest): test timed out after 150000 milliseconds
> >> >>>> testCloseDuringOp[1](org.apache.bookkeeper.client.BookKeeperTest): test timed out after 60000 milliseconds
> >> >>>> testShouldNotGetTheFragmentIfThereIsNoMissedEntry(org.apache.bookkeeper.client.TestLedgerChecker): test timed out after 3000 milliseconds
> >> >>>> testShouldGetTwoFrgamentsIfTwoBookiesFailedInSameEnsemble(org.apache.bookkeeper.client.TestLedgerChecker): test timed out after 3000 milliseconds
> >> >>>> testShouldNotGetAnyFragmentIfNoLedgerPresent(org.apache.bookkeeper.client.TestLedgerChecker): test timed out after 3000 milliseconds
> >> >>>> testShouldGetFailedEnsembleNumberOfFgmntsIfEnsembleBookiesFailedOnNextWrite(org.apache.bookkeeper.client.TestLedgerChecker): test timed out after 3000 milliseconds
> >> >>>> testShouldGetOneFragmentWithSingleEntryOpenedLedger(org.apache.bookkeeper.client.TestLedgerChecker): test timed out after 3000 milliseconds
> >> >>>> testSingleEntryAfterEnsembleChange(org.apache.bookkeeper.client.TestLedgerChecker): test timed out after 3000 milliseconds
> >> >>>> testClosedSingleEntryLedger(org.apache.bookkeeper.client.TestLedgerChecker): test timed out after 3000 milliseconds
> >> >>>> testPeriodicCheckWhenLedgerDeleted(org.apache.bookkeeper.replication.AuditorPeriodicCheckTest): test timed out after 60000 milliseconds
> >> >>>> testRWShouldCleanTheLedgerFromUnderReplicationIfLedgerAlreadyDeleted[0](org.apache.bookkeeper.replication.TestReplicationWorker): test timed out after 3000 milliseconds
> >> >>>> testRWShouldCleanTheLedgerFromUnderReplicationIfLedgerAlreadyDeleted[1](org.apache.bookkeeper.replication.TestReplicationWorker): test timed out after 3000 milliseconds
> >> >>>> testRWShouldCleanTheLedgerFromUnderReplicationIfLedgerAlreadyDeleted[2](org.apache.bookkeeper.replication.TestReplicationWorker): test timed out after 3000 milliseconds
> >> >>>> testCompat400(org.apache.bookkeeper.test.TestBackwardCompat): test timed out after 60000 milliseconds
> >> >>>> testCompat410(org.apache.bookkeeper.test.TestBackwardCompat): test timed out after 60000 milliseconds
> >> >>>>
> >> >>>>> On 13 Feb 2015, at 09:23, Sijie Guo <[email protected]> wrote:
> >> >>>>>
> >> >>>>> This is the first release candidate for Apache BookKeeper, version 4.3.1.
> >> >>>>> It fixes the following issues:
> >> >>>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12328755&styleName=Html&projectId=12311293
> >> >>>>>
> >> >>>>> *** Please download, test and vote by Feb 17th 2015, 10:00 GMT.
> >> >>>>>
> >> >>>>> Note that we are voting upon the source (tag), binaries are provided for
> >> >>>>> convenience.
> >> >>>>>
> >> >>>>> Source and binary files:
> >> >>>>> https://dist.apache.org/repos/dist/dev/bookkeeper/bookkeeper-4.3.1-candidate-0/
> >> >>>>>
> >> >>>>> Maven staging repo:
> >> >>>>> https://repository.apache.org/content/repositories/orgapachebookkeeper-1005/
> >> >>>>>
> >> >>>>> The tag to be voted upon:
> >> >>>>> release-4.3.1 (b830f4e88c991d67a84ed883c6136989a54c2556)
> >> >>>>>
> >> >>>>> BookKeeper's KEYS file containing PGP keys we use to sign the release:
> >> >>>>> https://dist.apache.org/repos/dist/release/bookkeeper/KEYS
> >> >>>>>
> >> >>>>> Please download the source package, and follow the README to build
> >> >>>>> and run a bookkeeper and hedwig service.
> >> >>>>
> >> >>>>
> >> >>
> >> >>
> >>
> >>
>