[ https://issues.apache.org/jira/browse/BOOKKEEPER-846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497883#comment-14497883 ]
Sijie Guo commented on BOOKKEEPER-846:
--------------------------------------
Taking a closer look at the LedgerChecker implementation: the BKException that
[~rakeshr] pointed out is actually not a big deal, since the ledger checker here
only makes a best effort to find bad fragments. A fragment should be treated as
bad even if the ledger has been deleted, because the checker itself cannot tell
whether NoLedgerExists means the ledger was deleted or a ledger file was deleted
on a bookie. Such a bad fragment is resolved at re-replication time: the
replication worker re-opens the ledger before re-replicating it, and at that
point it finds that the ledger has actually been deleted.
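
To make the division of labour above concrete, here is a minimal sketch of it as a toy model. All of the types and method names below (BestEffortCheckSketch, ReadResult, Verdict, Outcome, check, replicate) are hypothetical and are not BookKeeper classes; the sketch only assumes that the checker flags anything it cannot read, and that the replication step consults ledger metadata before copying.

```java
// Hypothetical, simplified model of the flow described above.
// None of these types are BookKeeper classes.
final class BestEffortCheckSketch {

    /** What a read against a bookie can report in this toy model. */
    enum ReadResult { OK, NO_SUCH_LEDGER }

    /** The checker's best-effort verdict for a fragment. */
    enum Verdict { HEALTHY, SUSPECT }

    /** What the replication step ends up doing with a fragment. */
    enum Outcome { NOTHING_TO_DO, REREPLICATED, SKIPPED_LEDGER_DELETED }

    /**
     * Checker side: it cannot tell why the ledger is missing on the bookie,
     * so every result other than OK marks the fragment as suspect.
     */
    static Verdict check(ReadResult result) {
        return result == ReadResult.OK ? Verdict.HEALTHY : Verdict.SUSPECT;
    }

    /**
     * Replication side: it re-opens the ledger first (modelled here as a
     * boolean), and only at this point is a deleted ledger recognised.
     */
    static Outcome replicate(Verdict verdict, boolean ledgerExistsInMetadata) {
        if (verdict == Verdict.HEALTHY) {
            return Outcome.NOTHING_TO_DO;
        }
        return ledgerExistsInMetadata
                ? Outcome.REREPLICATED
                : Outcome.SKIPPED_LEDGER_DELETED;
    }

    public static void main(String[] args) {
        Verdict verdict = check(ReadResult.NO_SUCH_LEDGER);
        System.out.println(replicate(verdict, true));   // ledger file lost on a bookie
        System.out.println(replicate(verdict, false));  // ledger really deleted
    }
}
```

In this model the checker returns the same verdict in both situations, and the two cases are only separated in replicate, which is the behaviour argued for above.
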
So the exception handling is correct. The problem is purely in the test itself,
which I think was not written in a way that correctly tests the
ledger-does-not-exist case.
The flakiness comes from how the last ensemble of a non-closed ledger is
handled. Garbage collection can kick in between the check for whether that
ensemble has any entries and the actual check of its fragments. If GC kicks in
before or after the last ensemble is checked, the test always passes, but if GC
kicks in between those two steps, it always fails.
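
The timing dependence can be sketched with a deterministic toy model. Everything here is hypothetical (LastEnsembleRaceSketch, FakeBookie, GcTiming, badFragmentsReported); it is not the LedgerChecker code. It also assumes that the existence probe treats a missing ledger as "entry never written", which is a simplification made for this sketch.

```java
// Hypothetical toy model of the two-step check on the last ensemble.
// It is not BookKeeper code; GC is simulated by flipping a flag.
final class LastEnsembleRaceSketch {

    /** When the simulated garbage collection runs relative to the check. */
    enum GcTiming { BEFORE_PROBE, BETWEEN_PROBE_AND_CHECK, AFTER_CHECK }

    /** A stand-in bookie that holds the ledger's data until GC removes it. */
    static final class FakeBookie {
        private boolean hasLedgerData = true;

        void garbageCollect() { hasLedgerData = false; }

        /** Step 1: does the first entry of the last ensemble exist? */
        boolean entryExists() { return hasLedgerData; }

        /** Step 2: can the fragment actually be read? */
        boolean fragmentReadable() { return hasLedgerData; }
    }

    /** Returns how many bad fragments the two-step check would report. */
    static int badFragmentsReported(GcTiming timing) {
        FakeBookie bookie = new FakeBookie();

        if (timing == GcTiming.BEFORE_PROBE) {
            bookie.garbageCollect();
        }
        boolean probeSaysWritten = bookie.entryExists();

        if (timing == GcTiming.BETWEEN_PROBE_AND_CHECK) {
            bookie.garbageCollect();
        }
        // The fragment is only checked if the probe found the entry.
        int bad = (probeSaysWritten && !bookie.fragmentReadable()) ? 1 : 0;

        if (timing == GcTiming.AFTER_CHECK) {
            bookie.garbageCollect();
        }
        return bad;
    }

    public static void main(String[] args) {
        for (GcTiming timing : GcTiming.values()) {
            System.out.println(timing + " -> " + badFragmentsReported(timing));
        }
    }
}
```

Under these assumptions only the BETWEEN_PROBE_AND_CHECK case reports a bad fragment; the other two timings report none, which is why the outcome depends on when GC happens to run.
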
So we should fix the test so that it tests what it is supposed to test. We
should not change the ledger checker.
{code}
    // Check for the case that no last confirmed entry has
    // been set.
    if (curEntryId == lastEntry) {
        final long entryToRead = curEntryId;
        EntryExistsCallback eecb
            = new EntryExistsCallback(lh.getLedgerMetadata().getWriteQuorumSize(),
                new GenericCallback<Boolean>() {
                    public void operationComplete(int rc, Boolean result) {
                        if (result) {
                            fragments.addAll(finalSegmentFragments);
                        }
                        checkFragments(fragments, cb);
                    }
                });
        for (int bi : lh.getDistributionSchedule().getWriteSet(entryToRead)) {
            BookieSocketAddress addr = curEnsemble.get(bi);
            bookieClient.readEntry(addr, lh.getId(),
                                   entryToRead, eecb, null);
        }
        return;
    } else {
        fragments.addAll(finalSegmentFragments);
    }
}
checkFragments(fragments, cb);
{code}
> TestLedgerChecker times out
> ---------------------------
>
> Key: BOOKKEEPER-846
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-846
> Project: Bookkeeper
> Issue Type: Test
> Reporter: Flavio Junqueira
> Assignee: Rakesh R
> Priority: Blocker
> Fix For: 4.4.0, 4.3.1
>
> Attachments: BOOKKEEPER-846-001.patch,
> org.apache.bookkeeper.client.TestLedgerChecker-output.txt
>
>
> {noformat}
> java.lang.Exception: test timed out after 3000 milliseconds
> at java.lang.Object.wait(Native Method)
> at java.lang.Object.wait(Object.java:502)
> at org.apache.bookkeeper.client.SyncCounter.block(SyncCounter.java:51)
> at org.apache.bookkeeper.client.LedgerHandle.addEntry(LedgerHandle.java:480)
> at org.apache.bookkeeper.client.LedgerHandle.addEntry(LedgerHandle.java:457)
> at org.apache.bookkeeper.client.TestLedgerChecker.testShouldGetTwoFrgamentsIfTwoBookiesFailedInSameEnsemble(TestLedgerChecker.java:185)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
> at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
> at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
> at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
> at org.junit.internal.runners.statements.FailOnTimeout$1.run(FailOnTimeout.java:28)
> {noformat}