[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497883#comment-14497883
 ] 

Sijie Guo commented on BOOKKEEPER-846:
--------------------------------------

Taking a closer look at the LedgerChecker implementation, the BKException that 
[~rakeshr] pointed out is actually not a big deal, since the ledger checker 
here is just making a best effort to find bad fragments. A fragment should be 
treated as a bad fragment even if the ledger is deleted, as the checker itself 
doesn't know whether NoLedgerExists means that the ledger was deleted or that 
a ledger file was deleted on a bookie. This bad fragment will be addressed 
during re-replication: the replication worker re-opens the ledger to 
re-replicate it and will then find that the ledger is actually deleted.
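
To illustrate why reporting such a fragment as bad is harmless, here is a 
minimal sketch of the re-open step. It is a toy model, not the actual 
replication worker code: ToyMetadataStore, Outcome and handleBadFragment are 
hypothetical names, and I am assuming the re-open consults some authoritative 
record of which ledgers still exist.

```java
import java.util.HashSet;
import java.util.Set;

public class ReReplicationSketch {
    /** What the toy worker does with a fragment reported as bad. */
    enum Outcome { REPLICATED, SKIPPED_LEDGER_DELETED }

    /** Hypothetical stand-in for the authoritative record of live ledgers. */
    static final class ToyMetadataStore {
        private final Set<Long> liveLedgers = new HashSet<>();

        void create(long ledgerId) { liveLedgers.add(ledgerId); }
        void delete(long ledgerId) { liveLedgers.remove(ledgerId); }
        boolean exists(long ledgerId) { return liveLedgers.contains(ledgerId); }
    }

    /**
     * Toy version of the worker handling one bad fragment. The checker could
     * not tell "ledger deleted" from "ledger file lost on a bookie"; the
     * re-open here is where that ambiguity gets resolved.
     */
    static Outcome handleBadFragment(ToyMetadataStore store, long ledgerId) {
        if (!store.exists(ledgerId)) {
            // The ledger is really gone, so there is nothing to re-replicate.
            return Outcome.SKIPPED_LEDGER_DELETED;
        }
        // The ledger still exists, so the fragment genuinely needs repair.
        return Outcome.REPLICATED;
    }

    public static void main(String[] args) {
        ToyMetadataStore store = new ToyMetadataStore();
        store.create(1L);
        store.create(2L);
        store.delete(2L); // ledger 2 is deleted after being reported as bad

        System.out.println("ledger 1: " + handleBadFragment(store, 1L));
        System.out.println("ledger 2: " + handleBadFragment(store, 2L));
    }
}
```

In this model a false "bad fragment" report for a deleted ledger costs one 
extra lookup and nothing else.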

So the exception handling is correct. The problem is purely in the test 
itself: I think the test wasn't written in the correct way to test the 
ledger-does-not-exist case.

The flakiness comes from handling the last ensemble in a non-closed ledger. 
Garbage collection could kick in between checking whether that ensemble has 
entries and actually checking the fragment. If GC kicks in before or after 
the last ensemble is checked, the test always passes, but if GC kicks in 
between, it always fails.
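
The three interleavings can be reproduced deterministically with a toy model. 
This is only a sketch under my own assumptions, not the LedgerChecker code: 
ToyBookie, GcPoint and badFragments are hypothetical names, and the two-step 
"probe, then check" structure is a simplification of the snippet quoted 
further down.

```java
import java.util.HashSet;
import java.util.Set;

public class GcRaceSketch {
    /** Point at which the simulated garbage collector deletes the ledger. */
    enum GcPoint { BEFORE_PROBE, BETWEEN_PROBE_AND_CHECK, AFTER_CHECK }

    /** Hypothetical stand-in for a bookie's ledger storage. */
    static final class ToyBookie {
        private final Set<Long> ledgers = new HashSet<>();

        void create(long ledgerId) { ledgers.add(ledgerId); }
        void gc(long ledgerId) { ledgers.remove(ledgerId); }
        boolean hasLedger(long ledgerId) { return ledgers.contains(ledgerId); }
    }

    /** Returns how many bad fragments the toy checker reports. */
    static int badFragments(GcPoint gcPoint) {
        final long ledgerId = 1L;
        ToyBookie bookie = new ToyBookie();
        bookie.create(ledgerId);

        if (gcPoint == GcPoint.BEFORE_PROBE) {
            bookie.gc(ledgerId);
        }

        // Step 1: probe whether the last ensemble holds any entry.
        boolean lastEnsembleHasEntries = bookie.hasLedger(ledgerId);

        if (gcPoint == GcPoint.BETWEEN_PROBE_AND_CHECK) {
            bookie.gc(ledgerId);
        }

        // Step 2: the fragment is checked only if the probe found an entry,
        // and it is reported as bad if the ledger has vanished by now.
        int bad = 0;
        if (lastEnsembleHasEntries && !bookie.hasLedger(ledgerId)) {
            bad++;
        }

        if (gcPoint == GcPoint.AFTER_CHECK) {
            bookie.gc(ledgerId);
        }
        return bad;
    }

    public static void main(String[] args) {
        for (GcPoint p : GcPoint.values()) {
            System.out.println(p + " -> " + badFragments(p) + " bad fragment(s)");
        }
    }
}
```

In this model only the in-between interleaving reports a bad fragment, so a 
test that doesn't control when GC runs sees two different results.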

So we should fix the test itself to make it test what it should test. We 
shouldn't change the ledger checker.

{code}

            // Check for the case that no last confirmed entry has
            // been set.
            if (curEntryId == lastEntry) {
                final long entryToRead = curEntryId;

                EntryExistsCallback eecb
                    = new EntryExistsCallback(lh.getLedgerMetadata().getWriteQuorumSize(),
                                              new GenericCallback<Boolean>() {
                                                  public void operationComplete(int rc, Boolean result) {
                                                      if (result) {
                                                          fragments.addAll(finalSegmentFragments);
                                                      }
                                                      checkFragments(fragments, cb);
                                                  }
                                              });

                for (int bi : lh.getDistributionSchedule().getWriteSet(entryToRead)) {
                    BookieSocketAddress addr = curEnsemble.get(bi);
                    bookieClient.readEntry(addr, lh.getId(),
                                           entryToRead, eecb, null);
                }
                return;
            } else {
                fragments.addAll(finalSegmentFragments);
            }
        }

        checkFragments(fragments, cb);
{code}

> TestLedgerChecker times out
> ---------------------------
>
>                 Key: BOOKKEEPER-846
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-846
>             Project: Bookkeeper
>          Issue Type: Test
>            Reporter: Flavio Junqueira
>            Assignee: Rakesh R
>            Priority: Blocker
>             Fix For: 4.4.0, 4.3.1
>
>         Attachments: BOOKKEEPER-846-001.patch, org.apache.bookkeeper.client.TestLedgerChecker-output.txt
>
>
> {noformat}
> java.lang.Exception: test timed out after 3000 milliseconds
>         at java.lang.Object.wait(Native Method)
>         at java.lang.Object.wait(Object.java:502)
>         at org.apache.bookkeeper.client.SyncCounter.block(SyncCounter.java:51)
>         at org.apache.bookkeeper.client.LedgerHandle.addEntry(LedgerHandle.java:480)
>         at org.apache.bookkeeper.client.LedgerHandle.addEntry(LedgerHandle.java:457)
>         at org.apache.bookkeeper.client.TestLedgerChecker.testShouldGetTwoFrgamentsIfTwoBookiesFailedInSameEnsemble(TestLedgerChecker.java:185)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:497)
>         at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
>         at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>         at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
>         at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
>         at org.junit.internal.runners.statements.FailOnTimeout$1.run(FailOnTimeout.java:28)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
