hamadodene commented on issue #2528:
URL: https://github.com/apache/bookkeeper/issues/2528#issuecomment-938549309


   We encountered the same problem in production. We have seen several exceptions such as:
   ```
   21-09-29-02-11-13       Failed to compact entry log 8277 due to unexpected error
   21-09-29-02-11-13       java.lang.IllegalArgumentException: Negative position
   java.lang.IllegalArgumentException: Negative position
           at java.base/sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:785)
           at org.apache.bookkeeper.bookie.BufferedReadChannel.read(BufferedReadChannel.java:93)
           at org.apache.bookkeeper.bookie.BufferedReadChannel.read(BufferedReadChannel.java:65)
           at org.apache.bookkeeper.bookie.EntryLogger.readFromLogChannel(EntryLogger.java:418)
           at org.apache.bookkeeper.bookie.EntryLogger.scanEntryLog(EntryLogger.java:996)
           at org.apache.bookkeeper.bookie.EntryLogCompactor.compact(EntryLogCompactor.java:61)
           at org.apache.bookkeeper.bookie.GarbageCollectorThread.compactEntryLog(GarbageCollectorThread.java:518)
           at org.apache.bookkeeper.bookie.GarbageCollectorThread.doCompactEntryLogs(GarbageCollectorThread.java:455)
           at org.apache.bookkeeper.bookie.GarbageCollectorThread.runWithFlags(GarbageCollectorThread.java:360)
           at org.apache.bookkeeper.bookie.GarbageCollectorThread.safeRun(GarbageCollectorThread.java:309)
           at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36)
           at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
           at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
           at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
           at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
           at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
           at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
           at java.base/java.lang.Thread.run(Thread.java:834)
   
   ```
   Several days later, the HerdDB service, which bases its replication on BookKeeper, had to recover its tables.
   This recovery failed with the following exception:
   
   ```
   21-10-07-16-24-17 herddb.core.DBManager Oct 07, 2021 4:24:17 PM herddb.core.DBManager manageTableSpaces
   SEVERE: cannot handle tablespace q103
   herddb.log.LogNotAvailableException: org.apache.bookkeeper.client.BKException$BKDigestMatchException: Entry digest does not match
   at herddb.cluster.BookkeeperCommitLog.recovery(BookkeeperCommitLog.java:683)
   at herddb.core.TableSpaceManager.recover(TableSpaceManager.java:325)
   at herddb.core.TableSpaceManager.start(TableSpaceManager.java:250)
   at herddb.core.DBManager.handleTableSpace(DBManager.java:571)
   at herddb.core.DBManager.manageTableSpaces(DBManager.java:1226)
   at herddb.core.DBManager.executeActivator(DBManager.java:1172)
   at herddb.core.DBManager.access$500(DBManager.java:120)
   at herddb.core.DBManager$Activator.run(DBManager.java:1115)
   at java.base/java.lang.Thread.run(Thread.java:834)
   Caused by: org.apache.bookkeeper.client.BKException$BKDigestMatchException: Entry digest does not match
   at org.apache.bookkeeper.client.BKException.create(BKException.java:70)
   at org.apache.bookkeeper.client.PendingReadOp.submitCallback(PendingReadOp.java:640)
   at org.apache.bookkeeper.client.PendingReadOp$LedgerEntryRequest.fail(PendingReadOp.java:171)
   at org.apache.bookkeeper.client.PendingReadOp$SequenceReadRequest.sendNextRead(PendingReadOp.java:393)
   at org.apache.bookkeeper.client.PendingReadOp$SequenceReadRequest.logErrorAndReattemptRead(PendingReadOp.java:436)
   at org.apache.bookkeeper.client.PendingReadOp$LedgerEntryRequest.complete(PendingReadOp.java:142)
   at org.apache.bookkeeper.client.PendingReadOp$SequenceReadRequest.complete(PendingReadOp.java:442)
   at org.apache.bookkeeper.client.PendingReadOp.readEntryComplete(PendingReadOp.java:590)
   at org.apache.bookkeeper.proto.PerChannelBookieClient$ReadCompletion$1.readEntryComplete(PerChannelBookieClient.java:1836)
   at org.apache.bookkeeper.proto.PerChannelBookieClient$ReadCompletion.handleReadResponse(PerChannelBookieClient.java:1917)
   at org.apache.bookkeeper.proto.PerChannelBookieClient$ReadCompletion.handleV3Response(PerChannelBookieClient.java:1892)
   at org.apache.bookkeeper.proto.PerChannelBookieClient$3.safeRun(PerChannelBookieClient.java:1447)
   at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36)
   at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
   at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
   at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
   ... 1 more
   ```
   
   We think this is due to corruption of the entry log.
   
   Do you have any idea how we can solve this problem?
   
   cc @eolivelli  
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

