milos-matijasevic opened a new issue #8615:
URL: https://github.com/apache/pulsar/issues/8615
**Describe the bug**
Consumers are stuck, and the broker logs show:
```
ERROR org.apache.bookkeeper.mledger.offload.jcloud.impl.BlobStoreManagedLedgerOffloader - Failed readOffloaded:
java.lang.NullPointerException: null
	at org.apache.bookkeeper.mledger.offload.jcloud.impl.BlobStoreManagedLedgerOffloader.lambda$new$1(BlobStoreManagedLedgerOffloader.java:153) ~[?:?]
	at org.apache.bookkeeper.mledger.offload.jcloud.impl.BlobStoreBackedReadHandleImpl.open(BlobStoreBackedReadHandleImpl.java:196) ~[?:?]
	at org.apache.bookkeeper.mledger.offload.jcloud.impl.BlobStoreManagedLedgerOffloader.lambda$readOffloaded$5(BlobStoreManagedLedgerOffloader.java:556) ~[?:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_252]
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125) [com.google.guava-guava-25.1-jre.jar:?]
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57) [com.google.guava-guava-25.1-jre.jar:?]
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78) [com.google.guava-guava-25.1-jre.jar:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_252]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_252]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_252]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_252]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_252]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_252]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.48.Final.jar:4.1.48.Final]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
11:42:55.978 [bookkeeper-ml-workers-OrderedExecutor-3-0] ERROR org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/default/persistent/xxx] Error opening ledger for reading at position 45043:0 - org.apache.bookkeeper.mledger.ManagedLedgerException: Unknown exception
```
This may be related to an error we saw after the cluster auto-upgraded to v2.6.2
(https://gist.github.com/milos-matijasevic/1502d90293bb89ce4fb16b4c61bb81a0).
At that time ZooKeeper also got into a state where the nodes could not see each
other and brokers were crashing, so we downgraded to 2.6.1 (which we used
before). That should probably be a separate issue.
When I search the S3 bucket for this ledger with
```bash
aws s3api list-objects --bucket bucketname --query "Contents[?contains(Key,
'ledger-45043')]"
```
the object is not there, yet stats-internal for this topic reports the ledger
as if everything is fine:
```json
{
"ledgerId" : 45043,
"entries" : 53640,
"size" : 26299313,
"offloaded" : true
}
```
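To cross-check the broker's view against S3 in bulk, one could parse the
stats-internal output and collect every ledger the broker believes was
offloaded, then run the `aws s3api` query above for each ID. A minimal sketch,
assuming the `pulsar-admin topics stats-internal` layout where a top-level
`ledgers` array holds entries like the one shown above (the helper name is
ours, not a Pulsar API):

```python
import json

def offloaded_ledger_ids(stats_internal_json: str) -> list:
    """Return the IDs of ledgers marked "offloaded" : true.

    Assumes the stats-internal JSON has a top-level "ledgers" array whose
    entries look like the one quoted in this report.
    """
    stats = json.loads(stats_internal_json)
    return [l["ledgerId"] for l in stats.get("ledgers", []) if l.get("offloaded")]

# Example built from the ledger entry above, plus a hypothetical
# not-yet-offloaded ledger for contrast:
sample = json.dumps({"ledgers": [
    {"ledgerId": 45043, "entries": 53640, "size": 26299313, "offloaded": True},
    {"ledgerId": 45050, "entries": 1000, "size": 123456, "offloaded": False},
]})
print(offloaded_ledger_ids(sample))  # -> [45043]
```

Each returned ID would then be checked for existence in the bucket; any ledger
that is "offloaded" in stats-internal but has no matching S3 object is a
candidate for this bug.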
We found one more ledger in a different topic with the same problem.
As a workaround we made our consumers skip the messages in that ledger and
continue reading from the next ledger (which is there), but any future read of
the topic that starts at a position before this ledger will hit the same
problem. Is there a way to fix this, or at least a way to delete these
corrupted ledgers?
While I was writing this, the same thing happened to another new ledger.
Nothing looks unusual about these ledgers; their size and entry counts are in
line with the other ledgers.
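The skip-ahead workaround amounts to seeking each consumer to the first entry
of the ledger after the corrupted one. A small helper to compute that target
position from the topic's ledger list (pure bookkeeping; the helper name is
ours, and the actual seek would go through the Pulsar client's seek API):

```python
def seek_target_after(ledger_ids, bad_ledger_id):
    """Given the topic's ledger IDs, return (ledger_id, entry_id) of the
    first entry after the corrupted ledger, or None if it was the last."""
    later = [lid for lid in sorted(ledger_ids) if lid > bad_ledger_id]
    return (later[0], 0) if later else None

# Hypothetical ledger IDs around the broken 45043:
print(seek_target_after([45040, 45043, 45047], 45043))  # -> (45047, 0)
```

This only sidesteps the broken ledger for one subscription; as noted above, any
later read starting before the corrupted ledger still runs into it.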
**Desktop (please complete the following information):**
- OS: Linux
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]