milos-matijasevic opened a new issue #8615:
URL: https://github.com/apache/pulsar/issues/8615
**Describe the bug**
Consumers are stuck, and the broker logs show:
```
ERROR org.apache.bookkeeper.mledger.offload.jcloud.impl.BlobStoreManagedLedgerOffloader - Failed readOffloaded:
java.lang.NullPointerException: null
	at org.apache.bookkeeper.mledger.offload.jcloud.impl.BlobStoreManagedLedgerOffloader.lambda$new$1(BlobStoreManagedLedgerOffloader.java:153) ~[?:?]
	at org.apache.bookkeeper.mledger.offload.jcloud.impl.BlobStoreBackedReadHandleImpl.open(BlobStoreBackedReadHandleImpl.java:196) ~[?:?]
	at org.apache.bookkeeper.mledger.offload.jcloud.impl.BlobStoreManagedLedgerOffloader.lambda$readOffloaded$5(BlobStoreManagedLedgerOffloader.java:556) ~[?:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_252]
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125) [com.google.guava-guava-25.1-jre.jar:?]
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57) [com.google.guava-guava-25.1-jre.jar:?]
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78) [com.google.guava-guava-25.1-jre.jar:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_252]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_252]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_252]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_252]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_252]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_252]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.48.Final.jar:4.1.48.Final]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
11:42:55.978 [bookkeeper-ml-workers-OrderedExecutor-3-0] ERROR org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/default/persistent/xxx] Error opening ledger for reading at position 45043:0 - org.apache.bookkeeper.mledger.ManagedLedgerException: Unknown exception
```
This may be related to an error we saw after the cluster auto-upgraded to v2.6.2
(https://gist.github.com/milos-matijasevic/1502d90293bb89ce4fb16b4c61bb81a0).
At that time ZooKeeper also got into a state where the nodes could not see each
other and brokers were crashing, so we downgraded to 2.6.1 (which we used
before). That should probably be a separate issue.
When I search the S3 bucket for this ledger with
```bash
aws s3api list-objects --bucket bucketname --query "Contents[?contains(Key,
'ledger-45043')]"
```
the object is not there, yet stats-internal for this topic reports the ledger
as if everything is fine:
```json
{
"ledgerId" : 45043,
"entries" : 53640,
"size" : 26299313,
"offloaded" : true
}
```
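To cross-check the broker's view against S3 in bulk, one could parse the
stats-internal output and collect every ledger the broker believes was
offloaded, then run the `aws s3api` query above for each ID. A minimal sketch,
assuming the `pulsar-admin topics stats-internal` layout where a top-level
`ledgers` array holds entries like the one shown above (the helper name is
ours, not a Pulsar API):

```python
import json

def offloaded_ledger_ids(stats_internal_json: str) -> list:
    """Return the IDs of ledgers marked "offloaded" : true.

    Assumes the stats-internal JSON has a top-level "ledgers" array whose
    entries look like the one quoted in this report.
    """
    stats = json.loads(stats_internal_json)
    return [l["ledgerId"] for l in stats.get("ledgers", []) if l.get("offloaded")]

# Example built from the ledger entry above, plus a hypothetical
# not-yet-offloaded ledger for contrast:
sample = json.dumps({"ledgers": [
    {"ledgerId": 45043, "entries": 53640, "size": 26299313, "offloaded": True},
    {"ledgerId": 45050, "entries": 1000, "size": 123456, "offloaded": False},
]})
print(offloaded_ledger_ids(sample))  # -> [45043]
```

Each returned ID would then be checked for existence in the bucket; any ledger
that is "offloaded" in stats-internal but has no matching S3 object is a
candidate for this bug.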
We found one more ledger in a different topic with the same problem.
As a workaround we made our consumers skip the messages in that ledger and
continue reading from the next ledger (which is there), but any future read of
the topic that starts at a position before this ledger will hit the same
problem. Is there a way to fix this, or at least a way to delete these
corrupted ledgers?
While I was writing this, the same thing happened to another new ledger.
Nothing looks unusual about these ledgers; their size and entry counts are in
line with the other ledgers.
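The skip-ahead workaround amounts to seeking each consumer to the first entry
of the ledger after the corrupted one. A small helper to compute that target
position from the topic's ledger list (pure bookkeeping; the helper name is
ours, and the actual seek would go through the Pulsar client's seek API):

```python
def seek_target_after(ledger_ids, bad_ledger_id):
    """Given the topic's ledger IDs, return (ledger_id, entry_id) of the
    first entry after the corrupted ledger, or None if it was the last."""
    later = [lid for lid in sorted(ledger_ids) if lid > bad_ledger_id]
    return (later[0], 0) if later else None

# Hypothetical ledger IDs around the broken 45043:
print(seek_target_after([45040, 45043, 45047], 45043))  # -> (45047, 0)
```

This only sidesteps the broken ledger for one subscription; as noted above, any
later read starting before the corrupted ledger still runs into it.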
**Desktop (please complete the following information):**
- OS: Linux
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]