lukestephenson opened a new issue #8282: URL: https://github.com/apache/pulsar/issues/8282
**Describe the bug**
Topic compaction fails if data has been offloaded to S3.

**To Reproduce**
Steps to reproduce the behavior:
1. Create a partitioned topic with 4 partitions.
2. Publish data to all 4 partitions.
3. On one of the partitions, trigger offloading to S3 (I ran `bin/pulsar-admin topics offload --size-threshold 2M goanna/test/topic-partition-1`).
4. Trigger compaction externally (i.e. with the `CompactorTool`) of the partitions which have not been offloaded (these work fine).
5. Trigger compaction of the offloaded partition. It fails.

The broker logs show:
```
22:55:14.275 [BookKeeperClientWorker-OrderedExecutor-9-0] INFO org.apache.pulsar.broker.service.ServerCnx - [/192.168.1.5:50237] [persistent://goanna/test/topic-partition-1][__compaction] Reset subscription to message id 543:0
22:55:14.375 [pulsar-io-51-3] INFO org.apache.pulsar.broker.service.ServerCnx - [/192.168.1.5:50237] Subscribing on topic persistent://goanna/test/topic-partition-1 / __compaction
22:55:14.376 [pulsar-io-51-3] INFO org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [goanna/test/persistent/topic-partition-1-__compaction] Rewind from 543:0 to 543:0
22:55:14.376 [pulsar-io-51-3] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://goanna/test/topic-partition-1] There are no replicated subscriptions on the topic
22:55:14.376 [pulsar-io-51-3] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://goanna/test/topic-partition-1][__compaction] Created new subscription for 0
22:55:14.376 [pulsar-io-51-3] INFO org.apache.pulsar.broker.service.ServerCnx - [/192.168.1.5:50237] Created subscription on topic persistent://goanna/test/topic-partition-1 / __compaction
22:55:14.379 [bookkeeper-ml-workers-OrderedExecutor-2-0] WARN org.apache.bookkeeper.mledger.impl.OpReadEntry - [goanna/test/persistent/topic-partition-1][__compaction] read failed from ledger at position:543:0 : Unknown exception
```

Note that the original exception is swallowed (https://github.com/apache/pulsar/blob/8933d8ddffe649e3e45458005fae5a5c6a3de47a/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java#L3173) and the logs only show `Unknown exception`. If I modify the code linked above to log the exception, the underlying cause is:
```
22:55:14.377 [bookkeeper-ml-workers-OrderedExecutor-2-0] ERROR org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - Unknown exception null
java.io.EOFException: null
	at java.io.DataInputStream.readInt(DataInputStream.java:397) ~[?:?]
	at org.apache.bookkeeper.mledger.offload.jcloud.impl.BlobStoreBackedReadHandleImpl.lambda$readAsync$1(BlobStoreBackedReadHandleImpl.java:109) ~[?:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125) ~[com.google.guava-guava-25.1-jre.jar:?]
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57) ~[com.google.guava-guava-25.1-jre.jar:?]
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78) ~[com.google.guava-guava-25.1-jre.jar:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.48.Final.jar:4.1.48.Final]
	at java.lang.Thread.run(Thread.java:834) [?:?]
```
(I'd hoped that logging it might reveal a configuration issue which I could fix.)

**Expected behavior**
1. Brokers should not swallow unexpected exceptions without logging them or including them as the cause on the propagated exception.
2. Topic compaction should work for offloaded topics.

**Desktop (please complete the following information):**
- Initially discovered running on docker / k8s (using pulsar-helm-chart)
- Later reproduced on a local standalone deployment so I could easily update the managed-ledger jar to see the underlying cause of the exception.

**Additional context**
Initially mentioned on slack: https://apache-pulsar.slack.com/archives/C5Z4T36F7/p1599623625299300
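
To illustrate the first expected behavior, here is a minimal sketch of the general pattern: log the unexpected exception and keep it as the cause of whatever is propagated. This is not the actual Pulsar code; `ExceptionPropagationSketch`, `LedgerReadException` and `wrap` are hypothetical names used only for illustration, and the sketch uses `java.util.logging` rather than the broker's own logging setup.

```java
import java.io.EOFException;
import java.util.logging.Level;
import java.util.logging.Logger;

public class ExceptionPropagationSketch {
    private static final Logger LOG =
            Logger.getLogger(ExceptionPropagationSketch.class.getName());

    /** Hypothetical stand-in for a managed-ledger exception type (not Pulsar's real class). */
    public static class LedgerReadException extends Exception {
        public LedgerReadException(String message, Throwable cause) {
            super(message, cause); // keep the original exception as the cause
        }
    }

    /** Converts an unexpected throwable without losing it: log it and attach it as the cause. */
    public static LedgerReadException wrap(Throwable t) {
        LOG.log(Level.SEVERE, "Unexpected exception while reading from ledger", t);
        return new LedgerReadException("Unknown exception: " + t, t);
    }

    public static void main(String[] args) {
        // Simulate the EOFException seen in the stack trace above.
        LedgerReadException e = wrap(new EOFException());
        System.out.println(e.getMessage());
        System.out.println(e.getCause().getClass().getName());
    }
}
```

With a pattern like this, the `WARN` line in the broker log would at least name the underlying `java.io.EOFException`, and callers could inspect `getCause()` instead of seeing only `Unknown exception`.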
