lukestephenson opened a new issue #8282:
URL: https://github.com/apache/pulsar/issues/8282


   **Describe the bug**
   
   Topic compaction fails if data has been offloaded to S3.
   
   **To Reproduce**
   Steps to reproduce the behavior:
   1. Create a partitioned topic with 4 partitions.
   2. Publish data to all 4 partitions.  
   3. On one of the partitions, trigger offloading to S3 (I ran 
`bin/pulsar-admin topics offload --size-threshold 2M 
goanna/test/topic-partition-1`).
   4. Trigger compaction externally (i.e. with the `CompactorTool`) for 
the partitions which have not been offloaded.  These compact fine.
   5. Trigger compaction of the offloaded partition.  It fails.
   
   The broker logs show:
   ```
   22:55:14.275 [BookKeeperClientWorker-OrderedExecutor-9-0] INFO  
org.apache.pulsar.broker.service.ServerCnx - [/192.168.1.5:50237] 
[persistent://goanna/test/topic-partition-1][__compaction] Reset subscription 
to message id 543:0
   22:55:14.375 [pulsar-io-51-3] INFO  
org.apache.pulsar.broker.service.ServerCnx - [/192.168.1.5:50237] Subscribing 
on topic persistent://goanna/test/topic-partition-1 / __compaction
   22:55:14.376 [pulsar-io-51-3] INFO  
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - 
[goanna/test/persistent/topic-partition-1-__compaction] Rewind from 543:0 to 
543:0
   22:55:14.376 [pulsar-io-51-3] INFO  
org.apache.pulsar.broker.service.persistent.PersistentTopic - 
[persistent://goanna/test/topic-partition-1] There are no replicated 
subscriptions on the topic
   22:55:14.376 [pulsar-io-51-3] INFO  
org.apache.pulsar.broker.service.persistent.PersistentTopic - 
[persistent://goanna/test/topic-partition-1][__compaction] Created new 
subscription for 0
   22:55:14.376 [pulsar-io-51-3] INFO  
org.apache.pulsar.broker.service.ServerCnx - [/192.168.1.5:50237] Created 
subscription on topic persistent://goanna/test/topic-partition-1 / __compaction
   22:55:14.379 [bookkeeper-ml-workers-OrderedExecutor-2-0] WARN  
org.apache.bookkeeper.mledger.impl.OpReadEntry - 
[goanna/test/persistent/topic-partition-1][__compaction] read failed from 
ledger at position:543:0 : Unknown exception
   ``` 
   
   Note the original exception is swallowed 
(https://github.com/apache/pulsar/blob/8933d8ddffe649e3e45458005fae5a5c6a3de47a/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java#L3173)
 and the logs only show `Unknown exception`.
   
   If I modify that code above to log the exception, the underlying cause is:
   ```
   22:55:14.377 [bookkeeper-ml-workers-OrderedExecutor-2-0] ERROR 
org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - Unknown exception null
   java.io.EOFException: null
        at java.io.DataInputStream.readInt(DataInputStream.java:397) ~[?:?]
        at 
org.apache.bookkeeper.mledger.offload.jcloud.impl.BlobStoreBackedReadHandleImpl.lambda$readAsync$1(BlobStoreBackedReadHandleImpl.java:109)
 ~[?:?]
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
        at 
com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
 ~[com.google.guava-guava-25.1-jre.jar:?]
        at 
com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
 ~[com.google.guava-guava-25.1-jre.jar:?]
        at 
com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
 ~[com.google.guava-guava-25.1-jre.jar:?]
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
 ~[?:?]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) 
[?:?]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) 
[?:?]
        at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
 [io.netty-netty-common-4.1.48.Final.jar:4.1.48.Final]
        at java.lang.Thread.run(Thread.java:834) [?:?]
   ```
   
   (I'd hoped that logging it might reveal a configuration issue which I 
could fix.)
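
For context on the top frame of that stack trace: `DataInputStream.readInt()` throws `EOFException` whenever the underlying stream ends before four bytes have been read. The sketch below only demonstrates that general JDK behaviour with an in-memory stream; the class and method names are hypothetical, and it does not claim to explain why the offloaded read comes up short in `BlobStoreBackedReadHandleImpl`.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

// Hypothetical illustration class; not part of Pulsar or BookKeeper.
public class ReadIntEofSketch {

    /** Returns true if readInt() reaches end-of-stream before reading 4 bytes. */
    public static boolean readIntHitsEof(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            in.readInt(); // needs exactly 4 bytes
            return false;
        } catch (EOFException e) {
            // Same exception type as in the stack trace above; its message is null.
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("4 bytes available: EOF = " + readIntHitsEof(new byte[4]));
        System.out.println("3 bytes available: EOF = " + readIntHitsEof(new byte[3]));
        System.out.println("0 bytes available: EOF = " + readIntHitsEof(new byte[0]));
    }
}
```

So the trace suggests the reader asked for an int at a point where the stream it was given had fewer than four bytes left, whatever the reason for that turns out to be.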
   
   **Expected behavior**
   1. Brokers should not swallow unexpected exceptions.  They should log 
them, or include them as the cause of the propagated exception.
   2. Topic compaction should work for offloaded topics.
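
A minimal sketch of what point 1 asks for, assuming a hypothetical exception type and helper (`ReadFailedException` and `translate` are made up for illustration and are not the actual managed-ledger classes or methods): the unexpected throwable is logged with its stack trace and also attached as the cause, so neither the broker log nor the caller loses it.

```java
import java.io.EOFException;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical illustration class; not the real ManagedLedgerImpl code.
public class PreserveCauseSketch {
    private static final Logger LOG = Logger.getLogger(PreserveCauseSketch.class.getName());

    /** Hypothetical stand-in for a managed-ledger exception type. */
    public static class ReadFailedException extends Exception {
        public ReadFailedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Translates an unexpected throwable: logs it and keeps it as the cause. */
    public static ReadFailedException translate(Throwable t) {
        // Passing the throwable to the logger records its full stack trace.
        LOG.log(Level.SEVERE, "Unexpected exception while reading from ledger", t);
        // Chaining the cause lets callers still see the underlying EOFException.
        return new ReadFailedException("Unknown exception", t);
    }

    public static void main(String[] args) {
        ReadFailedException e = translate(new EOFException());
        System.out.println(e.getMessage() + ", cause: " + e.getCause());
    }
}
```

With the cause chained, a log line that currently ends in `Unknown exception` could be followed by the `EOFException` trace without patching the jar.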
   
   **Desktop (please complete the following information):**
   - Initially discovered running on docker / k8s (using pulsar-helm-chart)
   - Later reproduced on a local standalone deployment so I could easily update 
the managed-ledger jar to see the underlying cause of the exception.
   
   **Additional context**
   Initially mentioned on Slack: 
https://apache-pulsar.slack.com/archives/C5Z4T36F7/p1599623625299300
   

