lhotari opened a new issue #10097: URL: https://github.com/apache/pulsar/issues/10097
### Problem

The following exception is logged repeatedly when using replicated subscriptions:

```
07:17:59.770 [bookkeeper-ml-workers-OrderedExecutor-4-0] ERROR org.apache.pulsar.broker.service.persistent.PersistentReplicator - [persistent://georep/default/t1][cluster-a -> cluster-b] Unexpected exception: Field 'replicated_from' is not set
java.lang.IllegalStateException: Field 'replicated_from' is not set
	at org.apache.pulsar.common.api.proto.MessageMetadata.getReplicatedFrom(MessageMetadata.java:151) ~[org.apache.pulsar-pulsar-common-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
	at org.apache.pulsar.broker.service.persistent.PersistentReplicator.checkReplicatedSubscriptionMarker(PersistentReplicator.java:763) ~[org.apache.pulsar-pulsar-broker-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
	at org.apache.pulsar.broker.service.persistent.PersistentReplicator.readEntriesComplete(PersistentReplicator.java:366) ~[org.apache.pulsar-pulsar-broker-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
	at org.apache.bookkeeper.mledger.impl.OpReadEntry.lambda$checkReadCompletion$2(OpReadEntry.java:156) ~[org.apache.pulsar-managed-ledger-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
	at org.apache.bookkeeper.mledger.util.SafeRun$1.safeRun(SafeRun.java:32) [org.apache.pulsar-managed-ledger-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
	at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) [org.apache.bookkeeper-bookkeeper-common-4.13.0.jar:4.13.0]
	at org.apache.bookkeeper.common.util.OrderedExecutor$TimedRunnable.run(OrderedExecutor.java:203) [org.apache.bookkeeper-bookkeeper-common-4.13.0.jar:4.13.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_282]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_282]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.60.Final.jar:4.1.60.Final]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
```

### To reproduce

1. Create a 2-cluster k8s deployment ([sample script/helm to do so](https://github.com/lhotari/pulsar-playground/blob/master/test-env/redeploy_multi_cluster.sh))
2. Create a consumer with a replicated subscription for topic georep/default/t1 in cluster-a and close it
3. Create a consumer with a replicated subscription for topic georep/default/t1 in cluster-b and close it
4. Create a producer for topic georep/default/t1 in cluster-a and publish messages to the topic
5. Create a consumer with a replicated subscription for topic georep/default/t1 in cluster-a, consume 1 message and close it
6. Create a consumer with a replicated subscription for topic georep/default/t1 in cluster-b, consume 1 message and close it

It might be possible to reproduce the issue with fewer steps.

### Observations

It seems that this code location broke after the switch to LightProto (#9046). The fix for the exception above is easy. The larger concern is the lack of test coverage for replicated subscriptions.
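To illustrate the kind of fix meant above, here is a minimal sketch of guarding an optional field with a presence check before reading it. `MetadataSketch` and `ReplicatedFromGuard` are hypothetical stand-ins written for this example, not Pulsar classes. The sketch assumes the generated `MessageMetadata` exposes a `hasReplicatedFrom()` accessor next to `getReplicatedFrom()`, and it only mimics the getter behavior seen in the stack trace (throwing `IllegalStateException` when the field is unset).

```java
import java.util.Optional;

// Hypothetical stand-in for a generated message class: the getter throws
// when the optional field was never set, as in the stack trace above.
final class MetadataSketch {
    private String replicatedFrom;
    private boolean replicatedFromSet;

    MetadataSketch setReplicatedFrom(String cluster) {
        this.replicatedFrom = cluster;
        this.replicatedFromSet = true;
        return this;
    }

    // Presence check (assumed to exist on the real generated class).
    boolean hasReplicatedFrom() {
        return replicatedFromSet;
    }

    String getReplicatedFrom() {
        if (!replicatedFromSet) {
            throw new IllegalStateException("Field 'replicated_from' is not set");
        }
        return replicatedFrom;
    }
}

// Hypothetical helper: reads the field only after checking that it is present.
final class ReplicatedFromGuard {
    static Optional<String> replicatedFrom(MetadataSketch metadata) {
        return metadata.hasReplicatedFrom()
                ? Optional.of(metadata.getReplicatedFrom())
                : Optional.empty();
    }
}
```

A null check on the getter's return value cannot help here, because the getter throws before returning anything; the presence check has to come first.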
