lhotari opened a new issue #10097:
URL: https://github.com/apache/pulsar/issues/10097


   ### Problem
   
   This exception is repeated on the log when using replicated subscriptions:
   ```
   07:17:59.770 [bookkeeper-ml-workers-OrderedExecutor-4-0] ERROR 
org.apache.pulsar.broker.service.persistent.PersistentReplicator - 
[persistent://georep/default/t1][cluster-a -> cluster-b] Unexpected exception: 
Field 'replicated_from' is not set
   java.lang.IllegalStateException: Field 'replicated_from' is not set
   at 
org.apache.pulsar.common.api.proto.MessageMetadata.getReplicatedFrom(MessageMetadata.java:151)
 ~[org.apache.pulsar-pulsar-common-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
   at 
org.apache.pulsar.broker.service.persistent.PersistentReplicator.checkReplicatedSubscriptionMarker(PersistentReplicator.java:763)
 ~[org.apache.pulsar-pulsar-broker-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
   at 
org.apache.pulsar.broker.service.persistent.PersistentReplicator.readEntriesComplete(PersistentReplicator.java:366)
 ~[org.apache.pulsar-pulsar-broker-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
   at 
org.apache.bookkeeper.mledger.impl.OpReadEntry.lambda$checkReadCompletion$2(OpReadEntry.java:156)
 ~[org.apache.pulsar-managed-ledger-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
   at org.apache.bookkeeper.mledger.util.SafeRun$1.safeRun(SafeRun.java:32) 
[org.apache.pulsar-managed-ledger-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
   at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) 
[org.apache.bookkeeper-bookkeeper-common-4.13.0.jar:4.13.0]
   at 
org.apache.bookkeeper.common.util.OrderedExecutor$TimedRunnable.run(OrderedExecutor.java:203)
 [org.apache.bookkeeper-bookkeeper-common-4.13.0.jar:4.13.0]
   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[?:1.8.0_282]
   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[?:1.8.0_282]
   at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
 [io.netty-netty-common-4.1.60.Final.jar:4.1.60.Final]
   at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
   ```
   
   ### To reproduce
   
   1. Create 2-cluster k8s deployment ([sample script/helm to do 
so](https://github.com/lhotari/pulsar-playground/blob/master/test-env/redeploy_multi_cluster.sh))
   2. Create consumer with replicated subscription for topic georep/default/t1 
in cluster-a and close it
   3. Create consumer with replicated subscription for topic georep/default/t1 
in cluster-b and close it
   4. Create producer for topic georep/default/t1 in cluster-a and publish 
messages to the topic
   5. Create consumer with replicated subscription for topic georep/default/t1 
in cluster-a, consume 1 message and close it
   6. Create consumer with replicated subscription for topic georep/default/t1 
in cluster-b, consume 1 message and close it
   
   It might be possible to reproduce the issue with fewer steps.
   
   ### Observations
   
   It seems that the code location broker after the switch to LightProto 
(#9046).
   The fix is easy for the exception above. The concern is the lack of test 
coverage for replicated subscriptions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to