frankjkelly opened a new issue, #23572:
URL: https://github.com/apache/pulsar/issues/23572

   ### Search before asking
   
   - [X] I searched in the [issues](https://github.com/apache/pulsar/issues) 
and found nothing similar.
   
   
   ### Read release policy
   
   - [X] I understand that unsupported versions don't get bug fixes. I will 
attempt to reproduce the issue on a supported version of Pulsar client and 
Pulsar broker.
   
   
   ### Version
   
   Latest on master  i.e.4.1.0
   
   ### Minimal reproduce step
   
   1) Create two local pulsar clusters (for me it was in Minikube) with the 
same clustername (different services) 
   2) Confirm healthcheck is passing e.g. 
   ```
   kubectl port-forward service/pulsar-broker 8080:8080
   curl http://localhost:8080/admin/v2/brokers/health
   ```
   3) Enable Cluster Migration from one cluster to the other e.g. 
   ```
   kubectl port-forward service/pulsar-broker 8080:8080
   ./bin/pulsar-admin clusters update-cluster-migration blue --broker-url  
pulsar://pulsar-green-broker:6650 --service-url http://pulsar-green-broker:8080 
--migrated
   ```
   
   4) Monitor healthcheck on the original cluster
   ```
   while true; do curl http://localhost:8080/admin/v2/brokers/health; date; 
sleep 5; done
   ok
   Wed Nov  6 09:37:51 EST 2024
   ok
   Wed Nov  6 09:37:56 EST 2024
   .
   .
   Wed Nov  6 09:42:19 EST 2024
   {"reason":"\n --- An unexpected error occurred in the server ---\n\nMessage: 
request timeout {'durationMs': '30000', 'reqId':'4072026808692081594', 
'remote':'pulsar-broker-0.pulsar-broker.cogito.svc.cluster.local/10.244.0.12:6650',
 
'local':'/10.244.0.12:43376'}\n\nStacktrace:\n\norg.apache.pulsar.client.api.PulsarClientException$TimeoutException:
 request timeout {'durationMs': '30000', 'reqId':'4072026808692081594', 
'remote':'pulsar-broker-0.pulsar-broker.cogito.svc.cluster.local/10.244.0.12:6650',
 'local':'/10.244.0.12:43376'}\n\tat 
org.apache.pulsar.client.impl.ClientCnx.checkRequestTimeout(ClientCnx.java:1422)\n\tat
 
org.apache.pulsar.common.util.Runnables$CatchingAndLoggingRunnable.run(Runnables.java:54)\n\tat
 io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)\n\tat 
io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:162)\n\tat
 
io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173)\n\tat
 io.netty.util.concurre
 nt.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166)\n\tat 
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)\n\tat
 io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:408)\n\tat 
io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)\n\tat
 
io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)\n\tat 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\tat
 java.base/java.lang.Thread.run(Unknown Source)\n"}
   ```
   
   
   ### What did you expect to see?
   
   The original "blue" cluster should appear to be healthy after cluster 
migration.
   
   ### What did you see instead?
   
   "Blue" cluster reports as not healthy.
   
   
   ### Anything else?
   
   I'm not sure it even makes sense logically to migrate the healthcheck.
   Looking here it is surprising that the healthcheck topic does not register 
as an Internal topic
   
   
https://github.com/apache/pulsar/blob/c266db236df5f57bf59df1b25a23521419b2ad02/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentTopic.java#L2923-L2926
   
   
https://github.com/apache/pulsar/blob/c266db236df5f57bf59df1b25a23521419b2ad02/pulsar-broker/src/main/java/org/apache/pulsar/broker/loadbalance/extensions/ExtensibleLoadManagerImpl.java#L822-L827
   
   The healthcheck (heartbeat) topic name is defined here
   
https://github.com/apache/pulsar/blob/c266db236df5f57bf59df1b25a23521419b2ad02/pulsar-broker/src/main/java/org/apache/pulsar/broker/admin/impl/BrokersBase.java#L422-L427
   
   I might be interested in submitting a PR but would need some clarity on 
whether to change the logic in PersistentTopic.java e.g.
   ```
      public CompletableFuture<Void> checkClusterMigration() {
           if (ExtensibleLoadManagerImpl.isInternalTopic(topic)) {
               return CompletableFuture.completedFuture(null);
           }
   .
   .
   ADD LOGIC HERE
   ```
   or if it makes more sense that `ExtensibleLoadManagerImpl.isInternalTopic` 
returns true for the healthcheck topic also.
   
   ### Are you willing to submit a PR?
   
   - [ ] I'm willing to submit a PR!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to