aweiri1 commented on issue #24819:
URL: https://github.com/apache/pulsar/issues/24819#issuecomment-3560222940

   @lhotari I am still working on resolving this issue, and I have documented 
the following findings:
   
   I believe I have made progress narrowing down the issue: cross-cluster communication for the replicators.
   
   As we know, geo-replication is semi-working:
   
   **issue recap:**
   
   - two Pulsar Kubernetes clusters set up (okd1 and talos)
   - manually updated the cluster URLs to the load-balancer IPs and restarted the brokers successfully
   - producers and consumers work against the cluster URLs
   - enabled geo-replication on the replication tenant/namespace
   - produced 100 messages on the talos cluster with no consumers running on either cluster
   - this immediately triggers a connection/internal-server error in the talos broker logs
   - the topic never exists on the okd1 cluster
   - started a consumer on the okd1 cluster and it receives the 100 messages
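   For context, the cluster-URL update in the second bullet was done roughly along these lines; the cluster names match ours, but the LB IPs below are placeholders, not our real values:
   
   ```shell
   # Sketch of how the cluster metadata was pointed at the LB IPs.
   # IPs are placeholders, not our actual load-balancer addresses.
   
   # On talos, point the "okd1" cluster entry at okd1's load balancer:
   bin/pulsar-admin clusters update okd1 \
     --url http://203.0.113.10:8080 \
     --broker-url pulsar://203.0.113.10:6650
   
   # And symmetrically on okd1 for the "talos" cluster entry:
   bin/pulsar-admin clusters update talos \
     --url http://198.51.100.20:8080 \
     --broker-url pulsar://198.51.100.20:6650
   ```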
   
   **result:** geo-replication is semi-working, with an odd topic-replication issue: the topic never replicates, yet the messages do replicate once a consumer starts on the remote cluster.
   
   **theory:** by default, behind the scenes, the talos replicator connects directly to the okd1 broker, NOT to the LB/proxy. That is why the error log shows the connection failure against the internal okd1 broker DNS name. Direct broker-to-broker cross-cluster communication cannot work in our setup, because the brokers expose only internal endpoints; the replicator is trying to connect over okd's internal pod network, which is unreachable from talos. What we expose externally is a Kubernetes proxy service behind a load balancer with an external IP, and that is the only external IP in our Kubernetes setup. I am looking into Pulsar configurations that could resolve this, but I am not sure whether the root cause is our Kubernetes setup.
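   If that theory is right, the talos replicator should be dialing whatever brokerServiceUrl is stored in talos's cluster metadata for okd1, so it is worth double-checking what that entry actually contains (command sketch):
   
   ```shell
   # Run against the talos cluster: show the "okd1" cluster entry that
   # the talos replicator uses when connecting to the remote cluster.
   bin/pulsar-admin clusters get okd1
   # If serviceUrl/brokerServiceUrl show the internal okd1 broker DNS name
   # instead of the LB endpoints, that would match the connection errors.
   ```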
   
   broker.conf settings that may help resolve the issue:
   
   - advertisedAddress
         - I changed advertisedAddress to our LB IP in broker.configData in our Helm chart, and it breaks the deployment: the pods aren't able to come up.
   - advertisedListeners
   - internalListenerName
   - bindAddress
   - createTopicToRemoteClusterForReplication
         - this is already set to true for our brokers, so it shouldn't be the issue.
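   A sketch of what the advertisedListeners/internalListenerName combination could look like in broker.conf, so that each broker advertises both its in-cluster address and the LB address; the listener names, DNS name, and IP below are placeholders, not verified values for our setup:
   
   ```properties
   # broker.conf sketch - names/IPs are placeholders, not verified values.
   # "internal" is what in-cluster clients and the broker itself use;
   # "external" advertises the LB endpoint for clients outside the cluster.
   advertisedListeners=internal:pulsar://pulsar-broker.pulsar.svc.cluster.local:6650,external:pulsar://203.0.113.10:6650
   internalListenerName=internal
   ```
   
   Whether the replicator actually honors a given listener depends on how it resolves the remote brokers, so this is only the listener-side half of the picture.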
   
   Any feedback you may have is appreciated!


-- 
This is an automated message from the Apache Git Service.