dwang-qm opened a new issue, #23451: URL: https://github.com/apache/pulsar/issues/23451

### Search before asking

- [X] I searched in the [issues](https://github.com/apache/pulsar/issues) and found nothing similar.

### Read release policy

- [X] I understand that unsupported versions don't get bug fixes. I will attempt to reproduce the issue on a supported version of Pulsar client and Pulsar broker.

### Version

- Broker: Pulsar 3.1.2
- Client: Pulsar v3.4.2

Using ZooKeeper as the metadata store.

### Minimal reproduce step

1. Connect to a partitioned topic with the C++ client.
2. Update the topic to add a new partition.
3. The C++ client will attempt to create producers for the new partitions. These will often fail.

### What did you expect to see?

Producers successfully created.

### What did you see instead?

In the broker logs, the "Illegal topic partition name" error message.

### Anything else?

I believe the issue is that when the broker handles a `PRODUCER` command, it calls `ServerCnx::handleProducer`, which calls `BrokerService::getOrCreateTopic`, which calls `BrokerService::getTopic`, which calls `BrokerService::fetchPartitionedTopicMetadataAsync(TopicName topicName)`, which calls `BrokerService::fetchPartitionedTopicMetadataAsync(TopicName topicName, boolean refreshCacheAndGet)` with `refreshCacheAndGet` set to `false`.

This means that `NamespaceResources::getPartitionedTopicMetadataAsync` is always called with `refresh` set to `false`, so `getAsync` is called on `NamespaceResources` rather than `refreshAndGetAsync`. As a result, `sync` is not called on ZooKeeper before the read is performed.

According to the ZooKeeper Programmer's Guide (https://zookeeper.apache.org/doc/r3.1.2/zookeeperProgrammers.html):

> ZooKeeper does not guarantee that at every instance in time, two different clients will have identical views of ZooKeeper data. Due to factors like network delays, one client may perform an update before another client gets notified of the change. Consider the scenario of two clients, A and B. If client A sets the value of a znode /a from 0 to 1, then tells client B to read /a, client B may read the old value of 0, depending on which server it is connected to. If it is important that Client A and Client B read the same value, Client B should call the sync() method from the ZooKeeper API method before it performs its read.

This seems to indicate that, without the sync, the broker could read an out-of-date partition count for the topic and reject the new partition with a spurious error. Here the admin operation that adds the partition plays the role of client A, and the broker handling the `PRODUCER` command plays the role of client B.

I believe that reading the partition metadata when handling the `PRODUCER` command should use authoritative reads (calling `sync` before performing the ZooKeeper reads).

### Are you willing to submit a PR?

- [X] I'm willing to submit a PR!

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
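
The stale-read scenario described in this report can be illustrated with a small, self-contained toy model. This is only a sketch and is not Pulsar or ZooKeeper code: the `Leader` and `Follower` classes, the `isValidPartition` check, and the znode path are all hypothetical stand-ins. `get` mimics a plain read served from a replica's local state, and `refreshAndGet` mimics a read preceded by `sync`.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model only. Every name here is hypothetical, not a Pulsar or ZooKeeper API. */
public class StaleReadSketch {

    /** Authoritative copy of the metadata (stands in for the ZooKeeper leader). */
    static final class Leader {
        final Map<String, Integer> data = new HashMap<>();
    }

    /** Replica that serves reads from local state and catches up only on sync(). */
    static final class Follower {
        private final Leader leader;
        private final Map<String, Integer> local = new HashMap<>();

        Follower(Leader leader) {
            this.leader = leader;
            sync(); // start out up to date
        }

        /** Catch up with the leader, similar in spirit to ZooKeeper's sync(). */
        void sync() {
            local.clear();
            local.putAll(leader.data);
        }

        /** Plain read from local state; may return a stale value. */
        int get(String path) {
            return local.getOrDefault(path, 0);
        }

        /** Authoritative read: sync first, then read. */
        int refreshAndGet(String path) {
            sync();
            return get(path);
        }
    }

    /** Assumed validity rule: a partition index must be below the partition count. */
    static boolean isValidPartition(int partitionIndex, int partitionCount) {
        return partitionIndex < partitionCount;
    }

    public static void main(String[] args) {
        String path = "/partitioned-topics/my-topic"; // hypothetical path
        Leader leader = new Leader();
        leader.data.put(path, 2);                 // topic starts with 2 partitions
        Follower follower = new Follower(leader); // the broker's view of the metadata

        leader.data.put(path, 3);                 // admin adds a partition (index 2)

        int stale = follower.get(path);           // no sync: follower has not caught up
        System.out.println("without sync: count=" + stale
                + ", partition 2 valid=" + isValidPartition(2, stale));

        int fresh = follower.refreshAndGet(path); // sync before read
        System.out.println("with sync: count=" + fresh
                + ", partition 2 valid=" + isValidPartition(2, fresh));
    }
}
```

In this model the read without `sync` still sees two partitions and rejects partition index 2, while the read after `sync` sees three and accepts it. A real fix would trade some latency on the producer-creation path for that guarantee.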
