dwang-qm opened a new issue, #23451:
URL: https://github.com/apache/pulsar/issues/23451

   ### Search before asking
   
   - [X] I searched in the [issues](https://github.com/apache/pulsar/issues) 
and found nothing similar.
   
   
   ### Read release policy
   
   - [X] I understand that unsupported versions don't get bug fixes. I will 
attempt to reproduce the issue on a supported version of Pulsar client and 
Pulsar broker.
   
   
   ### Version
   
   Broker: Pulsar 3.1.2
   Client: Pulsar v3.4.2
   
   Using ZooKeeper as the metadata store.
   
   ### Minimal reproduce step
   
   1. Connect to a partitioned topic with the C++ client.
   2. Update the topic to add a new partition.
   3. The C++ client attempts to create a producer for each new partition. 
These attempts often fail.
   
   ### What did you expect to see?
   
   Producers successfully created.
   
   ### What did you see instead?
   
   The broker logs show the "Illegal topic partition name" error message.
   
   ### Anything else?
   
   I believe the issue is that when the broker responds to a `PRODUCER` 
command, it calls `ServerCnx::handleProducer`, which calls 
`BrokerService::getOrCreateTopic`, which calls `BrokerService::getTopic`, which 
calls `BrokerService::fetchPartitionedTopicMetadataAsync(TopicName topicName)`, 
which calls `BrokerService::fetchPartitionedTopicMetadataAsync(TopicName 
topicName, boolean refreshCacheAndGet)` with `refreshCacheAndGet` set to 
`false`. As a result, `NamespaceResources::getPartitionedTopicMetadataAsync` 
is always called with `refresh` set to `false`, so `getAsync` is called on 
`NamespaceResources` rather than `refreshAndGetAsync`. Consequently, no 
ZooKeeper `sync` is issued before the read is performed.
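   
   To make the flag propagation concrete, here is a minimal sketch of the 
pattern described above. It is not Pulsar's actual code: the class 
`RefreshFlagSketch`, the method `fetchPartitionCount`, and the two in-memory 
maps are hypothetical stand-ins, and staleness is modelled as a plain cache 
rather than as a lagging ZooKeeper server.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the call chain described above; not Pulsar's code. */
class RefreshFlagSketch {
    // Models the authoritative metadata store.
    private final Map<String, Integer> store = new ConcurrentHashMap<>();
    // Models the possibly stale view that a non-refreshing read is served from.
    private final Map<String, Integer> cache = new ConcurrentHashMap<>();

    void storeWrite(String topic, int partitions) {
        store.put(topic, partitions);
    }

    /** Non-refreshing read: once a value is cached, the store is not consulted. */
    CompletableFuture<Integer> getAsync(String topic) {
        return CompletableFuture.completedFuture(cache.computeIfAbsent(topic, store::get));
    }

    /** Refreshing read: re-reads the store, then updates the cached view. */
    CompletableFuture<Integer> refreshAndGetAsync(String topic) {
        Integer fresh = store.get(topic);
        if (fresh != null) {
            cache.put(topic, fresh);
        }
        return CompletableFuture.completedFuture(fresh);
    }

    /** One-argument overload: hard-codes refresh = false. */
    CompletableFuture<Integer> fetchPartitionCount(String topic) {
        return fetchPartitionCount(topic, false);
    }

    /** Two-argument overload: the flag selects which read path is taken. */
    CompletableFuture<Integer> fetchPartitionCount(String topic, boolean refresh) {
        return refresh ? refreshAndGetAsync(topic) : getAsync(topic);
    }

    public static void main(String[] args) {
        RefreshFlagSketch s = new RefreshFlagSketch();
        s.storeWrite("my-topic", 3);
        s.fetchPartitionCount("my-topic").join();   // first read populates the cached view
        s.storeWrite("my-topic", 4);                // partition count is updated in the store
        System.out.println(s.fetchPartitionCount("my-topic").join());        // non-refreshing path
        System.out.println(s.fetchPartitionCount("my-topic", true).join());  // refreshing path
    }
}
```

   In this sketch, any caller that goes through the one-argument overload can 
never reach the refreshing path, which mirrors the situation described for the 
`PRODUCER` command.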
   
   According to the ZooKeeper Programmer's Guide 
(https://zookeeper.apache.org/doc/r3.1.2/zookeeperProgrammers.html), "ZooKeeper 
does not guarantee that at every instance in time, two different clients will 
have identical views of ZooKeeper data. Due to factors like network delays, one 
client may perform an update before another client gets notified of the change. 
Consider the scenario of two clients, A and B. If client A sets the value of a 
znode /a from 0 to 1, then tells client B to read /a, client B may read the old 
value of 0, depending on which server it is connected to. If it is important 
that Client A and Client B read the same value, Client B should should call the 
sync() method from the ZooKeeper API method before it performs its read."
   
   This seems to indicate that without the sync, the broker can get an 
out-of-date view of the number of partitions the topic has, resulting in 
spurious errors. I believe that reading the partition metadata when handling 
the `PRODUCER` command should use authoritative reads (calling `sync` before 
performing the ZooKeeper reads).
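   
   The suspected failure mode can be illustrated with a toy model. This is a 
sketch under stated assumptions, not the ZooKeeper or Pulsar API: `Follower`, 
`syncAndGet`, and `isValidPartition` are hypothetical names, the "leader" is 
just an in-memory map, and `sync()` here simply copies the leader's state.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a lagging replica; NOT the ZooKeeper or Pulsar API. */
class StaleReadSketch {

    /** Hypothetical follower that only catches up with the leader when sync() is called. */
    static final class Follower {
        private final Map<String, Integer> leader;                   // authoritative state
        private final Map<String, Integer> local = new HashMap<>();  // possibly stale copy

        Follower(Map<String, Integer> leader) {
            this.leader = leader;
            this.local.putAll(leader);
        }

        /** Plain read: served from the local copy, which may lag behind the leader. */
        int get(String path) {
            return local.get(path);
        }

        /** Models sync(): catch up with the leader so later reads see its state. */
        void sync() {
            local.clear();
            local.putAll(leader);
        }

        /** Authoritative read: sync first, then read. */
        int syncAndGet(String path) {
            sync();
            return get(path);
        }
    }

    /** A partition index is valid only if it is below the known partition count. */
    static boolean isValidPartition(int partitionIndex, int partitionCount) {
        return partitionIndex >= 0 && partitionIndex < partitionCount;
    }

    public static void main(String[] args) {
        Map<String, Integer> leader = new HashMap<>();
        leader.put("/my-topic", 3);                 // topic starts with 3 partitions
        Follower broker = new Follower(leader);

        leader.put("/my-topic", 4);                 // a partition is added via the leader

        // A producer for the new partition (index 3) is checked against the stale count.
        System.out.println(isValidPartition(3, broker.get("/my-topic")));
        // The same check after an authoritative read.
        System.out.println(isValidPartition(3, broker.syncAndGet("/my-topic")));
    }
}
```

   In the model, the first check rejects partition index 3 because the 
follower still reports the old count, and the second check accepts it once the 
read is preceded by a sync.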
   
   ### Are you willing to submit a PR?
   
   - [X] I'm willing to submit a PR!

