Vanlightly opened a new pull request #11570:
URL: https://github.com/apache/pulsar/pull/11570


   Fixes #11496 and also implements part of PIP 79.
   
   C++ implementation that closely matches the proposed Java client changes for reducing partitioned producer connections and lookups: PR #10279
   
   ### Motivation
   
   Producers that send messages to partitioned topics start a producer per partition, even when using single-partition routing. For topics that combine a large number of producers with a large number of partitions, this can put strain on the brokers. With, say, 1000 partitions and single-partition routing of non-keyed messages, 999 of the 1000 topic owner lookups and producer registrations could be avoided.
   
   PIP 79 also describes this behaviour; I wrote this implementation before realising that PIP 79 covers it. This implementation can be reviewed and contrasted with the Java client implementation in https://github.com/apache/pulsar/pull/10279.
   
   ### Modifications
   
   Allows partitioned producers to start the producers for individual partitions lazily. Starting a producer involves a topic owner lookup to find out which broker owns the partition, then registering the producer for that partition with the owner broker. For topics with many partitions, when using SinglePartition routing without keyed messages, all of these lookups and producer registrations are wasted except those for the single chosen partition.
   
   This change lets the user control whether a producer on a partitioned topic uses lazy start, via a new config in ProducerConfiguration. When lazy start is enabled with ProducerConfiguration::setLazyStartPartitionedProducers(true), PartitionedProducerImpl::start() becomes a synchronous operation that only does housekeeping (no network operations). The producer for any given partition is started (which includes a topic owner lookup and registration) upon sending the first message to that partition; while that producer starts, messages are buffered. A usage sketch follows below.
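   As a usage sketch (hedged: the service URL and topic name are placeholders, and `setLazyStartPartitionedProducers` is the setter this PR adds; the other calls are the existing C++ client API):
   
   ```cpp
   #include <pulsar/Client.h>
   
   using namespace pulsar;
   
   int main() {
       Client client("pulsar://localhost:6650");  // placeholder service URL
   
       ProducerConfiguration conf;
       // New in this PR: defer each partition's owner lookup and producer
       // registration until the first message is routed to that partition.
       conf.setLazyStartPartitionedProducers(true);
       // Lazy start pays off most with single-partition routing of non-keyed
       // messages: only the chosen partition's producer is ever started.
       conf.setPartitionsRoutingMode(ProducerConfiguration::UseSinglePartition);
   
       Producer producer;
       // placeholder partitioned topic
       Result res = client.createProducer(
           "persistent://public/default/my-partitioned-topic", conf, producer);
       if (res != ResultOk) {
           return 1;
       }
   
       // The first send triggers the lookup and registration for the chosen
       // partition; messages sent while it starts are buffered.
       producer.send(MessageBuilder().setContent("hello").build());
   
       client.close();
       return 0;
   }
   ```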
   
   The sendTimeout timer is only activated once a producer has fully started, which should give any buffered messages enough time to be sent. With a very short send timeout, lazy start could cause send timeouts during the start phase; the default of 30s, however, should not trigger this issue.
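   For applications that tune the timeout down, keeping it comfortably above the expected start latency avoids this; a sketch showing the setting explicitly (30000 ms is the existing default):
   
   ```cpp
   ProducerConfiguration conf;
   conf.setLazyStartPartitionedProducers(true);
   // The timer starts only once the partition's producer has fully started,
   // so buffered messages normally have time to be sent. Very small values
   // could still expire during the start phase; the 30s default avoids this.
   conf.setSendTimeout(30000);  // milliseconds (the default value)
   ```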
   
   ### Verifying this change
   
   This change added tests and can be verified as follows:
     - BasicEndToEndTest, testPartitionedProducerConsumer
     - BasicEndToEndTest, testSyncFlushBatchMessagesPartitionedTopicLazyProducers
     - BasicEndToEndTest, testFlushInLazyPartitionedProducer
   
   ### Does this pull request potentially affect one of the following parts:
   
   *If `yes` was chosen, please highlight the changes*
   
     - Dependencies (does it add or upgrade a dependency): (no)
     - The public API: (yes) - client configuration
     - The schema: (no)
     - The default values of configurations: (no)
     - The wire protocol: (no)
     - The rest endpoints: (no)
     - The admin cli options: (no)
     - Anything that affects deployment: (no)
   
   ### Documentation
   
   #### For contributor
   
   For this PR, do we need to update docs?
   
   Yes, the new client config will need documenting. I can contribute that if this PR is accepted.
   
   
   #### For committer
   
   For this PR, do we need to update docs?
   
   - If yes,
     
     - if you update docs in this PR, label this PR with the `doc` label.
     
     - if you plan to update docs later, label this PR with the `doc-required` 
label.
   
     - if you need help on updating docs, create a follow-up issue with the 
`doc-required` label.
     
   - If no, label this PR with the `no-need-doc` label and explain why.
   
   

