pkumar-singh opened a new issue, #18510:
URL: https://github.com/apache/pulsar/issues/18510
### Motivation
When a topic is partitioned and one of its partitions is unavailable for
producing, the Pulsar client currently keeps trying to produce messages to the
unavailable partition. In certain cases this is unnecessary: the client could
simply pick another partition and produce there instead.
**Partition Unavailable**
A partition can become unavailable for many reasons. The most prominent one is
that the partition is moving from one broker to another: until every actor
(the producers, the old broker, and the new broker) agrees on which broker owns
the partition, the partition is unavailable for producing.
### Goal
Keep producing without interruption for as long as possible when a partition is down.
### API Changes
pulsar-client-api/src/main/java/org/apache/pulsar/client/api/ProducerBuilder.java
/**
 * When set, the PartitionedProducer attempts, if possible, to produce the
 * message on another available partition when the currently picked
 * partition is unavailable for some reason.
 * The next available partition is chosen by the same routing policy the
 * client is configured with.
 * @param maxRetryOtherPartition
 *            how many other partitions should be tried before bailing out
 * @return the producer builder instance
 */
ProducerBuilder<T> maxRetryOtherPartitions(int maxRetryOtherPartition);
### Implementation
**Client Behavior**
This is the typical produce code.
producer.sendAsync(payLoad.getBytes(StandardCharsets.UTF_8));
When sendAsync is called, the message is enqueued in a queue (the pending
message queue) and a future is returned.
The future completes only after the message has been picked from the queue,
sent to the broker asynchronously, and the ack has been received, again
asynchronously. The maximum size of the pending message queue is controlled by
the producer config maxPendingMessages.
When the pending message queue is full, the application starts getting
publish failures. The pending message queue therefore provides a cushion
against unavailable partitions, but only up to that limit.
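
To make the cushion and its limit concrete, here is a minimal, simplified model of a bounded pending message queue. It is a sketch under stated assumptions, not Pulsar's implementation: the class `PendingQueueModel`, its `ackOldest` method, and the use of `IllegalStateException` as the publish failure are all hypothetical stand-ins chosen for illustration.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;

/** Hypothetical model of a producer's pending message queue; not Pulsar code. */
final class PendingQueueModel {
    private final int maxPendingMessages;
    // One future per message that has been enqueued but not yet acked.
    private final Queue<CompletableFuture<Long>> pending = new ArrayDeque<>();
    private long nextSequenceId = 0;

    PendingQueueModel(int maxPendingMessages) {
        if (maxPendingMessages <= 0) {
            throw new IllegalArgumentException("maxPendingMessages must be positive");
        }
        this.maxPendingMessages = maxPendingMessages;
    }

    /**
     * Enqueues a message and returns a future that completes on ack.
     * The payload is ignored in this model. If the queue is already full,
     * the returned future fails immediately (a stand-in for a publish failure).
     */
    synchronized CompletableFuture<Long> sendAsync(byte[] payload) {
        CompletableFuture<Long> future = new CompletableFuture<>();
        if (pending.size() >= maxPendingMessages) {
            future.completeExceptionally(
                    new IllegalStateException("pending message queue is full"));
            return future;
        }
        pending.add(future);
        return future;
    }

    /** Simulates a broker ack for the oldest pending message; false if none is pending. */
    synchronized boolean ackOldest() {
        CompletableFuture<Long> oldest = pending.poll();
        if (oldest == null) {
            return false;
        }
        oldest.complete(nextSequenceId++);
        return true;
    }
}
```

While a partition is unavailable no acks arrive, so in this model the queue fills after maxPendingMessages sends and every further send fails until an ack frees a slot.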
**When another partition can be picked**
When the message is not keyed, that is, the message is not ordered based
on a key.
When the routing mode is round-robin, meaning a message can be produced to
any of the partitions. In that case, if a partition is unavailable, try to pick
another partition for producing, using the same round-robin algorithm.
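
The selection described above could be sketched as follows. This is an illustration under assumptions, not the proposed patch: the class `RoundRobinFallbackPicker`, its `pick` method, and the `isAvailable` predicate are hypothetical names, and how the client actually learns that a partition is unavailable is left open. The sketch applies only to non-keyed messages; keyed messages would bypass it.

```java
import java.util.OptionalInt;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntPredicate;

/**
 * Hypothetical sketch, not Pulsar code: picks partitions in round-robin order
 * and, if the picked one is unavailable, tries up to maxRetryOtherPartitions
 * further partitions using the same round-robin sequence.
 */
final class RoundRobinFallbackPicker {
    private final int numPartitions;
    private final int maxRetryOtherPartitions;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinFallbackPicker(int numPartitions, int maxRetryOtherPartitions) {
        if (numPartitions <= 0 || maxRetryOtherPartitions < 0) {
            throw new IllegalArgumentException("invalid partition count or retry limit");
        }
        this.numPartitions = numPartitions;
        this.maxRetryOtherPartitions = maxRetryOtherPartitions;
    }

    /** Plain round-robin step; it knows nothing about partition availability. */
    private int nextRoundRobin() {
        return Math.floorMod(counter.getAndIncrement(), numPartitions);
    }

    /**
     * Returns the first available partition found, or empty once the retry
     * budget is spent. Retries are capped so no partition is tried twice.
     */
    OptionalInt pick(IntPredicate isAvailable) {
        int attempts = Math.min(maxRetryOtherPartitions, numPartitions - 1) + 1;
        for (int i = 0; i < attempts; i++) {
            int candidate = nextRoundRobin();
            if (isAvailable.test(candidate)) {
                return OptionalInt.of(candidate);
            }
        }
        return OptionalInt.empty();
    }
}
```

For example, with 4 partitions, a retry limit of 2, and partitions 0 and 1 unavailable, the first call to `pick(p -> p >= 2)` skips 0 and 1 and returns partition 2. The availability check lives in the caller-supplied predicate, so the round-robin step itself stays unaware of partition state.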
### Alternatives
_No response_
### Anything else?
My suggestion is to keep the router (RoundRobin) independent of whether a
partition is available, whether batching is enabled, or whether the publish is
happening under a transaction.