sijie opened a new issue #5476: Message deduplication is not well handled when 
batching is enabled with external provided sequenceId
URL: https://github.com/apache/pulsar/issues/5476
 
 
   **Describe the bug**
   
   The current implementation of the Pulsar producer doesn't check the sequenceId when adding messages to a batch container. This breaks idempotent producing when an external sequenceId is used.
   
   **To Reproduce**
   
   - send 10 messages with sequenceIds 0-9
   - send 10 messages with sequenceIds 0-9 again
   - flush the producer
   - the consumer receives all 20 messages
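   The reproduction can be modelled without a running broker. The following is a minimal, self-contained Java simulation, not Pulsar code: `ReproMessage`, `NaiveBatchContainer` and `ReproSimulation` are hypothetical stand-ins. Like the behaviour described above, the container appends every message without looking at its sequenceId, so flushing forwards all 20 messages, including the 10 duplicates.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a message carrying an externally provided sequenceId.
final class ReproMessage {
    final long sequenceId;
    final String payload;

    ReproMessage(long sequenceId, String payload) {
        this.sequenceId = sequenceId;
        this.payload = payload;
    }
}

// Hypothetical model of a batch container that, like the reported bug,
// never inspects the sequenceId of the messages it accepts.
final class NaiveBatchContainer {
    private final List<ReproMessage> pending = new ArrayList<>();

    void add(ReproMessage msg) {
        pending.add(msg); // no duplicate or ordering check
    }

    // Returns everything batched so far, as if it were sent and delivered to the consumer.
    List<ReproMessage> flush() {
        List<ReproMessage> sent = new ArrayList<>(pending);
        pending.clear();
        return sent;
    }
}

final class ReproSimulation {
    // Sends sequenceIds 0-9 twice, then flushes, mirroring the reproduction steps.
    static List<ReproMessage> run() {
        NaiveBatchContainer container = new NaiveBatchContainer();
        for (int round = 0; round < 2; round++) {
            for (long seq = 0; seq < 10; seq++) {
                container.add(new ReproMessage(seq, "round-" + round + "-msg-" + seq));
            }
        }
        return container.flush();
    }
}
```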
   
   **Expected behavior**
   
   The second 10 messages should not be added to the batch container, because they are duplicates. The producer could throw an exception to the client to indicate that it is adding out-of-order sequence ids.
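   A minimal sketch of the expected behaviour, assuming sequenceIds must be strictly increasing per producer. `CheckedBatchContainer` and `OutOfOrderSequenceIdException` are hypothetical names, not part of the Pulsar client API. The container remembers the last sequenceId it accepted and rejects any id that is not greater than it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical exception type; not part of the Pulsar client API.
final class OutOfOrderSequenceIdException extends RuntimeException {
    OutOfOrderSequenceIdException(long sequenceId, long lastPushed) {
        super("sequenceId " + sequenceId + " is not greater than last pushed sequenceId " + lastPushed);
    }
}

// Hypothetical batch container that enforces strictly increasing external sequenceIds.
final class CheckedBatchContainer {
    private final List<Long> pending = new ArrayList<>();
    private long lastPushedSequenceId = -1L; // -1 means "nothing accepted yet"

    void add(long sequenceId) {
        if (sequenceId <= lastPushedSequenceId) {
            // Duplicate or out-of-order id: reject instead of batching it.
            throw new OutOfOrderSequenceIdException(sequenceId, lastPushedSequenceId);
        }
        pending.add(sequenceId);
        lastPushedSequenceId = sequenceId;
    }

    // Returns the accepted sequenceIds and empties the batch.
    List<Long> flush() {
        List<Long> sent = new ArrayList<>(pending);
        pending.clear();
        return sent;
    }
}
```

With this check, replaying sequenceIds 0-9 a second time raises 10 exceptions and only the first 10 messages end up in the batch.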
   
   **Additional context**
   
   There are a couple of places that require attention when handling batched messages with an external sequenceId.
   
   1) The logic that maintains `lastPublishedSequenceId` is incorrect when an external sequenceId is used: `lastSequenceIdPublished = op.sequenceId + op.numMessagesInBatch - 1;`. The last sequence id of the batch is an external sequence id, so it can't be computed by adding the number of messages in the batch to the first sequence id.
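   A worked example of point 1: if a batch holds the external sequenceIds 100, 205 and 310, the quoted formula gives 100 + 3 - 1 = 102, although the last id actually sent is 310. The formula only holds when ids are consecutive. The sketch below uses a hypothetical, simplified `BatchSendOp` (not Pulsar's internal `OpSendMsg`) that records the highest sequenceId present in the batch instead of deriving it; with strictly increasing ids, the highest id is also the last one.

```java
import java.util.List;

// Hypothetical, simplified view of a batched send operation.
final class BatchSendOp {
    final long sequenceId;         // sequenceId of the first message in the batch
    final int numMessagesInBatch;
    final long highestSequenceId;  // largest sequenceId actually present in the batch

    // Expects a non-empty list of the batch's sequenceIds.
    BatchSendOp(List<Long> sequenceIds) {
        this.sequenceId = sequenceIds.get(0);
        this.numMessagesInBatch = sequenceIds.size();
        long max = Long.MIN_VALUE;
        for (long id : sequenceIds) {
            max = Math.max(max, id);
        }
        this.highestSequenceId = max;
    }

    // The formula quoted in the issue: only correct when the ids are consecutive.
    long lastSequenceIdByArithmetic() {
        return sequenceId + numMessagesInBatch - 1;
    }
}
```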
   
   2) We only maintain `lastPublishedSequenceId` (which is the acknowledged sequence id). We also need to maintain a `lastPushSequenceId` to indicate the last sequence id that the producer has sent to the broker.
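   A minimal sketch of the two counters from point 2, assuming ids must be strictly increasing. `SequenceTracker` is a hypothetical class whose field names follow the issue text, not necessarily Pulsar's `ProducerImpl`. The pushed id advances when a message leaves the producer and the published id advances only on acknowledgement, so the duplicate check compares against the pushed id.

```java
// Hypothetical producer-side bookkeeping; field names follow the issue text.
final class SequenceTracker {
    private long lastPushSequenceId = -1L;      // last id sent to the broker
    private long lastPublishedSequenceId = -1L; // last id acknowledged by the broker

    // Called when a message (or the last message of a batch) is sent to the broker.
    void onPush(long sequenceId) {
        if (sequenceId <= lastPushSequenceId) {
            throw new IllegalStateException("sequenceId " + sequenceId
                    + " is not greater than last pushed sequenceId " + lastPushSequenceId);
        }
        lastPushSequenceId = sequenceId;
    }

    // Called when the broker acknowledges a send whose highest id is sequenceId.
    void onAck(long sequenceId) {
        lastPublishedSequenceId = Math.max(lastPublishedSequenceId, sequenceId);
    }

    long lastPushSequenceId() { return lastPushSequenceId; }

    long lastPublishedSequenceId() { return lastPublishedSequenceId; }
}
```

Checking only the published (acknowledged) id would let a duplicate through while the original is still in flight, because acknowledgements lag behind sends.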
   
   3) The broker needs to handle both the first sequence id and the last sequence id of a message batch.
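   A minimal sketch of point 3, assuming the broker keeps the highest sequenceId it has persisted for each producer. `BrokerDedupCheck` and `BatchDecision` are hypothetical names, not Pulsar's broker deduplication classes. A batch is a full duplicate when its last id is not above the persisted id. A batch whose first id is already covered but whose last id is new only partially overlaps; the issue does not say how to treat that case, so reporting it separately is an assumption here.

```java
// Hypothetical outcome of checking one batch against the persisted state.
enum BatchDecision { ACCEPT, DUPLICATE, OVERLAP }

// Hypothetical per-producer broker-side check using the batch's first and last ids.
final class BrokerDedupCheck {
    private long highestSequenceIdPersisted = -1L;

    BatchDecision check(long firstSequenceId, long lastSequenceId) {
        if (lastSequenceId <= highestSequenceIdPersisted) {
            return BatchDecision.DUPLICATE; // every message in the batch was seen before
        }
        if (firstSequenceId <= highestSequenceIdPersisted) {
            return BatchDecision.OVERLAP;   // only part of the batch is new; handling is a policy choice
        }
        highestSequenceIdPersisted = lastSequenceId; // record the batch's last id, not its first
        return BatchDecision.ACCEPT;
    }
}
```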
   
   
