BewareMyPower opened a new pull request, #15413:
URL: https://github.com/apache/pulsar/pull/15413
### Motivation
Currently, message deduplication doesn't work well with key-based
batching. First, the key-based batch container doesn't update
`lastSequenceIdPushed`, so a batch can contain both duplicated and
non-duplicated messages. Second, when `createOpSendMsgs` is called, the
`OpSendMsg` objects are sorted by their lowest sequence ids, and the
highest sequence id is not set. If a batch contains sequence ids 0, 1, 2,
then a duplicated message with sequence id 1 or 2 won't be dropped.
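
The second issue can be illustrated with a simplified model of a sequence-id based duplicate check. This is only a sketch, not Pulsar's broker code: `DedupModel`, `publish`, and the single-producer state are hypothetical, and the only assumption is that an entry is dropped when its sequence id is not greater than the last one recorded.

```java
// Hypothetical, simplified model of a sequence-id based duplicate check.
// It is NOT the Pulsar broker implementation.
public class DedupModel {
    // Highest sequence id recorded so far for a single producer.
    private long lastSequenceIdPushed = -1L;

    // Returns true if the entry is accepted, false if it is dropped as a duplicate.
    public boolean publish(long sequenceId) {
        if (sequenceId <= lastSequenceIdPushed) {
            return false; // not newer than what was recorded: drop
        }
        lastSequenceIdPushed = sequenceId;
        return true;
    }

    public static void main(String[] args) {
        // A batch holding sequence ids 0, 1, 2 is published under its LOWEST id.
        DedupModel lowest = new DedupModel();
        lowest.publish(0);
        // A resend of the message with sequence id 1 is wrongly accepted.
        System.out.println("lowest id recorded, resend of 1 accepted: " + lowest.publish(1));

        // The same batch published under its HIGHEST id.
        DedupModel highest = new DedupModel();
        highest.publish(2);
        // The resend of sequence id 1 is now dropped.
        System.out.println("highest id recorded, resend of 1 accepted: " + highest.publish(1));
    }
}
```

In this model, recording only the lowest id of the batch leaves ids 1 and 2 unprotected, while recording the highest id covers every message in the batch.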
### Modifications
- Refactor the key-based batch container so that
`BatchMessageContainerImpl` is reused instead of maintaining a separate
`KeyedBatch` class.
- When `createOpSendMsgs` is called, clear the highest sequence id field
and set the sequence id field to the highest sequence id, which fixes
the second issue described above.
- Add `testKeyBasedBatchingOrder` to show and verify the current
behavior.
- Add a test for key-based batching to
`testProducerDeduplicationWithDiscontinuousSequenceId` to verify that
`lastSequenceIdPushed` is updated correctly.
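
The intended behavior can be sketched roughly as follows. All names here (`KeyBasedBatchSketch`, `Batch`, `add`, `isPotentialDuplicate`, `flush`) are hypothetical and do not correspond to the classes changed in this PR. The sketch assumes one batch per key, a container-wide `lastSequenceIdPushed` that every added message advances, and that each flushed batch is stamped with its highest sequence id.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a key-based batch container; not the Pulsar client code.
public class KeyBasedBatchSketch {
    // One batch per key; only the highest sequence id it holds is tracked here.
    static final class Batch {
        long highestSequenceId = -1L;

        void add(long sequenceId) {
            highestSequenceId = Math.max(highestSequenceId, sequenceId);
        }
    }

    private final Map<String, Batch> batches = new LinkedHashMap<>();
    // Shared across all keys, so it also advances with discontinuous ids.
    private long lastSequenceIdPushed = -1L;

    // A message is a potential duplicate if its id is not newer than the last pushed id.
    public boolean isPotentialDuplicate(long sequenceId) {
        return sequenceId <= lastSequenceIdPushed;
    }

    // Adds a message to the batch of its key and advances lastSequenceIdPushed.
    public void add(String key, long sequenceId) {
        batches.computeIfAbsent(key, k -> new Batch()).add(sequenceId);
        lastSequenceIdPushed = Math.max(lastSequenceIdPushed, sequenceId);
    }

    // Returns, per key, the sequence id each flushed batch would carry (its highest).
    public Map<String, Long> flush() {
        Map<String, Long> stamped = new LinkedHashMap<>();
        batches.forEach((key, batch) -> stamped.put(key, batch.highestSequenceId));
        batches.clear();
        return stamped;
    }

    public long getLastSequenceIdPushed() {
        return lastSequenceIdPushed;
    }
}
```

For example, after adding sequence ids 0 and 2 under key `a` and 1 under key `b`, `getLastSequenceIdPushed()` returns 2 and `flush()` maps `a` to 2 and `b` to 1.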
### Documentation
Check the box below or label this PR directly.
Need to update docs?
- [ ] `doc-required`
(Your PR needs to update docs and you will update later)
- [x] `no-need-doc`
(Please explain why)
- [ ] `doc`
(Your PR contains doc changes)
- [ ] `doc-added`
(Docs have been already added)