yapxue opened a new pull request, #17071:
URL: https://github.com/apache/pulsar/pull/17071
prefer to use index in BrokerEntryMetadata and considering batchIndex when
build sequenceId.
<!--
### Contribution Checklist
- PR title format should be *[type][component] summary*. For details, see
*[Guideline - Pulsar PR Naming
Convention](https://docs.google.com/document/d/1d8Pw6ZbWk-_pCKdOmdvx9rnhPiyuxwq60_TrD68d7BA/edit#heading=h.trs9rsex3xom)*.
- Fill out the template below to describe the changes contributed by the
pull request. That will give reviewers the context they need to do the review.
- Each pull request should address only one issue, not mix up code from
multiple issues.
- Each commit in the pull request has a meaningful commit message
- Once all items of the checklist are addressed, remove the above text and
this checklist, leaving only the filled out template below.
**(The sections below can be removed for hotfixes of typos)**
-->
Fixes #17061
### Motivation
In pulsar function EFFECTIVELY_ONCE semantic, producer attach a sequenceId
to output topic for deduplication, the sequenceId consists of ledgerId and
entryId. But for batched messages, the share the same entryId so they have same
sequenceId, others will become duplicated and dropped, this may cause data loss.
### Modifications
this change is inspired by KOP convert of MessageId to kafka offset and.
pulsar has AppendIndexMetadataInterceptor, it append index in
BrokerEntryMetadata, use this index as sequenceId is best option. If it is not
present, then considering ledgerId, entryId, batchIndex to compose sequenceId.
### Verifying this change
- [ ] Make sure that the change passes the CI checks.
*(Please pick either of the following options)*
This change added tests and can be verified as follows:
*(example:)*
- *Added integration tests for end-to-end deployment with large payloads
(10MB)*
- *Extended integration test for recovery after broker failure*
### Does this pull request potentially affect one of the following parts:
*If `yes` was chosen, please highlight the changes*
- Dependencies (does it add or upgrade a dependency): (yes / no)
- The public API: (yes / no)
- The schema: (yes / no / don't know)
- The default values of configurations: (yes / no)
- The wire protocol: (yes / no)
- The rest endpoints: (yes / no)
- The admin cli options: (yes / no)
- Anything that affects deployment: (yes / no / don't know)
### Documentation
Check the box below or label this PR directly.
Need to update docs?
- [ ] `doc-required`
(Your PR needs to update docs and you will update later)
- [ ] `doc-not-needed`
(Please explain why)
- [ ] `doc`
(Your PR contains doc changes)
- [ ] `doc-complete`
(Docs have been already added)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]