poorbarcode commented on code in PR #21027: URL: https://github.com/apache/pulsar/pull/21027#discussion_r1303733781
##########
pip/pip-295.md:
##########
@@ -0,0 +1,128 @@
+
+# Background knowledge
+
+In [PIP 37](https://github.com/apache/pulsar/wiki/PIP-37:-Large-message-size-handling-in-Pulsar), Pulsar introduced chunk messages to handle large messages. The producer splits a large message into several chunks when it sends the message to the broker. On the consumer side, a consumer waits to receive all the chunks of a message and then assembles them into a single chunk message before returning it.
+In [PIP 6](https://github.com/apache/pulsar/wiki/PIP-6:-Guaranteed-Message-Deduplication), Pulsar introduced deduplication to make sure the messages sent by the producer are not repeated.
+In PIP 6, each producer has a sequence ID that starts at 0 and increases for each message. A message with a lower sequence ID will be dropped by the broker.
+
+# Motivation
+
+In the earliest design, all the chunks in a single chunk message have the same sequence ID, which means chunk messages cannot work when deduplication is enabled. For example, suppose a chunk message consists of chunk-1 and chunk-2. When the broker receives chunk-1, it updates the last sequence ID to the sequence ID of chunk-1. Then, when the broker gets chunk-2, chunk-2 is dropped by deduplication.
+I opened a [PR](https://github.com/apache/pulsar/pull/20948) to resolve this case. It allowed the chunks of a single chunk message to use the same sequence ID and filtered duplicated chunks in a single chunk message on the consumer side.
+This resolves message duplication end to end, but the duplicated messages still exist in the topic.
+
+# Goals
+
+## In Scope
+Chunk messages can be effectively filtered on the broker side. Ensure that chunk messages work normally after enabling deduplication and that the topic has no duplicate chunks.
Review Comment:
**Background:**
There are two properties in the metadata of the cursor:
- `properties<String, Long>`: used to maintain the last sequence of producer-sent messages<sup>[1]</sup>.
  - PIP: https://github.com/apache/pulsar/wiki/PIP-6:-Guaranteed-Message-Deduplication
  - PR: https://github.com/apache/pulsar/pull/744
- `cursorProperties<String, String>`: used to maintain the subscription properties.
  - PIP: https://github.com/apache/pulsar/issues/12269
  - PR: https://github.com/apache/pulsar/issues/15750

**[1]**: a structure of `properties`:
```yaml
properties:
  - "producer_name_1" : {{last_persist_sequence_1}}
  - "producer_name_2" : {{last_persist_sequence_2}}
```

----
In this PIP, you want to change `properties<String, Long>` to `properties<String, String>`, right? Could you also explain this change here?
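
For readers less familiar with the broker-side check, here is a minimal sketch of how a per-producer "last sequence ID" map like `properties` could drive deduplication, and why two chunks that share a sequence ID collide. The class `DedupSketch`, its `publish` method, and the "drop when the sequence ID is not higher than the last one" rule are assumptions made for illustration; this is not Pulsar's actual broker code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of sequence-ID based deduplication; not Pulsar's implementation. */
class DedupSketch {
    // Mirrors the shape of `properties` in [1]: producer name -> last persisted sequence ID.
    private final Map<String, Long> lastSequenceIds = new HashMap<>();

    /**
     * Returns true if the message is accepted, false if it is dropped as a duplicate.
     * Assumption: a message is a duplicate when its sequence ID is not higher than
     * the last sequence ID recorded for the same producer.
     */
    boolean publish(String producerName, long sequenceId) {
        Long last = lastSequenceIds.get(producerName);
        if (last != null && sequenceId <= last) {
            return false; // dropped by deduplication
        }
        lastSequenceIds.put(producerName, sequenceId);
        return true;
    }
}
```

With this sketch, calling `publish("producer_name_1", 0L)` twice returns `true` for the first call (chunk-1) and `false` for the second (chunk-2 with the same sequence ID), which is the failure mode described in the Motivation section. A `Long` value can only hold one number per producer, which is presumably why a richer value type comes up in the question above.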
