BewareMyPower commented on PR #15413:
URL: https://github.com/apache/pulsar/pull/15413#issuecomment-1119584414
> ```java
> messagesToSend.add(producer.newMessage().value("msg-0").key("A").sequenceId(0));
> messagesToSend.add(producer.newMessage().value("msg-1").key("B").sequenceId(1));
> messagesToSend.add(producer.newMessage().value("msg-2").key("B").sequenceId(2));
> messagesToSend.add(producer.newMessage().value("msg-3").key("A").sequenceId(3));
> ```
>
> What is the expected behavior for the above case?
>
> The deduplication depends on the monotonically increasing sequence ID, but
> the key-based batcher will break the rule.

| | Before this patch | After this patch |
| -------------------------- | ------------------ | ------------------ |
| message order | A-0, A-3, B-1, B-2 | B-1, B-2, A-0, A-3 |
| highest sequence id pushed | 1 | 3 |

I've explained this in detail in the previous comment. It's true that even
after this patch, message deduplication cannot achieve effectively-once;
it is at-most-once instead. But it can avoid the **duplicated messages**. It
doesn't solve the problem fundamentally, but it makes it better.
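
To make the table concrete, here is a minimal stand-alone sketch that reproduces both columns for the four messages in the quoted example. It is not the Pulsar client code: `KeyBatchOrderSketch`, `Msg`, and every method name are hypothetical. It also rests on an assumption read off the table rather than confirmed in the source: before the patch, batches are ordered by their first sequence id and report that id, while after the patch they are ordered by, and report, their highest sequence id.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Stand-alone model with hypothetical names; none of these are Pulsar classes.
class KeyBatchOrderSketch {

    static final class Msg {
        final String key;
        final long sequenceId;

        Msg(String key, long sequenceId) {
            this.key = key;
            this.sequenceId = sequenceId;
        }
    }

    // The four messages from the quoted example, in send order.
    static List<Msg> sample() {
        return List.of(new Msg("A", 0), new Msg("B", 1), new Msg("B", 2), new Msg("A", 3));
    }

    // One batch per key, keeping send order inside each batch.
    static List<List<Msg>> batchByKey(List<Msg> msgs) {
        Map<String, List<Msg>> byKey = new LinkedHashMap<>();
        for (Msg m : msgs) {
            byKey.computeIfAbsent(m.key, k -> new ArrayList<>()).add(m);
        }
        return new ArrayList<>(byKey.values());
    }

    static long firstSeq(List<Msg> batch) {
        return batch.get(0).sequenceId;
    }

    static long highestSeq(List<Msg> batch) {
        return batch.stream().mapToLong(m -> m.sequenceId).max().getAsLong();
    }

    // ASSUMPTION: "before" orders batches by their first sequence id.
    static List<List<Msg>> orderBefore(List<Msg> msgs) {
        List<List<Msg>> batches = batchByKey(msgs);
        batches.sort(Comparator.comparingLong((List<Msg> b) -> firstSeq(b)));
        return batches;
    }

    // ASSUMPTION: "after" orders batches by their highest sequence id.
    static List<List<Msg>> orderAfter(List<Msg> msgs) {
        List<List<Msg>> batches = batchByKey(msgs);
        batches.sort(Comparator.comparingLong((List<Msg> b) -> highestSeq(b)));
        return batches;
    }

    // Renders batches in publish order, e.g. "A-0, A-3, B-1, B-2".
    static String describe(List<List<Msg>> batches) {
        return batches.stream()
                .flatMap(List::stream)
                .map(m -> m.key + "-" + m.sequenceId)
                .collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        List<List<Msg>> before = orderBefore(sample());
        List<List<Msg>> after = orderAfter(sample());
        // Sequence id reported by the last batch under each assumed scheme.
        long pushedBefore = firstSeq(before.get(before.size() - 1));
        long pushedAfter = highestSeq(after.get(after.size() - 1));
        System.out.println(describe(before) + " -> " + pushedBefore); // A-0, A-3, B-1, B-2 -> 1
        System.out.println(describe(after) + " -> " + pushedAfter);   // B-1, B-2, A-0, A-3 -> 3
    }
}
```

In the "before" ordering, the last batch reports sequence id 1 even though message 3 has already been sent. That gap appears to be what lets a resent message slip past deduplication as a duplicate.
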

I noted that when #4435 added the key-based batch container, it also sorted
the batches by sequence id.

https://github.com/apache/pulsar/blob/a1fb200ff707e9855efb563a27a894664a59c58b/pulsar-client/src/main/java/org/apache/pulsar/client/impl/BatchMessageKeyBasedContainer.java#L154-L156

At that time, #5491 had not been pushed and there was no highest sequence id
field in `MessageMetadata`. In #5491, it looks like the key-based batch
container was forgotten, so it was left unchanged.

Even without considering the enhancement (maybe meaningless at this time) to
the deduplication, the refactoring in this PR, which reuses
`BatchMessageContainerImpl` rather than `KeyedBatch`, could make the code
easier to maintain. For example, when `currentTxnidMostBits` and
`currentTxnidLeastBits` were added in #8415, these two fields had to be added
to both the default and the key-based containers.
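
The maintenance argument can be illustrated with a small sketch in which the key-based container keeps one instance of a single batch class per key, so per-batch state such as the transaction id bits is declared in one place only. `SimpleBatch` and `KeyedBatches` are hypothetical names, and this models only the general shape of the idea, not the actual classes or fields in the PR.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a single batch container; not a Pulsar class.
class SimpleBatch {
    final List<String> payloads = new ArrayList<>();
    // Per-batch state is declared once, here.
    long txnidMostBits = -1L;
    long txnidLeastBits = -1L;

    void add(String payload, long mostBits, long leastBits) {
        payloads.add(payload);
        txnidMostBits = mostBits;
        txnidLeastBits = leastBits;
    }
}

// Hypothetical key-based container that delegates to one SimpleBatch per key
// instead of duplicating the batch fields in a second class.
class KeyedBatches {
    private final Map<String, SimpleBatch> batches = new LinkedHashMap<>();

    void add(String key, String payload, long mostBits, long leastBits) {
        batches.computeIfAbsent(key, k -> new SimpleBatch()).add(payload, mostBits, leastBits);
    }

    SimpleBatch batchFor(String key) {
        return batches.get(key);
    }

    int batchCount() {
        return batches.size();
    }
}
```

With this shape, a new per-batch field only touches `SimpleBatch`; the keyed wrapper picks it up without changes.
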

BTW, maybe in the future we can support deduplication by key, and `Key_Shared`
mode can then support effectively-once as well.