poorbarcode commented on code in PR #15413:
URL: https://github.com/apache/pulsar/pull/15413#discussion_r863205618
##########
pulsar-client/src/main/java/org/apache/pulsar/client/impl/BatchMessageKeyBasedContainer.java:
##########
@@ -115,66 +82,46 @@ public boolean isMultiBatches() {
return true;
}
- private ProducerImpl.OpSendMsg createOpSendMsg(KeyedBatch keyedBatch)
throws IOException {
- ByteBuf encryptedPayload =
producer.encryptMessage(keyedBatch.messageMetadata,
- keyedBatch.getCompressedBatchMetadataAndPayload());
- if (encryptedPayload.readableBytes() > ClientCnx.getMaxMessageSize()) {
- keyedBatch.discard(new
PulsarClientException.InvalidMessageException(
- "Message size is bigger than " +
ClientCnx.getMaxMessageSize() + " bytes"));
- return null;
- }
-
- final int numMessagesInBatch = keyedBatch.messages.size();
- long currentBatchSizeBytes = 0;
- for (MessageImpl<?> message : keyedBatch.messages) {
- currentBatchSizeBytes += message.getDataBuffer().readableBytes();
- }
- keyedBatch.messageMetadata.setNumMessagesInBatch(numMessagesInBatch);
- if (currentTxnidMostBits != -1) {
- keyedBatch.messageMetadata.setTxnidMostBits(currentTxnidMostBits);
- }
- if (currentTxnidLeastBits != -1) {
-
keyedBatch.messageMetadata.setTxnidLeastBits(currentTxnidLeastBits);
- }
- ByteBufPair cmd = producer.sendMessage(producer.producerId,
keyedBatch.sequenceId, numMessagesInBatch,
- keyedBatch.messageMetadata, encryptedPayload);
-
- ProducerImpl.OpSendMsg op = ProducerImpl.OpSendMsg.create(
- keyedBatch.messages, cmd, keyedBatch.sequenceId,
keyedBatch.firstCallback);
-
- op.setNumMessagesInBatch(numMessagesInBatch);
- op.setBatchSizeByte(currentBatchSizeBytes);
- return op;
- }
-
@Override
public List<ProducerImpl.OpSendMsg> createOpSendMsgs() throws IOException {
- List<ProducerImpl.OpSendMsg> result = new ArrayList<>();
- List<KeyedBatch> list = new ArrayList<>(batches.values());
- list.sort(((o1, o2) -> ComparisonChain.start()
- .compare(o1.sequenceId, o2.sequenceId)
- .result()));
- for (KeyedBatch keyedBatch : list) {
- ProducerImpl.OpSendMsg op = createOpSendMsg(keyedBatch);
- if (op != null) {
- result.add(op);
+ try {
+ // In key based batching, the sequence ids might not be ordered,
for example,
+ // | key | sequence id list |
+ // | :-- | :--------------- |
+ // | A | 0, 3, 4 |
+ // | B | 1, 2 |
+ // The message order should be 1, 2, 0, 3, 4 so that a message
with a sequence id <= 4 should be dropped.
+ // However, for a MessageMetadata with both `sequence_id` and
`highest_sequence_id` fields, the broker will
+ // expect a strict order so that the batch of key "A" (0, 3, 4)
will be dropped.
+ // Therefore, we should update the `sequence_id` field to the
highest sequence id and remove the
+ // `highest_sequence_id` field to allow the weak order.
Review Comment:
> What this PR wants to solve is the **resend** case of an existing message.
For example, 4 messages might be split into two groups in key based batching:
>
> * A: 0, 3 (i.e. messages whose keys are all "A", and the sequence ids are
0 and 3)
> * B: 1, 2
> For the 1st case, we can verify it easily with the unit test.
Thank you for your explanation, I see what you mean. I still have a
question:
```java
public void test() throws Exception {
final String topic = "persistent://my-property/my-ns/test";
admin.namespaces().setDeduplicationStatus("my-property/my-ns", true);
final Producer<String> producer =
pulsarClient.newProducer(Schema.STRING)
.topic(topic)
.batcherBuilder(BatcherBuilder.KEY_BASED)
.create();
final List<TypedMessageBuilder<String>> messagesToSend = new
ArrayList<>();
messagesToSend.add(producer.newMessage().value("msg-0").key("A").sequenceId(0));
messagesToSend.add(producer.newMessage().value("msg-1").key("B").sequenceId(1));
messagesToSend.add(producer.newMessage().value("msg-3").key("A").sequenceId(3));
final List<CompletableFuture<MessageId>> futures = new ArrayList<>();
messagesToSend.forEach(msg -> futures.add(msg.sendAsync()));
producer.flush();
// didn't receive the ack receipt of batch B, it might resend batch
B.
ids.add(messagesToSend.get(1).send());
}
```
If the client didn't receive the ack receipt of batch B, it might resend
batch B.
-- | Before this patch | After this patch
-- | -- | --
msg-1 | accepted | rejected
This patch changed original definitions.
Anyway, I think it's an good catch !
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]