poorbarcode commented on code in PR #15413:
URL: https://github.com/apache/pulsar/pull/15413#discussion_r863205618


##########
pulsar-client/src/main/java/org/apache/pulsar/client/impl/BatchMessageKeyBasedContainer.java:
##########
@@ -115,66 +82,46 @@ public boolean isMultiBatches() {
         return true;
     }
 
-    private ProducerImpl.OpSendMsg createOpSendMsg(KeyedBatch keyedBatch) 
throws IOException {
-        ByteBuf encryptedPayload = 
producer.encryptMessage(keyedBatch.messageMetadata,
-                keyedBatch.getCompressedBatchMetadataAndPayload());
-        if (encryptedPayload.readableBytes() > ClientCnx.getMaxMessageSize()) {
-            keyedBatch.discard(new 
PulsarClientException.InvalidMessageException(
-                    "Message size is bigger than " + 
ClientCnx.getMaxMessageSize() + " bytes"));
-            return null;
-        }
-
-        final int numMessagesInBatch = keyedBatch.messages.size();
-        long currentBatchSizeBytes = 0;
-        for (MessageImpl<?> message : keyedBatch.messages) {
-            currentBatchSizeBytes += message.getDataBuffer().readableBytes();
-        }
-        keyedBatch.messageMetadata.setNumMessagesInBatch(numMessagesInBatch);
-        if (currentTxnidMostBits != -1) {
-            keyedBatch.messageMetadata.setTxnidMostBits(currentTxnidMostBits);
-        }
-        if (currentTxnidLeastBits != -1) {
-            
keyedBatch.messageMetadata.setTxnidLeastBits(currentTxnidLeastBits);
-        }
-        ByteBufPair cmd = producer.sendMessage(producer.producerId, 
keyedBatch.sequenceId, numMessagesInBatch,
-                keyedBatch.messageMetadata, encryptedPayload);
-
-        ProducerImpl.OpSendMsg op = ProducerImpl.OpSendMsg.create(
-                keyedBatch.messages, cmd, keyedBatch.sequenceId, 
keyedBatch.firstCallback);
-
-        op.setNumMessagesInBatch(numMessagesInBatch);
-        op.setBatchSizeByte(currentBatchSizeBytes);
-        return op;
-    }
-
     @Override
     public List<ProducerImpl.OpSendMsg> createOpSendMsgs() throws IOException {
-        List<ProducerImpl.OpSendMsg> result = new ArrayList<>();
-        List<KeyedBatch> list = new ArrayList<>(batches.values());
-        list.sort(((o1, o2) -> ComparisonChain.start()
-                .compare(o1.sequenceId, o2.sequenceId)
-                .result()));
-        for (KeyedBatch keyedBatch : list) {
-            ProducerImpl.OpSendMsg op = createOpSendMsg(keyedBatch);
-            if (op != null) {
-                result.add(op);
+        try {
+            // In key based batching, the sequence ids might not be ordered, 
for example,
+            // | key | sequence id list |
+            // | :-- | :--------------- |
+            // | A | 0, 3, 4 |
+            // | B | 1, 2 |
+            // The message order should be 1, 2, 0, 3, 4 so that a message 
with a sequence id <= 4 should be dropped.
+            // However, for a MessageMetadata with both `sequence_id` and 
`highest_sequence_id` fields, the broker will
+            // expect a strict order so that the batch of key "A" (0, 3, 4) 
will be dropped.
+            // Therefore, we should update the `sequence_id` field to the 
highest sequence id and remove the
+            // `highest_sequence_id` field to allow the weak order.

Review Comment:
   > What this PR wants to solve is the **resend** case of an existing message. 
For example, 4 messages might be split into two groups in key based batching:
   > 
   > * A: 0, 3 (i.e. messages whose keys are all "A", and the sequence ids are 
0 and 3)
   > * B: 1, 2
   > For the 1st case, we can verify it easily with the unit test.
   
   Thank you for your explanation, I see what you mean.  I still have a 
question:
   
   ```java
   public void test() throws Exception {
           final String topic = "persistent://my-property/my-ns/test";
           admin.namespaces().setDeduplicationStatus("my-property/my-ns", true);
   
           final Producer<String> producer = 
pulsarClient.newProducer(Schema.STRING)
                   .topic(topic)
                   .batcherBuilder(BatcherBuilder.KEY_BASED)
                   .create();
           final List<TypedMessageBuilder<String>> messagesToSend = new 
ArrayList<>();
           
messagesToSend.add(producer.newMessage().value("msg-0").key("A").sequenceId(0));
           
messagesToSend.add(producer.newMessage().value("msg-1").key("B").sequenceId(1));
           
messagesToSend.add(producer.newMessage().value("msg-3").key("A").sequenceId(3));
           final List<CompletableFuture<MessageId>> futures = new ArrayList<>();
           messagesToSend.forEach(msg -> futures.add(msg.sendAsync()));
           producer.flush(); 
           // didn't receive the ack receipt of batch B,  it might resend batch 
B.
           ids.add(messagesToSend.get(1).send());
    }
   ```
   If the client didn't receive the ack receipt of batch B,  it might resend 
batch B.
   
   -- | Before this patch | After this patch
   -- | -- | --
   msg-1 | accepted | rejected
   
   This patch changed original definitions.  
   
   Anyway, I think it's an good catch !



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to