heesung-sn commented on code in PR #20948:
URL: https://github.com/apache/pulsar/pull/20948#discussion_r1305997883


##########
pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConsumerImpl.java:
##########
@@ -1449,6 +1450,24 @@ private ByteBuf processMessageChunk(ByteBuf compressedPayload, MessageMetadata m
         // discard message if chunk is out-of-order
         if (chunkedMsgCtx == null || chunkedMsgCtx.chunkedMsgBuffer == null
                 || msgMetadata.getChunkId() != (chunkedMsgCtx.lastChunkedMessageId + 1)) {
+            // Filter duplicated chunks instead of discard it. (Only do this when exist duplication in a chunk message)
+            // For example:
+            //     Chunk-1 sequence ID: 0, chunk ID: 0
+            //     Chunk-2 sequence ID: 0, chunk ID: 0
+            //     Chunk-3 sequence ID: 0, chunk ID: 1
+            if (chunkedMsgCtx != null && msgMetadata.getChunkId() <= chunkedMsgCtx.lastChunkedMessageId) {
+                log.warn("[{}] Receive a repeated chunk messageId {}, last-chunk-id{}, chunkId = {}",
+                        msgMetadata.getProducerName(), chunkedMsgCtx.lastChunkedMessageId, msgId, msgMetadata.getChunkId());
+                compressedPayload.release();
+                increaseAvailablePermits(cnx);
+                boolean repeatedlyReceived = Arrays.stream(chunkedMsgCtx.chunkedMessageIds)
+                        .anyMatch(messageId1 -> messageId1 != null && messageId1.ledgerId == messageId.getLedgerId()
+                                && messageId1.entryId == messageId.getEntryId());
+                if (!repeatedlyReceived) {
+                    doAcknowledge(msgId, AckType.Individual, Collections.emptyMap(), null);

Review Comment:
   With this code, given the following sequence:
   1: uuid=p-0, mid:1:1, chunk 1 sequence ID: 0, chunk ID: 0
   2: uuid=p-0, mid:1:2, chunk 2 sequence ID: 0, chunk ID: 1
   3: uuid=p-0, mid:1:2, chunk 2 sequence ID: 0, chunk ID: 1 // ignored
   4: uuid=p-0, mid:1:3, chunk 3 sequence ID: 0, chunk ID: 0 // ignored and acked
   5: uuid=p-0, mid:1:4, chunk 4 sequence ID: 0, chunk ID: 1 // ignored and acked
   6: uuid=p-0, mid:1:5, chunk 5 sequence ID: 0, chunk ID: 2
   
   So msgs 1, 2, and 6 will complete the chunked msg.
   
   Let's say the producer restarts and resends the chunked msg with a new chunking scheme. I think we should find a way to differentiate the uuid when the producer restarts. For example,
   
   1: uuid=p-0-t1, mid:1:1, chunk 1 sequence ID: 0, chunk ID: 0
   2: uuid=p-0-t1, mid:1:2, chunk 2 sequence ID: 0, chunk ID: 1
   3: uuid=p-0-t1, mid:1:2, chunk 2 sequence ID: 0, chunk ID: 1 // ignored
   // producer restarted
   4: uuid=p-0-t2, mid:1:3, chunk 3 sequence ID: 0, chunk ID: 0
   5: uuid=p-0-t2, mid:1:4, chunk 4 sequence ID: 0, chunk ID: 1
   6: uuid=p-0-t2, mid:1:5, chunk 5 sequence ID: 0, chunk ID: 2
   
   So, in this case, msgs 4, 5, and 6 will complete the chunked msg, and msgs 1 and 2 will eventually expire.
   
   This means we probably need to update the chunking `uuid` definition logic and add a suffix (a chunk session id) to it. Currently:
   ```
   String uuid = totalChunks > 1 ? String.format("%s-%d", producerName, sequenceId) : null;
   ```
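
   A minimal sketch of the suffix idea, for illustration only. The class name, the `sessionUuid` helper, and the `chunkSessionId` parameter are hypothetical and not part of the Pulsar client; the sketch assumes the producer can supply some value that changes each time it restarts sending a chunked msg.

   ```java
   final class ChunkUuidSketch {
       // The current scheme quoted above: producerName-sequenceId.
       static String currentUuid(String producerName, long sequenceId, int totalChunks) {
           return totalChunks > 1 ? String.format("%s-%d", producerName, sequenceId) : null;
       }

       // Hypothetical variant: append a chunk session id (an assumption, not an existing field)
       // so that chunks sent before and after a producer restart get different uuids.
       static String sessionUuid(String producerName, long sequenceId,
                                 long chunkSessionId, int totalChunks) {
           return totalChunks > 1
                   ? String.format("%s-%d-%d", producerName, sequenceId, chunkSessionId)
                   : null;
       }

       public static void main(String[] args) {
           System.out.println(currentUuid("p", 0, 3));    // p-0
           System.out.println(sessionUuid("p", 0, 1, 3)); // p-0-1
           System.out.println(sessionUuid("p", 0, 2, 3)); // p-0-2
       }
   }
   ```

   With a suffix like this, the consumer would keep a separate chunked msg context per session, so stale chunks from before the restart could not be mixed with the resent ones.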
   
   
   Then, it seems like we don't need to iterate over all of `chunkedMsgCtx.chunkedMessageIds`. I think we can check:
   ```
   var prevChunkMsgId = chunkedMsgCtx.chunkedMessageIds[msgMetadata.getChunkId()];
   boolean repeatedlyReceived = prevChunkMsgId != null
           && prevChunkMsgId.ledgerId == messageId.getLedgerId()
           && prevChunkMsgId.entryId == messageId.getEntryId();
   ```
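
   To make the direct lookup concrete, here is a self-contained sketch. `Mid` is a hypothetical stand-in for the client's message ID type that carries only the two fields being compared, and the bounds check is an added assumption, not something taken from the PR.

   ```java
   final class ChunkDuplicateCheckSketch {
       // Hypothetical stand-in for the message ID type; only ledgerId and entryId matter here.
       static final class Mid {
           final long ledgerId;
           final long entryId;

           Mid(long ledgerId, long entryId) {
               this.ledgerId = ledgerId;
               this.entryId = entryId;
           }
       }

       // True when the slot for chunkId already holds the same ledger/entry position,
       // i.e. the very same entry was delivered again.
       static boolean repeatedlyReceived(Mid[] chunkedMessageIds, int chunkId, Mid incoming) {
           if (chunkId < 0 || chunkId >= chunkedMessageIds.length) {
               return false;
           }
           Mid prev = chunkedMessageIds[chunkId];
           return prev != null
                   && prev.ledgerId == incoming.ledgerId
                   && prev.entryId == incoming.entryId;
       }

       public static void main(String[] args) {
           Mid[] received = new Mid[3];
           received[0] = new Mid(1, 1); // msg 1 from the first trace
           received[1] = new Mid(1, 2); // msg 2
           System.out.println(repeatedlyReceived(received, 1, new Mid(1, 2))); // true  (msg 3)
           System.out.println(repeatedlyReceived(received, 0, new Mid(1, 3))); // false (msg 4)
           System.out.println(repeatedlyReceived(received, 2, new Mid(1, 5))); // false (msg 6)
       }
   }
   ```

   This is a single array access per chunk instead of a scan, and it matches the first trace: msg 3 is a redelivery of the same entry, while msg 4 is a different entry carrying an already-seen chunk ID.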


