heesung-sn commented on code in PR #20948:
URL: https://github.com/apache/pulsar/pull/20948#discussion_r1305997883


##########
pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConsumerImpl.java:
##########
@@ -1449,6 +1450,24 @@ private ByteBuf processMessageChunk(ByteBuf compressedPayload, MessageMetadata m
         // discard message if chunk is out-of-order
         if (chunkedMsgCtx == null || chunkedMsgCtx.chunkedMsgBuffer == null
                 || msgMetadata.getChunkId() != (chunkedMsgCtx.lastChunkedMessageId + 1)) {
+            // Filter duplicated chunks instead of discard it. (Only do this when exist duplication in a chunk message)
+            // For example:
+            //     Chunk-1 sequence ID: 0, chunk ID: 0
+            //     Chunk-2 sequence ID: 0, chunk ID: 0
+            //     Chunk-3 sequence ID: 0, chunk ID: 1
+            if (chunkedMsgCtx != null && msgMetadata.getChunkId() <= chunkedMsgCtx.lastChunkedMessageId) {
+                log.warn("[{}] Receive a repeated chunk messageId {}, last-chunk-id{}, chunkId = {}",
+                        msgMetadata.getProducerName(), chunkedMsgCtx.lastChunkedMessageId, msgId, msgMetadata.getChunkId());
+                compressedPayload.release();
+                increaseAvailablePermits(cnx);
+                boolean repeatedlyReceived = Arrays.stream(chunkedMsgCtx.chunkedMessageIds)
+                        .anyMatch(messageId1 -> messageId1 != null && messageId1.ledgerId == messageId.getLedgerId()
+                                && messageId1.entryId == messageId.getEntryId());
+                if (!repeatedlyReceived) {
+                    doAcknowledge(msgId, AckType.Individual, Collections.emptyMap(), null);

Review Comment:
   With this code, given the following deliveries:
   1: uuid=p-0, mid:1:1, sequence ID: 0, chunk ID: 0/2
   2: uuid=p-0, mid:1:2, sequence ID: 0, chunk ID: 1/2
   3: uuid=p-0, mid:1:2, sequence ID: 0, chunk ID: 1/2 // ignored
   4: uuid=p-0, mid:1:3, sequence ID: 0, chunk ID: 0/2 // ignored and acked
   5: uuid=p-0, mid:1:4, sequence ID: 0, chunk ID: 1/2 // ignored and acked
   6: uuid=p-0, mid:1:5, sequence ID: 0, chunk ID: 2/2
   
   So msgs 1, 2, and 6 will complete the chunked msg.
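   
   A minimal, self-contained sketch of how the duplicate filter in this hunk would classify the six deliveries above. `ChunkDedupSketch`, `MsgId`, `Outcome`, and `replay` are hypothetical stand-ins for illustration, not the real `ConsumerImpl` types, and context creation, payload buffering, and permits are left out:
   
   ```java
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.List;

   final class ChunkDedupSketch {
       enum Outcome { ACCEPTED, COMPLETED, IGNORED, IGNORED_AND_ACKED, DISCARDED }

       /** Hypothetical stand-in for a message id (ledger + entry). */
       static final class MsgId {
           final long ledgerId;
           final long entryId;

           MsgId(long ledgerId, long entryId) {
               this.ledgerId = ledgerId;
               this.entryId = entryId;
           }
       }

       private final MsgId[] chunkedMessageIds; // one slot per chunk ID
       private int lastChunkedMessageId = -1;   // no chunk accepted yet

       ChunkDedupSketch(int totalChunks) {
           this.chunkedMessageIds = new MsgId[totalChunks];
       }

       Outcome onChunk(MsgId id, int chunkId) {
           if (chunkId >= chunkedMessageIds.length) {
               return Outcome.DISCARDED; // chunk ID outside the expected range
           }
           if (chunkId != lastChunkedMessageId + 1) {
               if (chunkId <= lastChunkedMessageId) {
                   // Repeated chunk ID: ack only if this exact message id is not already stored.
                   boolean repeatedlyReceived = Arrays.stream(chunkedMessageIds)
                           .anyMatch(m -> m != null && m.ledgerId == id.ledgerId && m.entryId == id.entryId);
                   return repeatedlyReceived ? Outcome.IGNORED : Outcome.IGNORED_AND_ACKED;
               }
               return Outcome.DISCARDED; // a gap: the chunk arrived out of order
           }
           chunkedMessageIds[chunkId] = id;
           lastChunkedMessageId = chunkId;
           return chunkId == chunkedMessageIds.length - 1 ? Outcome.COMPLETED : Outcome.ACCEPTED;
       }

       /** Each row is {ledgerId, entryId, chunkId}. */
       static List<Outcome> replay(int totalChunks, long[][] deliveries) {
           ChunkDedupSketch ctx = new ChunkDedupSketch(totalChunks);
           List<Outcome> outcomes = new ArrayList<>();
           for (long[] d : deliveries) {
               outcomes.add(ctx.onChunk(new MsgId(d[0], d[1]), (int) d[2]));
           }
           return outcomes;
       }

       public static void main(String[] args) {
           // The six deliveries listed above; chunk IDs 0..2 mean three chunks in total.
           long[][] trace = {{1, 1, 0}, {1, 2, 1}, {1, 2, 1}, {1, 3, 0}, {1, 4, 1}, {1, 5, 2}};
           System.out.println(replay(3, trace));
       }
   }
   ```
   
   Under these assumptions, deliveries 3, 4, and 5 are filtered and only 1, 2, and 6 fill the buffer, which matches the annotations above.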
   
   Let's say the producer restarted and resent the chunked msg with a new
   chunking scheme. I think we should find a way to differentiate the uuid
   when the producer restarts. For example,
   
   1: uuid=p-0-t1, mid:1:1, sequence ID: 0, chunk ID: 0/2
   2: uuid=p-0-t1, mid:1:2, sequence ID: 0, chunk ID: 1/2
   3: uuid=p-0-t1, mid:1:2, sequence ID: 0, chunk ID: 1/2 // ignored
   // producer restarted
   4: uuid=p-0-t2, mid:1:3, sequence ID: 0, chunk ID: 0/3 
   5: uuid=p-0-t2, mid:1:4, sequence ID: 0, chunk ID: 1/3
   6: uuid=p-0-t2, mid:1:5, sequence ID: 0, chunk ID: 2/3
   7: uuid=p-0-t2, mid:1:6, sequence ID: 0, chunk ID: 3/3
   
   So the new set of chunks, msgs 4, 5, 6, and 7, will complete the chunked
   msg in this case, and msgs 1 and 2 will eventually expire on the consumer
   side.
   
   This means we probably need to update the `chunking uuid` definition logic
   and add a suffix to it (a chunking session id):
   ```
   Currently
   chunking uuid = producer + sequence_id
   
   Proposal
   chunking uuid = producer + sequence_id + chunkingSessionId
   * chunkingSessionId could be a timestamp taken when the chunking started.
   ```
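   
   A rough sketch of the proposal, assuming the consumer keys its chunk contexts by uuid. `ChunkingSessionSketch`, `chunkingUuid`, `onChunk`, `received`, and the `-` separator are hypothetical names and choices for illustration, not existing Pulsar APIs; the session ids 1 and 2 stand in for the t1/t2 timestamps:
   
   ```java
   import java.util.HashMap;
   import java.util.Map;

   final class ChunkingSessionSketch {
       /** Proposed key: producer + sequence_id + chunkingSessionId (separator is an assumption). */
       static String chunkingUuid(String producer, long sequenceId, long chunkingSessionId) {
           return producer + "-" + sequenceId + "-" + chunkingSessionId;
       }

       // uuid -> next chunk ID expected for that chunking session
       private final Map<String, Integer> nextChunkByUuid = new HashMap<>();

       /** Returns true when this delivery completes the chunked message identified by uuid. */
       boolean onChunk(String uuid, int chunkId, int totalChunks) {
           int expected = nextChunkByUuid.getOrDefault(uuid, 0);
           if (chunkId != expected) {
               return false; // duplicate or out-of-order chunk within the same session: ignored here
           }
           nextChunkByUuid.put(uuid, expected + 1);
           return expected + 1 == totalChunks;
       }

       /** Number of chunks accepted so far for uuid. */
       int received(String uuid) {
           return nextChunkByUuid.getOrDefault(uuid, 0);
       }

       public static void main(String[] args) {
           ChunkingSessionSketch consumer = new ChunkingSessionSketch();
           String t1 = chunkingUuid("p", 0, 1L); // session before the restart
           String t2 = chunkingUuid("p", 0, 2L); // session after the restart

           consumer.onChunk(t1, 0, 3);
           consumer.onChunk(t1, 1, 3);
           consumer.onChunk(t1, 1, 3); // duplicate, ignored

           boolean done = false;
           for (int chunkId = 0; chunkId < 4; chunkId++) {
               done = consumer.onChunk(t2, chunkId, 4);
           }
           System.out.println(t2 + " complete: " + done
                   + ", " + t1 + " chunks received: " + consumer.received(t1));
       }
   }
   ```
   
   Because the two sessions never share a context, the second session completes on its own four chunks while the first stays incomplete until it expires.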
   
   
   Then, it seems like we don't need to iterate over all of
   chunkedMsgCtx.chunkedMessageIds.
   I think we can check only the slot for the incoming chunk ID:
   ```
   var prevChunkMsgId = chunkedMsgCtx.chunkedMessageIds[msgMetadata.getChunkId()];
   boolean repeatedlyReceived = prevChunkMsgId != null
           && prevChunkMsgId.ledgerId == messageId.getLedgerId()
           && prevChunkMsgId.entryId == messageId.getEntryId();
   ```
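   
   A standalone sketch of that direct lookup, with a bounds check and a null check added. `RepeatedChunkLookupSketch` and `MsgId` are hypothetical stand-ins, not the real client classes. The reasoning is that a redelivered entry carries the same chunk ID it had the first time, so only the slot at that chunk ID can hold a matching message id:
   
   ```java
   final class RepeatedChunkLookupSketch {
       /** Hypothetical stand-in for a stored message id (ledger + entry). */
       static final class MsgId {
           final long ledgerId;
           final long entryId;

           MsgId(long ledgerId, long entryId) {
               this.ledgerId = ledgerId;
               this.entryId = entryId;
           }
       }

       /** O(1) check: was this exact (ledgerId, entryId) already stored for chunkId? */
       static boolean repeatedlyReceived(MsgId[] chunkedMessageIds, int chunkId,
                                         long ledgerId, long entryId) {
           if (chunkId < 0 || chunkId >= chunkedMessageIds.length) {
               return false; // chunk ID outside the context: nothing stored for it
           }
           MsgId prev = chunkedMessageIds[chunkId];
           return prev != null && prev.ledgerId == ledgerId && prev.entryId == entryId;
       }

       public static void main(String[] args) {
           // Chunks 0 and 1 already stored, chunk 2 still pending.
           MsgId[] stored = {new MsgId(1, 1), new MsgId(1, 2), null};
           System.out.println(repeatedlyReceived(stored, 1, 1, 2)); // redelivery of stored chunk 1
           System.out.println(repeatedlyReceived(stored, 0, 1, 3)); // new message id reusing chunk ID 0
       }
   }
   ```
   
   The first call reports a redelivery that should only be ignored; the second reports a different message reusing chunk ID 0, which the PR's logic would ignore and ack.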



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
