heesung-sn commented on code in PR #20948:
URL: https://github.com/apache/pulsar/pull/20948#discussion_r1305997883
##########
pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConsumerImpl.java:
##########
@@ -1449,6 +1450,24 @@ private ByteBuf processMessageChunk(ByteBuf compressedPayload, MessageMetadata m
// discard message if chunk is out-of-order
if (chunkedMsgCtx == null || chunkedMsgCtx.chunkedMsgBuffer == null
|| msgMetadata.getChunkId() != (chunkedMsgCtx.lastChunkedMessageId + 1)) {
+ // Filter duplicated chunks instead of discard it. (Only do this when exist duplication in a chunk message)
+ // For example:
+ // Chunk-1 sequence ID: 0, chunk ID: 0
+ // Chunk-2 sequence ID: 0, chunk ID: 0
+ // Chunk-3 sequence ID: 0, chunk ID: 1
+ if (chunkedMsgCtx != null && msgMetadata.getChunkId() <= chunkedMsgCtx.lastChunkedMessageId) {
+ log.warn("[{}] Receive a repeated chunk messageId {}, last-chunk-id{}, chunkId = {}",
+ msgMetadata.getProducerName(), chunkedMsgCtx.lastChunkedMessageId, msgId, msgMetadata.getChunkId());
+ compressedPayload.release();
+ increaseAvailablePermits(cnx);
+ boolean repeatedlyReceived = Arrays.stream(chunkedMsgCtx.chunkedMessageIds)
+ .anyMatch(messageId1 -> messageId1 != null && messageId1.ledgerId == messageId.getLedgerId()
+ && messageId1.entryId == messageId.getEntryId());
+ if (!repeatedlyReceived) {
+ doAcknowledge(msgId, AckType.Individual, Collections.emptyMap(), null);
Review Comment:
With this code, given,
1: uuid=p-0, mid:1:1, chunk 1 sequence ID: 0, chunk ID: 0
2: uuid=p-0, mid:1:2, chunk 2 sequence ID: 0, chunk ID: 1
3: uuid=p-0, mid:1:2, chunk 2 sequence ID: 0, chunk ID: 1 // ignored
4: uuid=p-0, mid:1:3, chunk 3 sequence ID: 0, chunk ID: 0 // ignored and acked
5: uuid=p-0, mid:1:4, chunk 4 sequence ID: 0, chunk ID: 1 // ignored and acked
6: uuid=p-0, mid:1:5, chunk 5 sequence ID: 0, chunk ID: 2
So, msg 1, 2 and 6 will complete the chunked msg.
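To make the trace above easy to check, here is a minimal sketch that replays it against a simplified model of the filter in the diff. It is not the `ConsumerImpl` code: `ChunkFilterTraceSketch`, `Chunk`, `replay` and the outcome strings are all hypothetical names, and the model only tracks the last accepted chunk ID and the entries already stored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model of the duplicate-chunk filter; every name here is hypothetical.
final class ChunkFilterTraceSketch {
    static final class Chunk {
        final long ledgerId;
        final long entryId;
        final int chunkId;

        Chunk(long ledgerId, long entryId, int chunkId) {
            this.ledgerId = ledgerId;
            this.entryId = entryId;
            this.chunkId = chunkId;
        }
    }

    // Replays the deliveries for one chunked message and records what happens to each.
    static List<String> replay(int totalChunks, List<Chunk> incoming) {
        Chunk[] received = new Chunk[totalChunks];
        int lastChunkId = -1;
        List<String> outcomes = new ArrayList<>();
        for (Chunk c : incoming) {
            if (c.chunkId == lastChunkId + 1 && c.chunkId < totalChunks) {
                // Next expected chunk: store it and advance.
                received[c.chunkId] = c;
                lastChunkId = c.chunkId;
                outcomes.add(lastChunkId == totalChunks - 1 ? "COMPLETED" : "ACCEPTED");
            } else if (c.chunkId <= lastChunkId) {
                // Repeated chunk ID: ack it only if this exact entry was not stored before.
                boolean sameEntry = Arrays.stream(received).anyMatch(
                        r -> r != null && r.ledgerId == c.ledgerId && r.entryId == c.entryId);
                outcomes.add(sameEntry ? "IGNORED" : "IGNORED_AND_ACKED");
            } else {
                // Chunk ID skipped ahead: treated as out-of-order.
                outcomes.add("DISCARDED");
            }
        }
        return outcomes;
    }
}
```

Replaying the six deliveries listed above with `totalChunks = 3` gives accepted, accepted, ignored, ignored and acked, ignored and acked, completed, so entries 1:1, 1:2 and 1:5 end up forming the message.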
Let's say the producer restarts sending a chunked msg with a new chunking
scheme. I think we should find a way to differentiate the uuid when the
producer restarts. For example,
1: uuid=p-0-t1, mid:1:1, chunk 1 sequence ID: 0, chunk ID: 0
2: uuid=p-0-t1, mid:1:2, chunk 2 sequence ID: 0, chunk ID: 1
3: uuid=p-0-t1, mid:1:2, chunk 2 sequence ID: 0, chunk ID: 1 // ignored
// producer restarted
4: uuid=p-0-t2, mid:1:3, chunk 3 sequence ID: 0, chunk ID: 0
5: uuid=p-0-t2, mid:1:4, chunk 4 sequence ID: 0, chunk ID: 1
6: uuid=p-0-t2, mid:1:5, chunk 5 sequence ID: 0, chunk ID: 2
So, msg 4, 5 and 6 will complete the chunked msg in this case, and msg 1 and
2 will eventually expire.
This means we probably need to update the `chunking uuid` definition logic
and add a suffix there (a chunk session id). Currently,
```
String uuid = totalChunks > 1 ? String.format("%s-%d", producerName, sequenceId) : null;
```
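One way the suffix could look, as a rough sketch only: the uuid gets a `-t<N>` part taken from a per-producer counter that is bumped whenever the producer restarts sending a chunked msg. `ChunkUuidSketch`, `chunkSessionId` and `restartChunking` are hypothetical names for illustration, not existing Pulsar client API, and how the session id is actually generated is left open.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper; not part of the Pulsar client.
final class ChunkUuidSketch {
    // Assumption: one counter per producer, starting at 1.
    private final AtomicLong chunkSessionId = new AtomicLong(1);

    // Called when the producer restarts sending a chunked message.
    long restartChunking() {
        return chunkSessionId.incrementAndGet();
    }

    // Same shape as the current uuid, plus the chunk session suffix.
    String chunkUuid(String producerName, long sequenceId, int totalChunks) {
        return totalChunks > 1
                ? String.format("%s-%d-t%d", producerName, sequenceId, chunkSessionId.get())
                : null;
    }
}
```

With producer name `p` and sequence ID 0 this yields `p-0-t1` before a restart and `p-0-t2` after it, so chunks from the two attempts land in different chunked-message contexts on the consumer.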
Then, it seems like we don't need to iterate over all of
chunkedMsgCtx.chunkedMessageIds.
I think we can check only the slot for the incoming chunk ID:
```
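// Look up only the slot that belongs to the incoming chunk ID.
var prevChunkMsgId = chunkedMsgCtx.chunkedMessageIds[msgMetadata.getChunkId()];
// Same ledger and entry means the very same entry was redelivered.
boolean repeatedlyReceived = prevChunkMsgId != null
        && prevChunkMsgId.ledgerId == messageId.getLedgerId()
        && prevChunkMsgId.entryId == messageId.getEntryId();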
```
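A self-contained version of that check, as a sketch under assumptions: `MsgId` is a hypothetical stand-in for the client's message ID type (only the `ledgerId` and `entryId` field names mirror the diff), and the bounds guard is an addition for safety rather than something taken from the PR.

```java
// Hypothetical stand-ins; only the field names mirror the diff.
final class ChunkDuplicateCheckSketch {
    static final class MsgId {
        final long ledgerId;
        final long entryId;

        MsgId(long ledgerId, long entryId) {
            this.ledgerId = ledgerId;
            this.entryId = entryId;
        }
    }

    // True when the slot for chunkId already holds the same (ledgerId, entryId),
    // i.e. the same entry was delivered again rather than a re-sent copy.
    static boolean repeatedlyReceived(MsgId[] chunkedMessageIds, int chunkId, MsgId incoming) {
        if (chunkId < 0 || chunkId >= chunkedMessageIds.length) {
            return false; // assumption: an out-of-range chunk ID is never a repeat
        }
        MsgId prev = chunkedMessageIds[chunkId];
        return prev != null
                && prev.ledgerId == incoming.ledgerId
                && prev.entryId == incoming.entryId;
    }
}
```

The lookup is constant time per chunk instead of a scan over every stored ID, and it relies on a redelivered entry always carrying the chunk ID it had the first time.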