liangyepianzhou commented on code in PR #20948:
URL: https://github.com/apache/pulsar/pull/20948#discussion_r1306765177
##########
pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConsumerImpl.java:
##########
@@ -1449,6 +1450,23 @@ private ByteBuf processMessageChunk(ByteBuf compressedPayload, MessageMetadata m
         // discard message if chunk is out-of-order
         if (chunkedMsgCtx == null || chunkedMsgCtx.chunkedMsgBuffer == null
                 || msgMetadata.getChunkId() != (chunkedMsgCtx.lastChunkedMessageId + 1)) {
+            // Filter out duplicated chunks instead of discarding them.
+            if (chunkedMsgCtx == null || msgMetadata.getChunkId() <= chunkedMsgCtx.lastChunkedMessageId) {
+                log.warn("[{}] Received a repeated chunk. messageId = {}, last-chunk-id = {}, chunkId = {}",
+                        msgMetadata.getProducerName(), msgId,
+                        chunkedMsgCtx == null ? null : chunkedMsgCtx.lastChunkedMessageId,
+                        msgMetadata.getChunkId());
+                compressedPayload.release();
+                increaseAvailablePermits(cnx);
+                if (chunkedMsgCtx != null) {
Review Comment:
> It seems to be inefficient to iterate all chunks every time. Can we optimize it? I believe all chunk message ids (ledger and entry) for the same message should be the same, aren't they? Can't we check the last chunk's messageId only?

In fact, all the chunks of a chunked message have different message ids. I also just learned this.

> It seems like processMessageChunk does not have the id check logic. Why are we introducing this check in this PR?

Because we should check whether the chunk is a duplicate persisted in the topic or one received twice by the consumer. For example:

**Case 1, duplicated chunks persisted in the topic:**
1: uuid=p-0, mid:1:1, chunk 1, sequence ID: 0, chunk ID: 0
2: uuid=p-0, mid:1:2, chunk 2, sequence ID: 0, chunk ID: 1
3: uuid=p-0, mid:1:3, chunk 1, sequence ID: 0, chunk ID: 0 // should be acked
4: uuid=p-0, mid:1:4, chunk 2, sequence ID: 0, chunk ID: 1 // should be acked

**Case 2, chunks received twice by the consumer:**
1: uuid=p-0, mid:1:1, chunk 1, sequence ID: 0, chunk ID: 0
2: uuid=p-0, mid:1:2, chunk 2, sequence ID: 0, chunk ID: 1
3: uuid=p-0, mid:1:1, chunk 1, sequence ID: 0, chunk ID: 0 // just ignore it
4: uuid=p-0, mid:1:2, chunk 2, sequence ID: 0, chunk ID: 1 // just ignore it
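The two cases above boil down to one decision: a chunk whose chunkId is not `lastChunkedMessageId + 1` is a duplicate, and whether we ack it depends on whether its (ledger, entry) id was already received. A minimal standalone sketch of that rule (class and field names here are illustrative, not the actual ConsumerImpl code):

```java
import java.util.Arrays;

public class DuplicateChunkSketch {
    // (ledgerId, entryId) of an already-received chunk.
    record ChunkMessageId(long ledgerId, long entryId) {}

    // true  -> the duplicate was persisted again in the topic (case 1): ack it.
    // false -> the same entry was redelivered to the consumer (case 2): ignore it.
    static boolean shouldAckDuplicate(ChunkMessageId[] received, ChunkMessageId incoming) {
        boolean alreadyReceived = Arrays.stream(received)
                .anyMatch(id -> id != null
                        && id.ledgerId() == incoming.ledgerId()
                        && id.entryId() == incoming.entryId());
        return !alreadyReceived;
    }

    public static void main(String[] args) {
        ChunkMessageId[] received = { new ChunkMessageId(1, 1), new ChunkMessageId(1, 2) };
        // Case 1: chunk 1 persisted again at mid 1:3 -> ack it.
        System.out.println(shouldAckDuplicate(received, new ChunkMessageId(1, 3)));
        // Case 2: mid 1:1 redelivered to the consumer -> just ignore it.
        System.out.println(shouldAckDuplicate(received, new ChunkMessageId(1, 1)));
    }
}
```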
##########
pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConsumerImpl.java:
##########
@@ -1449,6 +1450,24 @@ private ByteBuf processMessageChunk(ByteBuf compressedPayload, MessageMetadata m
         // discard message if chunk is out-of-order
         if (chunkedMsgCtx == null || chunkedMsgCtx.chunkedMsgBuffer == null
                 || msgMetadata.getChunkId() != (chunkedMsgCtx.lastChunkedMessageId + 1)) {
+            // Filter out duplicated chunks instead of discarding them. (Only do this
+            // when duplication exists within a chunked message.)
+            // For example:
+            //   Chunk-1 sequence ID: 0, chunk ID: 0
+            //   Chunk-2 sequence ID: 0, chunk ID: 0
+            //   Chunk-3 sequence ID: 0, chunk ID: 1
+            if (chunkedMsgCtx != null && msgMetadata.getChunkId() <= chunkedMsgCtx.lastChunkedMessageId) {
+                log.warn("[{}] Received a repeated chunk. messageId = {}, last-chunk-id = {}, chunkId = {}",
+                        msgMetadata.getProducerName(), msgId,
+                        chunkedMsgCtx.lastChunkedMessageId, msgMetadata.getChunkId());
+                compressedPayload.release();
+                increaseAvailablePermits(cnx);
+                boolean repeatedlyReceived = Arrays.stream(chunkedMsgCtx.chunkedMessageIds)
+                        .anyMatch(messageId1 -> messageId1 != null
+                                && messageId1.ledgerId == messageId.getLedgerId()
+                                && messageId1.entryId == messageId.getEntryId());
+                if (!repeatedlyReceived) {
+                    doAcknowledge(msgId, AckType.Individual, Collections.emptyMap(), null);
Review Comment:
> 1: uuid=p-0-t1, mid:1:1, chunk 1 sequence ID: 0, chunk ID: 0
> 2: uuid=p-0-t1, mid:1:2, chunk 2 sequence ID: 0, chunk ID: 1
> 3: uuid=p-0-t1, mid:1:2, chunk 2 sequence ID: 0, chunk ID: 1 // ignored
> // producer restarted
> 4: uuid=p-0-t1, mid:1:3, chunk 3 sequence ID: 0, chunk ID: 0
> 5: uuid=p-0-t2, mid:1:4, chunk 4 sequence ID: 0, chunk ID: 1
> 6: uuid=p-0-t3, mid:1:5, chunk 5 sequence ID: 0, chunk ID: 2
>
> So, msg 4, 5 and 6 will complete the chunked msg in this case and msg 1
and 2 will be eventually expired.
Do you mean uuid = p-0-t2 for chunks 3, 4, and 5? If so, it makes sense to me.
##########
pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConsumerImpl.java:
##########
@@ -1449,6 +1450,24 @@ private ByteBuf processMessageChunk(ByteBuf compressedPayload, MessageMetadata m
         // discard message if chunk is out-of-order
         if (chunkedMsgCtx == null || chunkedMsgCtx.chunkedMsgBuffer == null
                 || msgMetadata.getChunkId() != (chunkedMsgCtx.lastChunkedMessageId + 1)) {
+            // Filter out duplicated chunks instead of discarding them. (Only do this
+            // when duplication exists within a chunked message.)
+            // For example:
+            //   Chunk-1 sequence ID: 0, chunk ID: 0
+            //   Chunk-2 sequence ID: 0, chunk ID: 0
+            //   Chunk-3 sequence ID: 0, chunk ID: 1
+            if (chunkedMsgCtx != null && msgMetadata.getChunkId() <= chunkedMsgCtx.lastChunkedMessageId) {
+                log.warn("[{}] Received a repeated chunk. messageId = {}, last-chunk-id = {}, chunkId = {}",
+                        msgMetadata.getProducerName(), msgId,
+                        chunkedMsgCtx.lastChunkedMessageId, msgMetadata.getChunkId());
+                compressedPayload.release();
+                increaseAvailablePermits(cnx);
+                boolean repeatedlyReceived = Arrays.stream(chunkedMsgCtx.chunkedMessageIds)
+                        .anyMatch(messageId1 -> messageId1 != null
+                                && messageId1.ledgerId == messageId.getLedgerId()
+                                && messageId1.entryId == messageId.getEntryId());
+                if (!repeatedlyReceived) {
+                    doAcknowledge(msgId, AckType.Individual, Collections.emptyMap(), null);
Review Comment:
> Then, it seems like we don't need to iterate all chunkedMsgCtx.chunkedMessageIds. I think we can check
> ```
> var prevChunkMsgId = chunkedMsgCtx.chunkedMessageIds[chunkId];
> boolean repeatedlyReceived = prevChunkMsgId.ledgerId == messageId.getLedgerId()
>         && prevChunkMsgId.entryId == messageId.getEntryId();
> ```
The retransmission of chunks by the producer might occur due to reconnection
after a connection disruption. In this scenario, the producer doesn't re-split
the chunk message but rather resends the chunks from the previously pending
message. In such cases, the resent chunk and the previously sent chunk belong
to the same chunk message, and they share the same UUID.
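Because the resent chunks share the pending message's UUID, they land in the same chunked-message context, so the O(1) slot lookup suggested above can be sketched like this (illustrative names, not the actual ConsumerImpl code):

```java
public class IndexCheckSketch {
    // (ledgerId, entryId) of an already-received chunk.
    record ChunkMessageId(long ledgerId, long entryId) {}

    // Compare only the slot for this chunkId instead of scanning every slot.
    static boolean repeatedlyReceived(ChunkMessageId[] chunkedMessageIds,
                                      int chunkId, long ledgerId, long entryId) {
        ChunkMessageId prev = chunkedMessageIds[chunkId];
        return prev != null && prev.ledgerId() == ledgerId && prev.entryId() == entryId;
    }

    public static void main(String[] args) {
        ChunkMessageId[] slots = { new ChunkMessageId(1, 1), null };
        // The very same entry redelivered to the consumer.
        System.out.println(repeatedlyReceived(slots, 0, 1, 1));
        // The same chunk re-persisted in the topic at a new entry.
        System.out.println(repeatedlyReceived(slots, 0, 1, 5));
    }
}
```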
##########
pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConsumerImpl.java:
##########
@@ -1449,6 +1450,24 @@ private ByteBuf processMessageChunk(ByteBuf compressedPayload, MessageMetadata m
         // discard message if chunk is out-of-order
         if (chunkedMsgCtx == null || chunkedMsgCtx.chunkedMsgBuffer == null
                 || msgMetadata.getChunkId() != (chunkedMsgCtx.lastChunkedMessageId + 1)) {
+            // Filter out duplicated chunks instead of discarding them. (Only do this
+            // when duplication exists within a chunked message.)
+            // For example:
+            //   Chunk-1 sequence ID: 0, chunk ID: 0
+            //   Chunk-2 sequence ID: 0, chunk ID: 0
+            //   Chunk-3 sequence ID: 0, chunk ID: 1
+            if (chunkedMsgCtx != null && msgMetadata.getChunkId() <= chunkedMsgCtx.lastChunkedMessageId) {
+                log.warn("[{}] Received a repeated chunk. messageId = {}, last-chunk-id = {}, chunkId = {}",
+                        msgMetadata.getProducerName(), msgId,
+                        chunkedMsgCtx.lastChunkedMessageId, msgMetadata.getChunkId());
+                compressedPayload.release();
+                increaseAvailablePermits(cnx);
+                boolean repeatedlyReceived = Arrays.stream(chunkedMsgCtx.chunkedMessageIds)
+                        .anyMatch(messageId1 -> messageId1 != null
+                                && messageId1.ledgerId == messageId.getLedgerId()
+                                && messageId1.entryId == messageId.getEntryId());
+                if (!repeatedlyReceived) {
+                    doAcknowledge(msgId, AckType.Individual, Collections.emptyMap(), null);
Review Comment:
> This means we probably need to update the chunking uuid definition logic and add a suffix there (session id, maybe the producer start-time, or some other unique id to identify the producer session). Currently,
> ```
> String uuid = totalChunks > 1 ? String.format("%s-%d", producerName, sequenceId) : null;
> ```

Yeah, this is a good suggestion. I changed it as follows.
```
String uuid = totalChunks > 1
        ? String.format("%s-%d-%d", producerName, sequenceId, System.currentTimeMillis())
        : null;
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]