heesung-sn commented on code in PR #20948:
URL: https://github.com/apache/pulsar/pull/20948#discussion_r1290391860


##########
pulsar-client/src/main/java/org/apache/pulsar/client/impl/ProducerImpl.java:
##########
@@ -594,13 +595,18 @@ private void updateMessageMetadata(final MessageMetadata 
msgMetadata, final int
         }
     }
 
-    private long updateMessageMetadataSequenceId(final MessageMetadata 
msgMetadata) {
+    private long updateMessageMetadataSequenceId(final MessageMetadata 
msgMetadata, boolean isChunk) {
         final long sequenceId;
+        // We always need to increment the value of `msgIdGenerator`,
+        // regardless of whether the user has set a sequence ID when sending a 
message.
         if (!msgMetadata.hasSequenceId()) {
             sequenceId = msgIdGenerator++;
             msgMetadata.setSequenceId(sequenceId);
+        } else if (isChunk) {
+            sequenceId = msgIdGenerator++;

Review Comment:
   I see. Here, we are actually introducing a new concept, `deduplication_id,` 
although it is internally still called `sequence_id`. Thi sort of violates the 
concept.
   
   ```
   setSequenceId(long sequenceId)
   Specify a custom sequence id for the message being published.
   The sequence id can be used for deduplication purposes and it needs to 
follow these rules:
   
   - sequenceId >= 0
   - Sequence id for a message needs to be greater than sequence id for earlier 
messages: sequenceId(N+1) > sequenceId(N)
   - It's not necessary for sequence ids to be consecutive. There can be holes 
between messages. Eg. the sequenceId could represent an offset or a cumulative 
size.
   
   ```
   
   Could we confirm the following cases?
   
   case 1:
   What if the user sets custom seq_id=x for one msg and then sets seq_id<=x 
for the next message again? This deduplication_id(msgIdGenerator) will be still 
bigger than the previous here, and the second msg will not be deduped. (Do we 
have a test case for this?)
   
   
   case 2:
   What if there one msg is chunked and the next msg is not chunked?
   
   - msg1(chunked 10 parts), cus_seqId = 1 dedupId = [1,10], next_dedupId=11
   
   - msg2(not chunked) cus_seqId = 5, dedup_id=5, next_dedupId=6
   
   - msg3(not chunked) cus_seqId = 8, dedup_id=8, next_dedupId=9
   
   Here, msg2 and msg3 look to be deduped, although their cus_seqid are bigger 
than prev.
   
   
    
   
   
   
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to