RobertIndie opened a new issue #12402:
URL: https://github.com/apache/pulsar/issues/12402


   
   ## Motivation
   
   Currently, when a producer sends a chunked message, it returns the
   message-id of the last chunk. This causes problems when that message-id is
   used to seek: the consumer starts consuming from the position of the last
   chunk, concludes that the earlier chunks are lost, and skips the whole
   message. With an inclusive seek, the consumer may therefore skip the very
   message it was asked to seek to, which is incorrect behavior.
   
   The following code demonstrates the problem.
   
   ```java
   var msgId = producer.send(...); // e.g. returns 0:1:-1
   
   var otherMsg = producer.send(...); // returns 0:2:-1
   
   consumer.seek(msgId); // inclusive seek
   
   // may skip the first message and return e.g. 0:2:-1
   var receiveMsgId = consumer.receive().getMessageId();
   
   Assert.assertEquals(msgId, receiveMsgId); // fail
   ```
   
   Earlier, we tried to fix the problem by having the producer and the consumer
   return the firstChunkMessageId
   ([discussion](https://lists.apache.org/x/thread.html/r63b3153937a26c3913d0b36607ee25ad67337728d490fd616cdd06b2@%3Cdev.pulsar.apache.org%3E)
   and [draft pull request](https://github.com/apache/pulsar/pull/12171)).
   However, this may affect existing business logic: users who rely on the
   lastChunkMessageId being returned would be affected. For this reason, we
   propose a new solution that minimizes the impact. With this PIP, existing
   users are only affected when seeking to a chunked message.
   
   ## Goal
   
   We can solve the above problem by introducing a chunk message-id to the
   producer and consumer. The goals of this PIP are:
   * **Compatibility**: When the producer or the consumer processes a chunked
   message, a chunk message-id is returned to the user. To stay compatible with
   existing business logic, the chunk message-id needs to behave like the
   message-id returned today, that is, the message-id of the last chunk.
   * **New Feature**: The user can get the message-id of the first chunk and of
   the last chunk from the chunk message-id.
   * **Fix for consumer.seek**: To fix the above problem, when the message-id
   passed to seek is a chunk message-id, the consumer will seek to the
   message-id of its first chunk.
   
   
   
   ## API Changes and Implementation
   
   1. Introduce a new message-id type: the chunk message-id. It inherits from
   MessageIdImpl and adds two new methods: getFirstChunkMsgId and
   getLastChunkMsgId. All other methods are inherited from MessageIdImpl and
   operate on the last chunk's message-id, which keeps the type compatible with
   much of the existing business logic.
   Here is demo code for the chunk message-id:
   ```java
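   import org.apache.pulsar.client.api.MessageId;
   import org.apache.pulsar.client.impl.MessageIdImpl;

   public class ChunkMessageIdImpl extends MessageIdImpl implements MessageId {
       // Message-id of the first chunk; the last chunk is represented by this object.
       private final MessageIdImpl firstChunkMsgId;
   
       public ChunkMessageIdImpl(MessageIdImpl firstChunkMsgId, MessageIdImpl lastChunkMsgId) {
           // Ledger, entry, and partition come from the last chunk, so inherited
           // methods behave exactly as the message-id returned today.
           super(lastChunkMsgId.getLedgerId(), lastChunkMsgId.getEntryId(),
                   lastChunkMsgId.getPartitionIndex());
           this.firstChunkMsgId = firstChunkMsgId;
       }
   
       public MessageIdImpl getFirstChunkMsgId() {
           return firstChunkMsgId;
       }
   
       public MessageIdImpl getLastChunkMsgId() {
           return this;
       }
   }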
   ```
   
   2. The chunk message-id is returned to the user when the producer produces a
   chunked message or when the consumer consumes a chunked message.
   
   3. In consumer.seek, if the message-id passed in is a chunk message-id, use
   its first chunk message-id as the seek position. This solves the problem
   caused by seeking to chunked messages. It is also the only impact of this PIP
   on existing business logic.
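
   The seek rule in step 3 can be illustrated with a minimal, self-contained
   sketch. It does not use the Pulsar client: `SimpleMessageId`,
   `SimpleChunkMessageId`, and `resolveSeekTarget` are hypothetical stand-ins
   introduced only for illustration, and the real consumer code may be
   structured differently.

   ```java
   public class ChunkSeekSketch {

       // Hypothetical stand-in for MessageIdImpl (ledger:entry:partition).
       static class SimpleMessageId {
           final long ledgerId;
           final long entryId;
           final int partitionIndex;

           SimpleMessageId(long ledgerId, long entryId, int partitionIndex) {
               this.ledgerId = ledgerId;
               this.entryId = entryId;
               this.partitionIndex = partitionIndex;
           }

           @Override
           public String toString() {
               return ledgerId + ":" + entryId + ":" + partitionIndex;
           }
       }

       // Hypothetical stand-in for the proposed chunk message-id: it presents
       // itself as the last chunk's id and also remembers the first chunk's id.
       static class SimpleChunkMessageId extends SimpleMessageId {
           final SimpleMessageId firstChunkMsgId;

           SimpleChunkMessageId(SimpleMessageId first, SimpleMessageId last) {
               super(last.ledgerId, last.entryId, last.partitionIndex);
               this.firstChunkMsgId = first;
           }
       }

       // Assumed seek rule: a chunk message-id resolves to its first chunk,
       // any other message-id is used unchanged.
       static SimpleMessageId resolveSeekTarget(SimpleMessageId id) {
           if (id instanceof SimpleChunkMessageId) {
               return ((SimpleChunkMessageId) id).firstChunkMsgId;
           }
           return id;
       }

       public static void main(String[] args) {
           SimpleMessageId first = new SimpleMessageId(0, 1, -1);
           SimpleMessageId last = new SimpleMessageId(0, 3, -1);
           SimpleMessageId chunked = new SimpleChunkMessageId(first, last);

           System.out.println(chunked);                    // prints 0:3:-1
           System.out.println(resolveSeekTarget(chunked)); // prints 0:1:-1
           System.out.println(resolveSeekTarget(last));    // prints 0:3:-1
       }
   }
   ```

   In this sketch, code that only reads the ledger and entry of the returned id
   still sees the last chunk, while a seek on the same object starts at the
   first chunk, so the consumer can reassemble the whole message.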
   
   
   
   

