RobertIndie opened a new issue #12402:
URL: https://github.com/apache/pulsar/issues/12402
## Motivation
Currently, when a producer sends a chunked message, it returns the
message-id of the last chunk. This causes problems. For example, if we seek
with this message-id, the consumer starts consuming from the position of the
last chunk, concludes that the earlier chunks are lost, and skips the whole
message. As a result, even an inclusive seek may skip the message it was
supposed to start from, which is incorrect behavior.
Here is some simple code that demonstrates the problem.
```java
var msgId = producer.send(...); // chunked message, e.g. returns 0:1:-1 (the last chunk)
var otherMsg = producer.send(...); // returns 0:2:-1
consumer.seek(msgId); // inclusive seek
// The consumer may skip the first message and return something like 0:2:-1.
var receiveMsgId = consumer.receive().getMessageId();
Assert.assertEquals(msgId, receiveMsgId); // fails
```
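The skip can be reproduced without a broker. The following is a minimal sketch, not Pulsar's actual consumer code: `Entry`, `sampleTopic`, and `receiveFrom` are hypothetical names, and the reassembly rule (drop any chunk whose predecessors were never seen) is a simplified model of the behavior described above.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkSeekSimulation {
    /** Hypothetical stand-in for one stored entry holding a single chunk. */
    public static final class Entry {
        final long entryId;   // position of the entry within the ledger
        final String uuid;    // identifies the logical message the chunk belongs to
        final int chunkId;    // 0-based index of this chunk
        final int numChunks;  // total number of chunks in the logical message

        public Entry(long entryId, String uuid, int chunkId, int numChunks) {
            this.entryId = entryId;
            this.uuid = uuid;
            this.chunkId = chunkId;
            this.numChunks = numChunks;
        }
    }

    /** Message "A" is split into two chunks (entries 0 and 1); "B" is a single entry. */
    public static List<Entry> sampleTopic() {
        return List.of(
                new Entry(0, "A", 0, 2),
                new Entry(1, "A", 1, 2),
                new Entry(2, "B", 0, 1));
    }

    /** Returns the logical messages delivered when reading from startEntryId (inclusive). */
    public static List<String> receiveFrom(List<Entry> topic, long startEntryId) {
        List<String> delivered = new ArrayList<>();
        String currentUuid = null; // message currently being reassembled, if any
        int nextChunk = 0;         // chunk index expected next for that message
        for (Entry e : topic) {
            if (e.entryId < startEntryId) {
                continue; // before the seek position
            }
            if (e.chunkId == 0) {
                currentUuid = e.uuid; // a first chunk starts a new reassembly
                nextChunk = 0;
            }
            if (!e.uuid.equals(currentUuid) || e.chunkId != nextChunk) {
                currentUuid = null; // earlier chunks were never seen: drop this chunk
                continue;
            }
            nextChunk++;
            if (nextChunk == e.numChunks) {
                delivered.add(e.uuid); // all chunks present: deliver the message
                currentUuid = null;
            }
        }
        return delivered;
    }

    public static void main(String[] args) {
        System.out.println(receiveFrom(sampleTopic(), 1)); // seek to the last chunk of "A": [B]
        System.out.println(receiveFrom(sampleTopic(), 0)); // seek to the first chunk of "A": [A, B]
    }
}
```

In this model, seeking to entry 1 (the last chunk of "A") delivers only "B", while seeking to entry 0 (the first chunk) delivers both messages.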
Earlier, we tried to fix the problem by having the producer and the consumer
return the firstChunkMessageID
([Discussion](https://lists.apache.org/x/thread.html/r63b3153937a26c3913d0b36607ee25ad67337728d490fd616cdd06b2@%3Cdev.pulsar.apache.org%3E)
and [draft pull request](https://github.com/apache/pulsar/pull/12171)).
However, this may affect existing business logic: users who rely on
lastChunkMessageId being returned would be affected. For this reason, we
propose a new solution that minimizes the impact. With this PIP, existing users
are expected to be affected only when seeking to a chunked message.
## Goal
We can solve the above problem by introducing a chunk message ID to the
producer and consumer. Here are the goals for this PIP:
* **Compatibility**: When the producer or the consumer processes a chunked
message, the chunk message-id is returned to the user. To stay compatible with
existing business logic, the chunk message-id needs to behave consistently
with the message-id returned today (that of the last chunk).
* **New Feature**: The user can get the message-ids of the first chunk and
the last chunk from the chunk message-id.
* **Fix for consumer.seek**: To fix the above problem, the consumer will use
firstChunkMessageId when seeking if the message-id passed in is a chunk
message-id.
## API Changes and Implementation
1. Introduce a new message ID type: the chunk message ID. It inherits from
MessageIdImpl and adds two new methods: getFirstChunkMsgId and
getLastChunkMsgId. All other methods behave as they do on the
lastChunkMessageID, which keeps it compatible with much of the existing
business logic.
Here is demo code for the chunk message ID:
```java
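import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.client.impl.MessageIdImpl;

public class ChunkMessageIdImpl extends MessageIdImpl implements MessageId {
    private final MessageIdImpl firstChunkMsgId;

    public ChunkMessageIdImpl(MessageIdImpl firstChunkMsgId, MessageIdImpl lastChunkMsgId) {
        // The inherited ledger/entry/partition fields hold the last chunk's position,
        // so existing MessageIdImpl behavior is unchanged.
        super(lastChunkMsgId.getLedgerId(), lastChunkMsgId.getEntryId(),
                lastChunkMsgId.getPartitionIndex());
        this.firstChunkMsgId = firstChunkMsgId;
    }

    public MessageIdImpl getFirstChunkMsgId() {
        return firstChunkMsgId;
    }

    public MessageIdImpl getLastChunkMsgId() {
        // This object itself carries the last chunk's position.
        return this;
    }
}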
```
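To illustrate the compatibility argument, here is a small self-contained sketch. It does not use the Pulsar client: `PlainId` and `ChunkId` are hypothetical stand-ins for MessageIdImpl and the proposed chunk message ID, reduced to the three position fields, and the comparison and string format are assumptions made for the example.

```java
public class ChunkIdCompatDemo {
    /** Hypothetical stand-in for MessageIdImpl: ledger, entry and partition only. */
    public static class PlainId implements Comparable<PlainId> {
        final long ledgerId;
        final long entryId;
        final int partitionIndex;

        public PlainId(long ledgerId, long entryId, int partitionIndex) {
            this.ledgerId = ledgerId;
            this.entryId = entryId;
            this.partitionIndex = partitionIndex;
        }

        @Override
        public int compareTo(PlainId other) {
            int c = Long.compare(ledgerId, other.ledgerId);
            if (c != 0) {
                return c;
            }
            c = Long.compare(entryId, other.entryId);
            return c != 0 ? c : Integer.compare(partitionIndex, other.partitionIndex);
        }

        @Override
        public String toString() {
            return ledgerId + ":" + entryId + ":" + partitionIndex;
        }
    }

    /** Same shape as the demo class above, built on the stand-in type. */
    public static final class ChunkId extends PlainId {
        private final PlainId first;

        public ChunkId(PlainId first, PlainId last) {
            super(last.ledgerId, last.entryId, last.partitionIndex);
            this.first = first;
        }

        public PlainId getFirstChunkMsgId() {
            return first;
        }

        public PlainId getLastChunkMsgId() {
            return this;
        }
    }

    public static void main(String[] args) {
        PlainId first = new PlainId(0, 0, -1);
        PlainId last = new PlainId(0, 1, -1);
        ChunkId chunkId = new ChunkId(first, last);

        System.out.println(chunkId);                      // 0:1:-1
        System.out.println(chunkId.compareTo(last));      // 0
        System.out.println(chunkId.getFirstChunkMsgId()); // 0:0:-1
    }
}
```

Because the inherited fields come from the last chunk, code that only prints or compares the ID sees the same values as before, and only callers that ask for the first chunk see anything new.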
2. The chunk message-id is returned to the user when the producer produces
a chunked message or when the consumer consumes one.
3. In consumer.seek, if the message-id passed in is a chunk message-id, seek
to its first chunk message-id. This solves the problem caused by seeking to
chunked messages. It is also the only way this PIP affects existing business
logic.
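The seek rule in step 3 could be sketched as follows. This is only an illustration under stated assumptions: `Id`, `ChunkId`, and `resolveSeekTarget` are hypothetical names, positions are kept as plain strings, and the real consumer would apply the equivalent check to the actual message ID classes.

```java
public class SeekTargetSketch {
    /** Hypothetical stand-in for a plain message-id, written as "ledger:entry:partition". */
    public static class Id {
        final String position;

        public Id(String position) {
            this.position = position;
        }
    }

    /** Hypothetical stand-in for the chunk message-id: positioned at the last chunk. */
    public static final class ChunkId extends Id {
        final Id firstChunk;

        public ChunkId(Id firstChunk, Id lastChunk) {
            super(lastChunk.position);
            this.firstChunk = firstChunk;
        }
    }

    /** Picks the position a seek should use: the first chunk for chunk ids, else the id itself. */
    public static Id resolveSeekTarget(Id requested) {
        if (requested instanceof ChunkId) {
            return ((ChunkId) requested).firstChunk;
        }
        return requested;
    }

    public static void main(String[] args) {
        Id chunked = new ChunkId(new Id("0:0:-1"), new Id("0:1:-1"));
        Id plain = new Id("0:2:-1");

        System.out.println(resolveSeekTarget(chunked).position); // 0:0:-1
        System.out.println(resolveSeekTarget(plain).position);   // 0:2:-1
    }
}
```

Non-chunked message-ids pass through unchanged, so in this sketch the only behavior that differs from today is a seek to a chunked message.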