void-ptr974 opened a new pull request, #26002:
URL: https://github.com/apache/pulsar/pull/26002
### Motivation
When a persistent replicator publish fails because the remote producer is
rejected, for example by backlog quota, `PersistentReplicator` rewinds the
cursor so the failed entry can be read and retried later.
However, the failed send was not marked as completed in the corresponding
in-flight task. As a result, the in-flight task could keep consuming a read
permit even though the cursor had already been rewound. After the producer
reconnects, the replicator might not read the failed entry again, leaving the
replication backlog stuck.
This can cause geo-replication to stop making progress after a transient
remote publish failure. It also caused
`ReplicatorTest.testResumptionAfterBacklogRelaxed` to fail intermittently
because the backlog remained at `1` instead of returning to `0`.
### Modifications
Mark the failed publish as completed in the current `InFlightTask` after
rewinding the cursor in `PersistentReplicator.ProducerSendCallback`.
The cursor rewind makes the failed entry readable again, so the in-flight
task should release its permit. This allows the replicator to resume reading
entries after the remote producer becomes available again.
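The interaction between the rewind and the read permit can be sketched with a small, self-contained model. This is not the actual Pulsar code: `InFlightTaskSketch`, `ReplicatorSketch`, and the `completeTask` flag are hypothetical names used only to contrast the behavior before and after this change, assuming one entry per in-flight task.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified stand-in for the replicator's in-flight bookkeeping.
// None of these names are Pulsar's actual API.
final class InFlightTaskSketch {
    private final int entriesRead;
    private final AtomicInteger completed = new AtomicInteger();

    InFlightTaskSketch(int entriesRead) {
        this.entriesRead = entriesRead;
    }

    // Called once per entry whose send has been handled, successfully or not.
    void markSendCompleted() {
        completed.incrementAndGet();
    }

    // The task keeps its read permit until every entry it read is accounted for.
    boolean isDone() {
        return completed.get() >= entriesRead;
    }
}

final class ReplicatorSketch {
    private long readPosition;
    private InFlightTaskSketch current;

    // A new read is allowed only when no unfinished task holds the permit.
    boolean hasReadPermit() {
        return current == null || current.isDone();
    }

    // Reads one entry, starts a new in-flight task, and returns the position read.
    long readNext() {
        if (!hasReadPermit()) {
            throw new IllegalStateException("no read permit available");
        }
        current = new InFlightTaskSketch(1);
        return readPosition++;
    }

    // Models the failed-send callback. completeTask=false mimics the old
    // behavior (rewind only); completeTask=true mimics the fix described above.
    void onSendFailed(long failedPosition, boolean completeTask) {
        readPosition = failedPosition; // rewind so the entry is readable again
        if (completeTask && current != null) {
            current.markSendCompleted(); // release the permit held by the task
        }
    }
}
```

In this model, calling `onSendFailed(position, false)` leaves `hasReadPermit()` returning `false`, so the rewound entry is never read again. Calling `onSendFailed(position, true)` releases the permit, and the next `readNext()` returns the failed position.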
Added a unit test covering the failed publish path to verify that:
- the in-flight task is completed when publish fails with `ProducerBlockedQuotaExceededException`
- the read permit is released after the failed send is handled
### Verifying this change
- [ ] Make sure that the change passes the CI checks.
This change added tests and can be verified as follows:
- Added `PersistentReplicatorInflightTaskTest.testFailedPublishCompletesInFlightTask`
- Verified locally with:
  - `./gradlew :pulsar-broker:test --tests org.apache.pulsar.broker.service.persistent.PersistentReplicatorInflightTaskTest.testFailedPublishCompletesInFlightTask`
  - `./gradlew :pulsar-broker:test --tests org.apache.pulsar.broker.service.ReplicatorTest.testResumptionAfterBacklogRelaxed`
### Does this pull request potentially affect one of the following parts:
- [ ] Dependencies (add or upgrade a dependency)
- [ ] The public API
- [ ] The schema
- [ ] The default values of configurations
- [ ] The threading model
- [ ] The binary protocol
- [ ] The REST endpoints
- [ ] The admin CLI options
- [ ] The metrics
- [ ] Anything that affects deployment