siddhantsangwan opened a new pull request, #7964:
URL: https://github.com/apache/ozone/pull/7964
## What changes were proposed in this pull request?
There have been situations where a replica reports a higher BCSID (block
commit sequence ID) than SCM knows for a container that is already CLOSED.
Ideally this should not happen, but it can because of bugs in the Datanode-side
applyTransaction and Ratis group removal paths.
Currently, when handling a container report in SCM:
```
if (isHealthy(replicaProto::getState)) {
  if (containerInfo.getSequenceId() <
      replicaProto.getBlockCommitSequenceId()) {
    containerInfo.updateSequenceId(
        replicaProto.getBlockCommitSequenceId());
  }
```
we check whether the replica is healthy and whether the container's sequence ID
is lower than the replica's. If so, we update the sequence ID:
```
public void updateSequenceId(long sequenceID) {
  assert (isOpen() || state == HddsProtos.LifeCycleState.QUASI_CLOSED);
  sequenceId = max(sequenceID, sequenceId);
}
```
There's an assert statement there because we don't expect to update a CLOSED
container's sequence ID. However, Java assertions are checked only when the JVM
runs with `-enableassertions` (`-ea`); without that flag the assert is skipped
and the update goes through silently.
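To illustrate why the assert alone does not protect a CLOSED container, here is a minimal, self-contained sketch. `AssertSketch`, `SketchContainer` and the `State` enum are hypothetical stand-ins written for this example, not the real Ozone classes; only the shape of `updateSequenceId` mirrors the snippet above.

```java
final class AssertSketch {
    // Hypothetical stand-in for HddsProtos.LifeCycleState (only three states shown).
    enum State { OPEN, QUASI_CLOSED, CLOSED }

    // Hypothetical stand-in for ContainerInfo, reduced to the fields used here.
    static final class SketchContainer {
        private final State state;
        private long sequenceId;

        SketchContainer(State state, long sequenceId) {
            this.state = state;
            this.sequenceId = sequenceId;
        }

        void updateSequenceId(long newId) {
            // Evaluated only when the JVM runs with -ea / -enableassertions.
            assert state == State.OPEN || state == State.QUASI_CLOSED;
            sequenceId = Math.max(newId, sequenceId);
        }

        long getSequenceId() {
            return sequenceId;
        }
    }

    // Common idiom: the assignment inside the assert runs only if assertions are on.
    static boolean assertionsEnabled() {
        boolean enabled = false;
        assert enabled = true;
        return enabled;
    }

    public static void main(String[] args) {
        SketchContainer closed = new SketchContainer(State.CLOSED, 100);
        try {
            closed.updateSequenceId(101);
            System.out.println("update accepted, sequenceId=" + closed.getSequenceId());
        } catch (AssertionError e) {
            System.out.println("update rejected by assert, sequenceId=" + closed.getSequenceId());
        }
        System.out.println("assertions enabled: " + assertionsEnabled());
    }
}
```

Running this with plain `java` should take the "update accepted" branch, while `java -ea` should take the "update rejected" branch, since assertions are off by default in the JVM.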
I propose logging an error message here to make this situation visible in the
logs. We need further discussion on whether updating the sequence ID of a
CLOSED container should be allowed at all by default: should we crash the SCM
and allow the update only once an admin has reviewed the situation and
explicitly set a configuration that permits it? **This Jira is restricted to
logging**; a separate Jira should be created to change the default behaviour.
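As a rough sketch of the kind of check this amounts to (not the actual patch), the condition could look like the following. The class `SequenceIdCheckSketch`, its `State` enum, and the methods `isClosedAndBehind` and `checkAndLog` are hypothetical names introduced for illustration, and `java.util.logging` is used only to keep the example self-contained; it is not the logger SCM uses.

```java
import java.util.logging.Logger;

final class SequenceIdCheckSketch {
    // Hypothetical stand-in for the container lifecycle states relevant here.
    enum State { OPEN, QUASI_CLOSED, CLOSED }

    private static final Logger LOG =
        Logger.getLogger(SequenceIdCheckSketch.class.getName());

    // True when a CLOSED container's sequence ID is lower than the replica's.
    static boolean isClosedAndBehind(State state, long containerSeqId, long replicaSeqId) {
        return state == State.CLOSED && containerSeqId < replicaSeqId;
    }

    // Logs an error for the unexpected case and reports whether it was hit.
    // This sketch only logs; it does not decide whether the update is allowed.
    static boolean checkAndLog(long containerId, State state, long containerSeqId,
                               long replicaSeqId, String datanode) {
        boolean unexpected = isClosedAndBehind(state, containerSeqId, replicaSeqId);
        if (unexpected) {
            LOG.severe(String.format(
                "There is a CLOSED container with lower sequence ID than a replica. "
                    + "Container: #%d, Container's sequence ID: %d, "
                    + "Replica's sequence ID: %d, Datanode: %s.",
                containerId, containerSeqId, replicaSeqId, datanode));
        }
        return unexpected;
    }

    public static void main(String[] args) {
        // Values taken from the sample log in this description; "dn-1" is a placeholder.
        checkAndLog(1L, State.CLOSED, 100L, 101L, "dn-1");
    }
}
```

Keeping the predicate separate from the logging call makes it easy to reuse if a later change decides to reject or gate the update instead of only reporting it.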
## What is the link to the Apache JIRA?
http://issues.apache.org/jira/browse/HDDS-12409
## How was this patch tested?
Manual testing. Here is what the log would look like:
```
2025-02-25 12:54:18,664 [main] ERROR container.ContainerReportHandler
(AbstractContainerReportHandler.java:updateContainerStats(137)) - There is a
CLOSED container with lower sequence ID than a replica. Container:
ContainerInfo{id=#1, state=CLOSED, stateEnterTime=2025-02-25T07:24:18.523Z,
pipelineID=PipelineID=b7e9594b-a121-43fb-b416-561903af525d, owner=scm},
Container's sequence ID: 100, Replica: containerID: 1
state: OPEN
keyCount: 101
blockCommitSequenceId: 101
originNodeId: "49dcc98b-f44d-4adc-b973-879044d1ab65"
isEmpty: true
, Replica's sequence ID: 101, Datanode:
49dcc98b-f44d-4adc-b973-879044d1ab65(localhost-2.3.5.5/2.3.5.5).
```