siddhantsangwan opened a new pull request, #7964:
URL: https://github.com/apache/ozone/pull/7964

   ## What changes were proposed in this pull request?
   
   There have been situations where a replica reports a higher BCSID than SCM has recorded for a container that is already CLOSED. Ideally this should not happen, but it can because of bugs in the Datanode-side `applyTransaction` and Ratis group removal paths.
   
   Currently, when handling a container report in SCM:
   ```
       if (isHealthy(replicaProto::getState)) {
         // Raise SCM's recorded sequence ID if the replica reports a higher BCSID.
         if (containerInfo.getSequenceId() <
             replicaProto.getBlockCommitSequenceId()) {
           containerInfo.updateSequenceId(
               replicaProto.getBlockCommitSequenceId());
         }
       }
   ```
   SCM checks whether the replica is healthy and whether the container's sequence ID is lower than the replica's BCSID. If both hold, it updates the container's sequence ID:
   ```
     public void updateSequenceId(long sequenceID) {
       assert (isOpen() || state == HddsProtos.LifeCycleState.QUASI_CLOSED);
       sequenceId = max(sequenceID, sequenceId);
     }
   ```
   
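   The assert is there because we don't expect a CLOSED container's sequence ID to be updated. However, Java assertions are a runtime setting, not a build option: they are only evaluated when the JVM is started with `-enableassertions` (`-ea`), so without that flag the update silently succeeds.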
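
   To make the runtime behaviour concrete, here is a minimal standalone sketch. The class, enum, and field names are hypothetical; only the shape of the assert mirrors `ContainerInfo.updateSequenceId`. It uses the common `assert enabled = true;` idiom to detect whether assertions are on.

   ```java
   // Hypothetical standalone demo: names are illustrative; only the assert
   // mirrors ContainerInfo.updateSequenceId.
   public class AssertDemo {

     enum LifeCycleState { OPEN, QUASI_CLOSED, CLOSED }

     static LifeCycleState state = LifeCycleState.CLOSED;
     static long sequenceId = 100;

     static void updateSequenceId(long newSequenceId) {
       // Evaluated only when the JVM runs with -ea / -enableassertions.
       assert state == LifeCycleState.OPEN
           || state == LifeCycleState.QUASI_CLOSED;
       sequenceId = Math.max(newSequenceId, sequenceId);
     }

     /** Returns true if assertions are enabled for this class at runtime. */
     static boolean assertionsEnabled() {
       boolean enabled = false;
       // This assignment only executes when assertions are enabled.
       assert enabled = true;
       return enabled;
     }

     public static void main(String[] args) {
       try {
         updateSequenceId(101);
         System.out.println("assertions enabled: " + assertionsEnabled()
             + ", CLOSED container sequence ID is now " + sequenceId);
       } catch (AssertionError e) {
         System.out.println("assert fired, sequence ID unchanged: " + sequenceId);
       }
     }
   }
   ```

   Run without `-ea`, the assert is skipped and the CLOSED container's sequence ID becomes 101; run with `-ea`, the `AssertionError` is thrown before the assignment and the ID stays at 100.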
   
   I propose logging an error here to make this situation visible in the logs. Whether updating the sequence ID of a CLOSED container should be allowed by default needs further discussion: should SCM crash instead, and allow the update only after an admin has reviewed the situation and explicitly enabled it through a configuration? **This Jira is restricted to logging**; a separate Jira should be created to change the default behaviour.
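
   A rough sketch of the proposed check, under several assumptions: `Container` and `Replica` are hypothetical, simplified stand-ins for `ContainerInfo` and the replica proto, `java.util.logging` replaces the logger Ozone actually uses, the replica is assumed to have already passed the `isHealthy` check, and the message text is modelled on the log output in the testing section. It is not the PR's actual code.

   ```java
   import java.util.logging.Logger;

   // Hypothetical sketch of the proposed check; not the actual SCM code.
   public class ClosedContainerSequenceIdCheck {

     private static final Logger LOG =
         Logger.getLogger(ClosedContainerSequenceIdCheck.class.getName());

     enum LifeCycleState { OPEN, QUASI_CLOSED, CLOSED }

     // Simplified stand-in for ContainerInfo.
     static final class Container {
       final long id;
       final LifeCycleState state;
       long sequenceId;

       Container(long id, LifeCycleState state, long sequenceId) {
         this.id = id;
         this.state = state;
         this.sequenceId = sequenceId;
       }
     }

     // Simplified stand-in for a healthy replica in a container report.
     static final class Replica {
       final long bcsid;
       final String datanode;

       Replica(long bcsid, String datanode) {
         this.bcsid = bcsid;
         this.datanode = datanode;
       }
     }

     /**
      * Logs an error when a CLOSED container would receive a higher sequence
      * ID, then applies the update anyway (this change only adds logging).
      * Returns true if the error was logged.
      */
     static boolean updateSequenceId(Container container, Replica replica) {
       if (container.sequenceId >= replica.bcsid) {
         return false; // SCM already has an equal or higher sequence ID
       }
       boolean logged = false;
       if (container.state == LifeCycleState.CLOSED) {
         LOG.severe(String.format(
             "There is a CLOSED container with lower sequence ID than a replica."
                 + " Container: #%d, Container's sequence ID: %d,"
                 + " Replica's sequence ID: %d, Datanode: %s.",
             container.id, container.sequenceId, replica.bcsid,
             replica.datanode));
         logged = true;
       }
       container.sequenceId = Math.max(replica.bcsid, container.sequenceId);
       return logged;
     }

     public static void main(String[] args) {
       Container closed = new Container(1, LifeCycleState.CLOSED, 100);
       updateSequenceId(closed, new Replica(101, "dn-1"));
       System.out.println("sequence ID after report: " + closed.sequenceId);
     }
   }
   ```

   Keeping the update after the log line preserves the current behaviour; a stricter default (refusing the update unless an admin opts in) is the follow-up question above and is deliberately not modelled here.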
   
   ## What is the link to the Apache JIRA
   
   http://issues.apache.org/jira/browse/HDDS-12409
   
   ## How was this patch tested?
   
   Manual testing. Here is what the log would look like:
   ```
   2025-02-25 12:54:18,664 [main] ERROR container.ContainerReportHandler (AbstractContainerReportHandler.java:updateContainerStats(137)) - There is a CLOSED container with lower sequence ID than a replica. Container: ContainerInfo{id=#1, state=CLOSED, stateEnterTime=2025-02-25T07:24:18.523Z, pipelineID=PipelineID=b7e9594b-a121-43fb-b416-561903af525d, owner=scm}, Container's sequence ID: 100, Replica: containerID: 1
   state: OPEN
   keyCount: 101
   blockCommitSequenceId: 101
   originNodeId: "49dcc98b-f44d-4adc-b973-879044d1ab65"
   isEmpty: true
   , Replica's sequence ID: 101, Datanode: 49dcc98b-f44d-4adc-b973-879044d1ab65(localhost-2.3.5.5/2.3.5.5).
   
   ```