Siddhant Sangwan created HDDS-12409:
---------------------------------------
Summary: Log an error before increasing the sequence id of a
CLOSED container in SCM
Key: HDDS-12409
URL: https://issues.apache.org/jira/browse/HDDS-12409
Project: Apache Ozone
Issue Type: Bug
Components: SCM
Reporter: Siddhant Sangwan
There have been situations where a replica reports a higher BCSID than SCM
knows for a container that is already CLOSED. Ideally this should not happen,
but it can because of bugs in the Datanode-side applyTransaction and Ratis
group removal paths.
Currently, when handling a container report in SCM:
{code}
if (isHealthy(replicaProto::getState)) {
  if (containerInfo.getSequenceId() <
      replicaProto.getBlockCommitSequenceId()) {
    containerInfo.updateSequenceId(
        replicaProto.getBlockCommitSequenceId());
  }
}
{code}
we check whether the replica is healthy and whether the container's sequence
id is lower than the replica's BCSID. If so, we update the sequence id:
{code}
public void updateSequenceId(long sequenceID) {
  assert (isOpen() || state == HddsProtos.LifeCycleState.QUASI_CLOSED);
  sequenceId = max(sequenceID, sequenceId);
}
{code}
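There's an assert statement there because we don't expect to update a CLOSED
container's sequence id. However, Java assertions are disabled unless the JVM
is started with -enableassertions (-ea), so in a typical deployment the assert
is skipped and the update silently goes through.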
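
To illustrate why the assert alone does not protect a running SCM, here is a small standalone sketch. The class and method names are hypothetical and not part of Ozone; the only assumption is standard Java behaviour, where assert statements are evaluated only if the JVM was started with -ea / -enableassertions.

```java
// Hypothetical demo class, not part of Ozone.
final class AssertionStatusDemo {

  /** Returns true only if assertions are enabled for this class at runtime. */
  static boolean assertionsEnabled() {
    boolean enabled = false;
    // The assignment below is executed only when assertions are enabled.
    assert enabled = true;
    return enabled;
  }

  /**
   * Mimics an assert-only guard such as the one in updateSequenceId.
   * Returns true if the guard did NOT stop the update.
   */
  static boolean updateGoesThrough(boolean containerIsClosed) {
    try {
      assert !containerIsClosed
          : "unexpected sequence id update on a CLOSED container";
      return true;
    } catch (AssertionError e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("assertions enabled: " + assertionsEnabled());
    System.out.println("update on CLOSED container goes through: "
        + updateGoesThrough(true));
  }
}
```

Run with plain `java AssertionStatusDemo`, the guard is skipped and the update on a CLOSED container goes through; run with `java -ea AssertionStatusDemo`, the guard trips.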
I propose logging an error message here to make this situation visible in the
logs. We need further discussion on whether updating the sequence id of a
CLOSED container should be allowed at all by default. Should we instead crash
the SCM and allow the update only once an admin has reviewed the situation and
explicitly set a configuration permitting it? This jira is restricted to
logging; a separate jira should be created to change the default behaviour.
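
A minimal sketch of what the proposed logging could look like, assuming a simplified stand-in for ContainerInfo. The class SketchContainerInfo, its enum, its boolean return value, and the message wording are all hypothetical; java.util.logging is used only to keep the example self-contained and is not a statement about which logging API the real change would use. The existing behaviour (raising the sequence id) is left unchanged.

```java
import java.util.logging.Logger;

// Hypothetical, simplified stand-in for SCM's ContainerInfo.
final class SketchContainerInfo {
  enum LifeCycleState { OPEN, CLOSING, QUASI_CLOSED, CLOSED }

  private static final Logger LOG =
      Logger.getLogger(SketchContainerInfo.class.getName());

  private final long containerId;
  private final LifeCycleState state;
  private long sequenceId;

  SketchContainerInfo(long containerId, LifeCycleState state,
      long sequenceId) {
    this.containerId = containerId;
    this.state = state;
    this.sequenceId = sequenceId;
  }

  long getSequenceId() {
    return sequenceId;
  }

  /**
   * Raises the sequence id to the reported BCSID if it is higher.
   * Logs an error first when the container is CLOSED, since that update
   * is unexpected. Returns true if the error was logged (for this sketch).
   */
  boolean updateSequenceId(long reportedBcsId) {
    boolean unexpected =
        state == LifeCycleState.CLOSED && reportedBcsId > sequenceId;
    if (unexpected) {
      LOG.severe(String.format(
          "Container %d is CLOSED with sequence id %d, but a replica "
              + "reported a higher BCSID %d. Updating the sequence id.",
          containerId, sequenceId, reportedBcsId));
    }
    // Behaviour is unchanged: the update still happens.
    sequenceId = Math.max(reportedBcsId, sequenceId);
    return unexpected;
  }
}
```

Unlike the assert, this check runs regardless of JVM flags, so the condition shows up in the logs even in production.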