swamirishi commented on PR #7402: URL: https://github.com/apache/ozone/pull/7402#issuecomment-2487026394
[MissingContainerSet](https://github.com/apache/ozone/blob/4e603aa93c8479349776c1597ef54504453cb512/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java#L62) in the containerSet was added as part of solving [HDDS-935](https://issues.apache.org/jira/browse/HDDS-935), which is trying to solve the same problem (**Avoid creating an already created container on a datanode in case of disk removal followed by datanode restart**) we are trying to tackle here. But unfortunately the solution only accounted for ratis container which was because it was only container type back then. The solution looks into adding all the containers that are present in the [ratis snapshot's containerBCSIdMap](https://github.com/apache/ozone/blob/2139367b52d68773f25d79abce0439323dde9671/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/ContainerStateMachine.java#L313-L323) but were not found on disk after scanning through all the data volumes in the node. [HDDS-8448](https://issues.apache.org/jira/browse/HDDS-8448) caused a regression resulting removing the containers on a failed volume instead of marking it as missing. This patch is trying to do something along the same lines by extending the working for EC containers and is trying to fix the issues in the implementation for handling volume failures in the system. I believe this particular implementation was working fine for ratis container before HDDS-8448 caused a regression. Creating a duplicate data structure performing the same functionalities is just redundant and could lead to future bugs because we would have to update 2 structures in place of two. I also don't see any value in implementing something all over again which would also involve thorough testing of that flow. IMO we should be reusing the existing frameworks in the system to achieve something rather than add more flows complicating the already existing system further. If you still strongly feel we should move this logic out of the ContainerSet class then my opinion would be to get rid of the missingContainerSet entirely from all of our code flows and have only one of the 2 but not both. @sodonnel @kerneltime @adoroszlai what do you think we should do here? I have addressed the other review comments on the patch (Getting rid of the Proto2EnumCodec, Removing ReferenceCountedDB for the WitnessedContainerMetadataStore) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
