[ 
https://issues.apache.org/jira/browse/HDDS-8140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17700230#comment-17700230
 ] 

Sumit Agrawal commented on HDDS-8140:
-------------------------------------

This feature handles the case where a disk holding container data is removed, 
to avoid serving incorrect data:
 * It builds the set of all containers recorded in the file (snapshot)
 * During startup, if a container is present in the snapshot but not on disk, 
it is identified as missing
 * The DN does not allow such a container to be created by a write, because:
 ** if writeChunk creates the container with data at the latest BCSID, it will 
be treated as the latest data, even though it is missing the data that was 
previously present
 ** so the DN rejects creation of a container that was lost earlier
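
A minimal sketch of the startup check and the write-time rejection described 
above. The class and method names ({{MissingContainerSketch}}, 
{{tryCreate}}) are hypothetical and only illustrate the idea; this is not the 
actual {{ContainerSet}} or {{ContainerStateMachine}} code.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentSkipListSet;

/** Hypothetical illustration only; not the real Ozone implementation. */
final class MissingContainerSketch {
  // Container IDs recorded in the snapshot but not found on disk at startup.
  private final Set<Long> missing = new ConcurrentSkipListSet<>();
  // Container IDs currently known to exist on disk.
  private final Set<Long> onDisk = new ConcurrentSkipListSet<>();

  MissingContainerSketch(Set<Long> snapshotIds, Set<Long> diskIds) {
    onDisk.addAll(diskIds);
    for (long id : snapshotIds) {
      if (!diskIds.contains(id)) {
        // Present in the snapshot, absent on disk: treat as missing.
        missing.add(id);
      }
    }
  }

  /** Returns a sorted copy of the missing container IDs. */
  Set<Long> missingContainers() {
    return new TreeSet<>(missing);
  }

  /**
   * Models container creation triggered by a write.
   * Returns false when the container is in the missing set, so that new data
   * cannot mask the data that was lost; otherwise creates it if needed.
   */
  boolean tryCreate(long id) {
    if (missing.contains(id)) {
      return false;
    }
    onDisk.add(id);
    return true;
  }
}
```

With a snapshot of containers 1, 2, 3 and only 1 and 3 on disk, container 2 
ends up in the missing set and a later write cannot recreate it, while a 
write to a brand-new container ID is still accepted.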

 

So if the map is not updated when a container is deleted normally, that 
container will be reported as missing on DN startup, because it is present in 
the snapshot but not on disk.

This *case may not be an issue* if the container was deleted normally and is 
not expected to be created again. It may just produce a log message.

 

To fix this, we may need to update the map by removing the containerId, under 
a lock, when it is no longer needed.
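
One possible shape for that fix, as a sketch only. It assumes the map is a 
simple in-memory set of container IDs that is later written to the snapshot; 
the names ({{SnapshotContainerIdsSketch}}, {{recordCreate}}, 
{{recordDelete}}, {{idsForSnapshot}}) are hypothetical and do not come from 
the Ozone code base.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical illustration only; not the real Ozone implementation. */
final class SnapshotContainerIdsSketch {
  private final ReentrantLock lock = new ReentrantLock();
  // Container IDs that would be persisted with the next snapshot.
  private final Set<Long> ids = new HashSet<>();

  /** Records a newly created container. */
  void recordCreate(long id) {
    lock.lock();
    try {
      ids.add(id);
    } finally {
      lock.unlock();
    }
  }

  /**
   * Removes a normally deleted container, so that a later startup does not
   * find it in the snapshot and report it as missing.
   */
  void recordDelete(long id) {
    lock.lock();
    try {
      ids.remove(id);
    } finally {
      lock.unlock();
    }
  }

  /** Returns a copy taken under the same lock, so it is a consistent view. */
  Set<Long> idsForSnapshot() {
    lock.lock();
    try {
      return new HashSet<>(ids);
    } finally {
      lock.unlock();
    }
  }
}
```

Taking the snapshot copy under the same lock as the create and delete updates 
is what keeps a concurrent delete from being half-reflected in the persisted 
set.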

 

With Schema V3, do we still need this logic? Or could it be backed by the V3 
schema DB?

 

> Startup warning about adding containers to missing container set
> ----------------------------------------------------------------
>
>                 Key: HDDS-8140
>                 URL: https://issues.apache.org/jira/browse/HDDS-8140
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Ethan Rose
>            Assignee: Sumit Agrawal
>            Priority: Major
>
> This message was observed in the logs on startup for 250 containers. The same 
> message occurred on 3/6 datanodes for the same set of containers on each node:
> {code}
> 2023-02-16 16:43:05,799 WARN 
> org.apache.hadoop.ozone.container.common.impl.ContainerSet: Adding container 
> 64079 to missing container set.
> {code}
> This is a byproduct of HDDS-935, which is an old and involved change. In that 
> change there is a TODO in 
> [{{ContainerStateMachine#persistContainerSet}}|https://github.com/apache/ozone/blob/7f22916889b7bf39cdb31e5943cae5768f368198/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/ContainerStateMachine.java#L279]
>  that says there will be a race if container delete does not go through 
> Ratis. When container deletion was implemented after that change, it did not 
> go through Ratis so the race may happen. We need to revisit this area of the 
> code.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
