[ 
https://issues.apache.org/jira/browse/HDDS-9005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17742601#comment-17742601
 ] 

Ethan Rose edited comment on HDDS-9005 at 7/13/23 12:32 AM:
------------------------------------------------------------

HDDS-7300 has the container scanner ignore failures while scanning a block if 
the block is deleted during the scan. This allows the scanner to run without 
holding a lock. In this case the whole container was deleted, so the code that 
skips a deleted block ran for every block in the container. When the scanner 
reached the end of the container, it did not mark the container unhealthy, 
because it thought all the blocks had been deleted, but it still tried to 
update the last scanned timestamp, because it thought the container was still 
present.

Similar to HDDS-7300, we need to check if a container has been deleted after 
there is a scan failure, and discard the result if it has. Checking the deleted 
state in memory may require HDDS-8770.
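
As a rough illustration of the check described above, here is a minimal Java 
sketch. It is not Ozone code: ScanDiscardSketch, Replica, Outcome and 
handleScanResult are hypothetical names, and a plain in-memory map stands in 
for whatever deleted-state tracking HDDS-8770 ends up providing. The idea is 
only that the scanner looks the container up again before acting on a scan 
result, and drops the result if the container is gone.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch only; none of these names are real Ozone APIs. */
class ScanDiscardSketch {

  /** What the scanner did with one finished scan. */
  enum Outcome { MARKED_UNHEALTHY, TIMESTAMP_UPDATED, DISCARDED }

  /** Stand-in for the in-memory state of one container replica. */
  static final class Replica {
    volatile boolean unhealthy;
    volatile long lastScanMillis;
  }

  // Assumption: a container that is absent from this map has been deleted.
  private final Map<Long, Replica> containers = new ConcurrentHashMap<>();

  void add(long containerId) {
    containers.put(containerId, new Replica());
  }

  void delete(long containerId) {
    containers.remove(containerId);
  }

  Replica get(long containerId) {
    return containers.get(containerId);
  }

  /**
   * Called when a scan finishes. Re-checks the in-memory state first, so a
   * result for a container that was deleted mid-scan is discarded instead of
   * being used to mark it unhealthy or to update its scan timestamp.
   */
  Outcome handleScanResult(long containerId, boolean scanFailed, long nowMillis) {
    Replica replica = containers.get(containerId);
    if (replica == null) {
      // Deleted while the scan was running: the result is meaningless.
      return Outcome.DISCARDED;
    }
    if (scanFailed) {
      replica.unhealthy = true;
      return Outcome.MARKED_UNHEALTHY;
    }
    replica.lastScanMillis = nowMillis;
    return Outcome.TIMESTAMP_UPDATED;
  }

  public static void main(String[] args) {
    ScanDiscardSketch scanner = new ScanDiscardSketch();
    scanner.add(1104L);
    System.out.println(scanner.handleScanResult(1104L, false, 1000L));
    scanner.delete(1104L); // e.g. removed due to over-replication
    System.out.println(scanner.handleScanResult(1104L, true, 2000L));
  }
}
```

The sketch does not close the window between the lookup and the update. A 
deletion that lands in between would still need to be tolerated by the update 
itself.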


> Container scanner continues to scan deleted container
> -----------------------------------------------------
>
>                 Key: HDDS-9005
>                 URL: https://issues.apache.org/jira/browse/HDDS-9005
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: George Huang
>            Priority: Major
>
> The following was observed in the log of a datanode running the container 
> scanner:
> {code}
> 2023-07-12 06:04:39,049 
> [ContainerDataScanner(/data/2/hadoop-ozone/datanode/data/hdds)] ERROR 
> org.apache.hadoop.ozone.container.common.helpers.ContainerUtils: Chunks dir 
> /data/2/hadoop-ozone/datanode/data/hdds/CID-c4cbea3d-ac0b-4956-a8cc-6fd5f3a55ec4/current/containerDir2/1104/chunks
>  does not exist
> 2023-07-12 06:04:39,061 
> [ContainerDataScanner(/data/2/hadoop-ozone/datanode/data/hdds)] ERROR 
> org.apache.hadoop.ozone.container.common.helpers.ContainerUtils: Chunks dir 
> /data/2/hadoop-ozone/datanode/data/hdds/CID-c4cbea3d-ac0b-4956-a8cc-6fd5f3a55ec4/current/containerDir2/1104/chunks
>  does not exist
> 2023-07-12 06:04:39,062 
> [ContainerDataScanner(/data/2/hadoop-ozone/datanode/data/hdds)] ERROR 
> org.apache.hadoop.ozone.container.common.helpers.ContainerUtils: Chunks dir 
> /data/2/hadoop-ozone/datanode/data/hdds/CID-c4cbea3d-ac0b-4956-a8cc-6fd5f3a55ec4/current/containerDir2/1104/chunks
>  does not exist
> 2023-07-12 06:04:39,063 
> [ContainerDataScanner(/data/2/hadoop-ozone/datanode/data/hdds)] ERROR 
> org.apache.hadoop.ozone.container.common.helpers.ContainerUtils: Chunks dir 
> /data/2/hadoop-ozone/datanode/data/hdds/CID-c4cbea3d-ac0b-4956-a8cc-6fd5f3a55ec4/current/containerDir2/1104/chunks
>  does not exist
> 2023-07-12 06:04:39,064 
> [ContainerDataScanner(/data/2/hadoop-ozone/datanode/data/hdds)] ERROR 
> org.apache.hadoop.ozone.container.common.helpers.ContainerUtils: Chunks dir 
> /data/2/hadoop-ozone/datanode/data/hdds/CID-c4cbea3d-ac0b-4956-a8cc-6fd5f3a55ec4/current/containerDir2/1104/chunks
>  does not exist
> 2023-07-12 06:04:39,065 
> [ContainerDataScanner(/data/2/hadoop-ozone/datanode/data/hdds)] ERROR 
> org.apache.hadoop.ozone.container.common.helpers.ContainerUtils: Chunks dir 
> /data/2/hadoop-ozone/datanode/data/hdds/CID-c4cbea3d-ac0b-4956-a8cc-6fd5f3a55ec4/current/containerDir2/1104/chunks
>  does not exist
> 2023-07-12 06:04:39,066 
> [ContainerDataScanner(/data/2/hadoop-ozone/datanode/data/hdds)] ERROR 
> org.apache.hadoop.ozone.container.common.helpers.ContainerUtils: Chunks dir 
> /data/2/hadoop-ozone/datanode/data/hdds/CID-c4cbea3d-ac0b-4956-a8cc-6fd5f3a55ec4/current/containerDir2/1104/chunks
>  does not exist
> 2023-07-12 06:04:39,066 
> [ContainerDataScanner(/data/2/hadoop-ozone/datanode/data/hdds)] WARN 
> org.apache.hadoop.ozone.container.ozoneimpl.ContainerController: Container 
> 1104 not found, may be deleted, skip update DataScanTimestamp
> {code}
> {{ozone admin container info}} showed that the container replica was no 
> longer present on the datanode that logged these messages. The replica was 
> likely deleted due to over-replication while the scanner was running.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
