[ https://issues.apache.org/jira/browse/HDDS-9005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17742601#comment-17742601 ]
Ethan Rose edited comment on HDDS-9005 at 7/13/23 12:32 AM:
------------------------------------------------------------

HDDS-7300 makes the container scanner ignore a failure while scanning a block if that block is deleted during the scan. This allows the scanner to run without holding a lock. When the whole container is deleted, however, the code that skips a deleted block runs for every block in the container, because every block was removed along with the container. When the scanner reaches the end of the container, it does not mark the container unhealthy, since it believes all of the blocks were deleted, but it still tries to update the last scanned timestamp, since it believes the container is still present.

Similar to HDDS-7300, we need to check whether a container has been deleted after a scan failure, and discard the result if it has. Checking the deleted state in memory may require HDDS-8770.
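The sequence above and the proposed fix can be sketched as a small, self-contained Java model. This is not Ozone code: DeletedContainerScanSketch, ScanResult, verifyBlock and the in-memory maps are hypothetical stand-ins, with the maps playing the role of whatever in-memory deleted state HDDS-8770 might provide. On a block failure the sketch first checks whether the whole container is gone and, if so, discards the result; only after that does it apply the HDDS-7300-style check for a single deleted block.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical, simplified model of the race described above. None of these
 * names are Ozone classes or methods; they only illustrate the proposed check.
 */
public class DeletedContainerScanSketch {

  enum ScanResult { HEALTHY, UNHEALTHY, DISCARDED }

  // Hypothetical in-memory view of this datanode: container ID -> block IDs.
  private final Map<Long, Set<String>> containers = new ConcurrentHashMap<>();
  // Blocks whose data is present but fails verification, as "containerId:blockId".
  private final Set<String> corruptBlocks = ConcurrentHashMap.newKeySet();
  // Last successful scan time per container.
  private final Map<Long, Long> lastScanTimes = new ConcurrentHashMap<>();

  void addContainer(long id, Set<String> blockIds) {
    Set<String> blocks = ConcurrentHashMap.newKeySet();
    blocks.addAll(blockIds);
    containers.put(id, blocks);
  }

  void deleteContainer(long id) { containers.remove(id); }

  void deleteBlock(long id, String blockId) {
    Set<String> blocks = containers.get(id);
    if (blocks != null) {
      blocks.remove(blockId);
    }
  }

  void markCorrupt(long id, String blockId) { corruptBlocks.add(id + ":" + blockId); }

  Long lastScanned(long id) { return lastScanTimes.get(id); }

  // Stand-in for reading and checksumming one block without holding a lock:
  // fails if the container or block is gone, or if the block is corrupt.
  private boolean verifyBlock(long id, String blockId) {
    Set<String> blocks = containers.get(id);
    return blocks != null && blocks.contains(blockId)
        && !corruptBlocks.contains(id + ":" + blockId);
  }

  /** {@code afterFirstBlock} simulates a concurrent change partway through the scan. */
  ScanResult scan(long id, List<String> blockIds, Runnable afterFirstBlock) {
    boolean unhealthy = false;
    for (int i = 0; i < blockIds.size(); i++) {
      if (i == 1) {
        afterFirstBlock.run();
      }
      String blockId = blockIds.get(i);
      if (verifyBlock(id, blockId)) {
        continue;
      }
      // Proposed check: on any failure, first ask whether the whole container
      // is gone. If so, discard the result instead of skipping block by block.
      Set<String> current = containers.get(id);
      if (current == null) {
        return ScanResult.DISCARDED;
      }
      // HDDS-7300-style check: ignore the failure if only this block was deleted.
      if (!current.contains(blockId)) {
        continue;
      }
      unhealthy = true;
    }
    if (unhealthy) {
      return ScanResult.UNHEALTHY;
    }
    // The container could also vanish after its last block was checked.
    if (!containers.containsKey(id)) {
      return ScanResult.DISCARDED;
    }
    lastScanTimes.put(id, System.currentTimeMillis());
    return ScanResult.HEALTHY;
  }

  public static void main(String[] args) {
    DeletedContainerScanSketch dn = new DeletedContainerScanSketch();
    dn.addContainer(1104L, Set.of("b1", "b2", "b3"));
    // Simulate the replica being deleted (e.g. over-replication) mid-scan.
    ScanResult result = dn.scan(1104L, List.of("b1", "b2", "b3"),
        () -> dn.deleteContainer(1104L));
    System.out.println(result + " " + dn.lastScanned(1104L)); // DISCARDED null
  }
}
```

Running main simulates a replica that disappears after its first block is scanned: the scan returns DISCARDED and no timestamp is recorded, instead of logging one error per block and then attempting the timestamp update. The final existence check still leaves a small window before the timestamp is written, so a real fix would likely keep a guard like the existing "skip update DataScanTimestamp" path as well.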
> Container scanner continues to scan deleted container
> -----------------------------------------------------
>
>                 Key: HDDS-9005
>                 URL: https://issues.apache.org/jira/browse/HDDS-9005
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: George Huang
>            Priority: Major
>
> The following was observed in the log of a datanode running the container
> scanner:
> {code}
> 2023-07-12 06:04:39,049 [ContainerDataScanner(/data/2/hadoop-ozone/datanode/data/hdds)] ERROR org.apache.hadoop.ozone.container.common.helpers.ContainerUtils: Chunks dir /data/2/hadoop-ozone/datanode/data/hdds/CID-c4cbea3d-ac0b-4956-a8cc-6fd5f3a55ec4/current/containerDir2/1104/chunks does not exist
> 2023-07-12 06:04:39,061 [ContainerDataScanner(/data/2/hadoop-ozone/datanode/data/hdds)] ERROR org.apache.hadoop.ozone.container.common.helpers.ContainerUtils: Chunks dir /data/2/hadoop-ozone/datanode/data/hdds/CID-c4cbea3d-ac0b-4956-a8cc-6fd5f3a55ec4/current/containerDir2/1104/chunks does not exist
> 2023-07-12 06:04:39,062 [ContainerDataScanner(/data/2/hadoop-ozone/datanode/data/hdds)] ERROR org.apache.hadoop.ozone.container.common.helpers.ContainerUtils: Chunks dir /data/2/hadoop-ozone/datanode/data/hdds/CID-c4cbea3d-ac0b-4956-a8cc-6fd5f3a55ec4/current/containerDir2/1104/chunks does not exist
> 2023-07-12 06:04:39,063 [ContainerDataScanner(/data/2/hadoop-ozone/datanode/data/hdds)] ERROR org.apache.hadoop.ozone.container.common.helpers.ContainerUtils: Chunks dir /data/2/hadoop-ozone/datanode/data/hdds/CID-c4cbea3d-ac0b-4956-a8cc-6fd5f3a55ec4/current/containerDir2/1104/chunks does not exist
> 2023-07-12 06:04:39,064 [ContainerDataScanner(/data/2/hadoop-ozone/datanode/data/hdds)] ERROR org.apache.hadoop.ozone.container.common.helpers.ContainerUtils: Chunks dir /data/2/hadoop-ozone/datanode/data/hdds/CID-c4cbea3d-ac0b-4956-a8cc-6fd5f3a55ec4/current/containerDir2/1104/chunks does not exist
> 2023-07-12 06:04:39,065 [ContainerDataScanner(/data/2/hadoop-ozone/datanode/data/hdds)] ERROR org.apache.hadoop.ozone.container.common.helpers.ContainerUtils: Chunks dir /data/2/hadoop-ozone/datanode/data/hdds/CID-c4cbea3d-ac0b-4956-a8cc-6fd5f3a55ec4/current/containerDir2/1104/chunks does not exist
> 2023-07-12 06:04:39,066 [ContainerDataScanner(/data/2/hadoop-ozone/datanode/data/hdds)] ERROR org.apache.hadoop.ozone.container.common.helpers.ContainerUtils: Chunks dir /data/2/hadoop-ozone/datanode/data/hdds/CID-c4cbea3d-ac0b-4956-a8cc-6fd5f3a55ec4/current/containerDir2/1104/chunks does not exist
> 2023-07-12 06:04:39,066 [ContainerDataScanner(/data/2/hadoop-ozone/datanode/data/hdds)] WARN org.apache.hadoop.ozone.container.ozoneimpl.ContainerController: Container 1104 not found, may be deleted, skip update DataScanTimestamp
> {code}
> {{ozone admin container info}} showed that the container replica was no
> longer present on the datanode that logged these messages. The replica was
> likely deleted due to over-replication while the scanner was running.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)