[ https://issues.apache.org/jira/browse/HDDS-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Rose updated HDDS-8141:
-----------------------------
    Description: 
This exception has been noticed a few times in datanode logs:
{code:java}
2023-02-16 14:57:11,330 ERROR org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler: Received container deletion command for container 54652 but the container is not empty.
2023-02-16 14:57:11,330 ERROR org.apache.hadoop.ozone.container.common.statemachine.commandhandler.DeleteContainerCommandHandler: Exception occurred while deleting the container.
org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: Non-force deletion of non-empty container is not allowed.
        at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.deleteInternal(KeyValueHandler.java:1133)
        at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.deleteContainer(KeyValueHandler.java:1094)
        at org.apache.hadoop.ozone.container.ozoneimpl.ContainerController.deleteContainer(ContainerController.java:182)
        at org.apache.hadoop.ozone.container.common.statemachine.commandhandler.DeleteContainerCommandHandler.lambda$handle$0(DeleteContainerCommandHandler.java:75)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
{code}
This is a defensive code path that checks the block count metadata in RocksDB 
to determine if the container is empty. It is not expected to be hit.
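
To illustrate how a drifted counter could trip a check like this, here is a minimal sketch. It assumes the check treats only an exact zero as empty; the class and method names are hypothetical and are not taken from KeyValueHandler.

```java
/**
 * Hypothetical sketch of an emptiness check driven by a stored block counter.
 * Not the actual Ozone implementation.
 */
final class EmptinessCheckSketch {

    /** Assumed strict form: only a counter of exactly zero counts as empty. */
    static boolean isEmptyStrict(long blockCount) {
        return blockCount == 0;
    }

    /** Hypothetical tolerant form: zero or negative means no live blocks. */
    static boolean hasNoLiveBlocks(long blockCount) {
        return blockCount <= 0;
    }

    public static void main(String[] args) {
        // A negative counter, as can happen if the metadata drifts below zero.
        long reported = -6L;
        System.out.println("strict check says empty:   " + isEmptyStrict(reported));
        System.out.println("tolerant check says empty: " + hasNoLiveBlocks(reported));
    }
}
```

Under that assumption, a negative counter makes the strict form report the container as non-empty even when no blocks remain, so every retry of a non-force delete would be rejected in the same way.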

The last delete block command for this container was logged about 5 minutes 
before this message. When we checked the disks for a few containers where this 
happened, no block files were present. Logs show that SCM retried the delete 
but got the same result every time.

Later on, the container inspector was run on this cluster and it reported that 
there was only one copy of this container in the whole cluster. It had the 
following metadata:
{code:java}
{
  "containerID": 54652,
  "schemaVersion": "2",
  "containerState": "CLOSED",
  "currentDatanodeID": "a160b3e2-a450-446d-a75c-898241a1ff7a",
  "originDatanodeID": "a160b3e2-a450-446d-a75c-898241a1ff7a",
  "dBMetadata": {
    "#BLOCKCOUNT": -6,
    "#BYTESUSED": -1431232412,
    "#PENDINGDELETEBLOCKCOUNT": 0,
    "#delTX": 46312,
    "#BCSID": 1548650
  },
  "aggregates": {
    "blockCount": 0,
    "usedBytes": 0,
    "pendingDeleteBlocks": 0,
    "pendingDeleteBytes": 0
  },
  "chunksDirectory": {
    "path": "/hadoop-ozone/datanode/data/hdds/CID-30dff43d-34c2-4855-991f-797164dcb259/current/containerDir106/54652/chunks",
    "present": true,
    "fileCount": 0
  },
  "dBMetadataDeleteCount_minus_aggregatedDeleteCount": 0,
  "correct": false,
  "errors": [
    {
      "property": "dBMetadata.#BLOCKCOUNT",
      "expected": 0,
      "actual": -6,
      "repaired": false
    },
    {
      "property": "dBMetadata.#BYTESUSED",
      "expected": 0,
      "actual": -1431232412,
      "repaired": false
    }
  ]
}
{code}
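
The errors list in this report can be read as a comparison between the counters stored in RocksDB and the values aggregated from the container's actual contents. A rough sketch of that comparison, using the values from the report above, follows. The class, method, and map layout are assumptions made for illustration and are not the inspector's real code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: flag stored counters that disagree with aggregated values. */
final class MetadataMismatchSketch {

    /** Returns one message per key whose stored value differs from the expected one. */
    static List<String> findMismatches(Map<String, Long> dbMetadata,
                                       Map<String, Long> expected) {
        List<String> errors = new ArrayList<>();
        for (Map.Entry<String, Long> e : expected.entrySet()) {
            Long actual = dbMetadata.get(e.getKey());
            if (actual == null || !actual.equals(e.getValue())) {
                errors.add(e.getKey() + ": expected " + e.getValue()
                        + ", actual " + actual);
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        // Stored counters, copied from the "dBMetadata" section of the report.
        Map<String, Long> dbMetadata = new LinkedHashMap<>();
        dbMetadata.put("#BLOCKCOUNT", -6L);
        dbMetadata.put("#BYTESUSED", -1431232412L);
        dbMetadata.put("#PENDINGDELETEBLOCKCOUNT", 0L);

        // Expected values, copied from the "aggregates" section of the report.
        Map<String, Long> expected = new LinkedHashMap<>();
        expected.put("#BLOCKCOUNT", 0L);
        expected.put("#BYTESUSED", 0L);
        expected.put("#PENDINGDELETEBLOCKCOUNT", 0L);

        for (String error : findMismatches(dbMetadata, expected)) {
            System.out.println(error);
        }
    }
}
```

With these inputs the sketch flags #BLOCKCOUNT and #BYTESUSED and leaves #PENDINGDELETEBLOCKCOUNT alone, which matches the two entries in the report's errors list.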

  was:
This exception has been noticed a few times in datanode logs:
{code:java}
2023-02-16 14:57:11,330 ERROR org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler: Received container deletion command for container 54652 but the container is not empty.
2023-02-16 14:57:11,330 ERROR org.apache.hadoop.ozone.container.common.statemachine.commandhandler.DeleteContainerCommandHandler: Exception occurred while deleting the container.
org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: Non-force deletion of non-empty container is not allowed.
        at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.deleteInternal(KeyValueHandler.java:1133)
        at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.deleteContainer(KeyValueHandler.java:1094)
        at org.apache.hadoop.ozone.container.ozoneimpl.ContainerController.deleteContainer(ContainerController.java:182)
        at org.apache.hadoop.ozone.container.common.statemachine.commandhandler.DeleteContainerCommandHandler.lambda$handle$0(DeleteContainerCommandHandler.java:75)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
{code}
This is a defensive code path that checks the block count metadata in RocksDB 
to determine if the container is empty. It is not expected to be hit.

The last delete block command for this container was logged about 5 minutes 
before this message. When we checked the disks for a few containers where this 
happened, no block files were present. Logs show that SCM retried the delete 
but got the same result every time.

Later on, the container inspector was run on this cluster and it reported that 
there was only one copy of this container in the whole cluster. It had the 
following metadata:
{code:java}
{
  "containerID": 54652,
  "schemaVersion": "2",
  "containerState": "CLOSED",
  "currentDatanodeID": "a160b3e2-a450-446d-a75c-898241a1ff7a",
  "originDatanodeID": "a160b3e2-a450-446d-a75c-898241a1ff7a",
  "dBMetadata": {
    "#BLOCKCOUNT": -6,
    "#BYTESUSED": -1431232412,
    "#PENDINGDELETEBLOCKCOUNT": 0,
    "#delTX": 46312,
    "#BCSID": 1548650
  },
  "aggregates": {
    "blockCount": 0,
    "usedBytes": 0,
    "pendingDeleteBlocks": 0,
    "pendingDeleteBytes": 0
  },
  "chunksDirectory": {
    "path": "/data/qssufn48/hadoop-ozone/datanode/data/hdds/CID-30dff43d-34c2-4855-991f-797164dcb259/current/containerDir106/54652/chunks",
    "present": true,
    "fileCount": 0
  },
  "dBMetadataDeleteCount_minus_aggregatedDeleteCount": 0,
  "correct": false,
  "errors": [
    {
      "property": "dBMetadata.#BLOCKCOUNT",
      "expected": 0,
      "actual": -6,
      "repaired": false
    },
    {
      "property": "dBMetadata.#BYTESUSED",
      "expected": 0,
      "actual": -1431232412,
      "repaired": false
    }
  ]
}
{code}


> Exception "Non-force deletion of non-empty container is not allowed" in 
> datanode logs
> -------------------------------------------------------------------------------------
>
>                 Key: HDDS-8141
>                 URL: https://issues.apache.org/jira/browse/HDDS-8141
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Ethan Rose
>            Priority: Major


