[ 
https://issues.apache.org/jira/browse/HDDS-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Kumar updated HDDS-8838:
-------------------------------
    Fix Version/s: 1.4.0

> Orphan blocks can leave empty container stuck deleting forever
> --------------------------------------------------------------
>
>                 Key: HDDS-8838
>                 URL: https://issues.apache.org/jira/browse/HDDS-8838
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Stephen O'Donnell
>            Assignee: Ashish Kumar
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.4.0
>
>
> A container can get stuck in the DELETING state in the following scenario:
> Shut down a node and write an EC key shortly afterwards, before the node has 
> been marked stale. In this case, the client can do the following:
> 1. Attempt to write to a pipeline that includes the now stopped node.
> 2. Fail to write the stripe and abandon it. The client will seamlessly 
> rewrite the stripe to a new container.
> 3. The abandoned container could be a new one with no other data. It will now 
> have a chunk written to some of its replicas, but no put block will have 
> arrived.
> 4. On the DN, RocksDB will have no entry for this block, but there will be a 
> block file on disk. The DN will therefore refuse to delete the container if 
> asked (see the sketch after this list).
> 5. The stale node handler triggers the close of the pipeline. The replicas 
> are reported to SCM with zero bytes and zero keys, so the container moves to 
> DELETING after it is CLOSED.
> 6. SCM sends delete commands, which the DN drops because of the chunks on 
> disk, and the container is stuck deleting forever.
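> To make steps 4 and 6 concrete, here is a minimal sketch of the kind of 
> emptiness check a DN can apply before honouring a non-forced delete. This is 
> not the actual Ozone code; the class and method names are made up for 
> illustration only:
> {code}
> import java.io.IOException;
> import java.nio.file.Files;
> import java.nio.file.Path;
> import java.nio.file.Paths;
> import java.util.stream.Stream;
> 
> /** Illustration only: a DN-side "is this container really empty?" decision. */
> public class ContainerEmptinessCheck {
> 
>   /**
>    * The container is only safe to delete when its metadata reports zero
>    * blocks AND no block files remain under the chunks directory. An orphan
>    * block file (chunk on disk, no RocksDB entry) fails this check, so a
>    * non-forced delete command is dropped.
>    */
>   static boolean isSafeToDelete(long blockCountInRocksDb, Path chunksDir)
>       throws IOException {
>     if (blockCountInRocksDb > 0) {
>       return false;
>     }
>     try (Stream<Path> files = Files.list(chunksDir)) {
>       // Any leftover *.block file means the container is not really empty.
>       return files.noneMatch(p -> p.getFileName().toString().endsWith(".block"));
>     }
>   }
> 
>   public static void main(String[] args) throws IOException {
>     Path chunksDir = Paths.get(args.length > 0 ? args[0] : "chunks");
>     System.out.println("Safe to delete: " + isSafeToDelete(0, chunksDir));
>   }
> }
> {code}
> In the failure described here, the block count is zero but a block file is 
> still on disk, so the check fails and the delete command from SCM is ignored.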
> The following logs show this happening. The key should have gone into 
> container 1, as this was a fresh cluster with no previous writes, but node 6 
> was stopped. You can see the pipeline that includes node_6 being closed, 
> along with container 1:
> {code}
> scm_1       2023-06-13 10:56:07,420 
> [EventQueue-StaleNodeForStaleNodeHandler] INFO node.StaleNodeHandler: 
> Datanode 
> 7c8bf940-3520-41e4-a1b3-672359713c5b(ozone_datanode_6.ozone_default/172.30.0.12)
>  moved to stale state. Finalizing its pipelines 
> [PipelineID=2f228c83-dad4-40cd-8415-b153564738de, 
> PipelineID=59d63d29-64f1-4a95-94cc-e75c65abb94b, 
> PipelineID=dac2f9a5-f274-4baa-9409-3d203b794905]
> scm_1       2023-06-13 10:56:07,435 
> [EventQueue-StaleNodeForStaleNodeHandler] INFO pipeline.PipelineManagerImpl: 
> Container #1 closed for 
> pipeline=PipelineID=2f228c83-dad4-40cd-8415-b153564738de
> scm_1       2023-06-13 10:56:07,437 
> [EventQueue-CloseContainerForCloseContainerEventHandler] INFO 
> container.CloseContainerEventHandler: Close container Event triggered for 
> container : #1, current state: CLOSING
> {code}
> Container 1 has moved to CLOSING; then we see it transition to CLOSED and 
> then to DELETING:
> {code}
> scm_1       2023-06-13 10:56:16,091 
> [FixedThreadPoolWithAffinityExecutor-9-0] INFO 
> container.IncrementalContainerReportHandler: Moving container #1 to CLOSED 
> state, datanode 
> 9c7eab65-20fb-49b6-85cb-e0d4df91b14e(ozone_datanode_3.ozone_default/172.30.0.3)
>  reported CLOSED replica with index 5.
> scm_1       2023-06-13 10:56:28,492 [ReplicationMonitor] INFO 
> replication.ReplicationManager: Sending command [deleteContainerCommand: 
> containerID: 1, replicaIndex: 5, force: false] for container 
> ContainerInfo{id=#1, state=CLOSED, 
> stateEnterTime=2023-06-13T10:56:16.095763Z, 
> pipelineID=PipelineID=2f228c83-dad4-40cd-8415-b153564738de, owner=om1} to 
> 9c7eab65-20fb-49b6-85cb-e0d4df91b14e(ozone_datanode_3.ozone_default/172.30.0.3)
>  with datanode deadline 1686653848492 and scm deadline 1686653878492
> scm_1       2023-06-13 10:56:28,493 [ReplicationMonitor] INFO 
> replication.ReplicationManager: Sending command [deleteContainerCommand: 
> containerID: 1, replicaIndex: 4, force: false] for container 
> ContainerInfo{id=#1, state=CLOSED, 
> stateEnterTime=2023-06-13T10:56:16.095763Z, 
> pipelineID=PipelineID=2f228c83-dad4-40cd-8415-b153564738de, owner=om1} to 
> 02968363-81b4-44b9-bd9b-bfb03eb96193(ozone_datanode_1.ozone_default/172.30.0.10)
>  with datanode deadline 1686653848493 and scm deadline 1686653878493
> {code}
> Getting the key info, we can see the key is actually in container 2. What 
> likely happened is that the client tried to write the data to datanode_6 and 
> the parity to datanodes 3 and 1. The stripe was abandoned and re-written to 
> container 2, so the put block to update the length likely never made it onto 
> the parity nodes, but some block data did. Now the DNs will not delete 
> container 1 as it is not empty, and it remains stuck in DELETING forever.
> {code}
> ozone sh key info /vol1/bucket1/key1
> {
>   "volumeName" : "vol1",
>   "bucketName" : "bucket1",
>   "name" : "key1",
>   "dataSize" : 175,
>   "creationTime" : "2023-06-13T10:56:04.158Z",
>   "modificationTime" : "2023-06-13T10:56:35.186Z",
>   "replicationConfig" : {
>     "data" : 3,
>     "parity" : 2,
>     "ecChunkSize" : 1048576,
>     "codec" : "RS",
>     "replicationType" : "EC",
>     "requiredNodes" : 5
>   },
>   "metadata" : { },
>   "ozoneKeyLocations" : [ {
>     "containerID" : 2,
>     "localID" : 111677748019200002,
>     "length" : 175,
>     "offset" : 0,
>     "keyOffset" : 0
>   } ]
> }
> {code}
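> For context on why the delete commands above target replica indexes 4 and 5: 
> with rs-3-2 and a 175-byte key, only a single data chunk carries bytes, and 
> the two parity chunks land on the nodes holding the last two indexes, which 
> matches datanode_1 and datanode_3 in the logs. The arithmetic below is a 
> standalone illustration; the index convention (1..data for data chunks, 
> data+1..data+parity for parity) is an assumption, not taken from the Ozone 
> code:
> {code}
> /** Illustration only: which EC replica indexes receive writes for a tiny key. */
> public class EcStripeLayout {
>   public static void main(String[] args) {
>     int data = 3, parity = 2;     // rs-3-2 from the key info above
>     long chunkSize = 1048576;     // ecChunkSize
>     long keySize = 175;           // dataSize of /vol1/bucket1/key1
> 
>     // Number of data chunks that actually carry bytes for this key.
>     long dataChunks = (keySize + chunkSize - 1) / chunkSize;   // = 1
>     System.out.println("Data chunks written: " + dataChunks + " (replica index 1)");
> 
>     // Assumed convention: indexes data+1..data+parity hold the parity chunks.
>     for (int i = data + 1; i <= data + parity; i++) {
>       System.out.println("Parity chunk written to replica index " + i);
>     }
>   }
> }
> {code}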
> Checking the DN storage directories:
> {code}
> bash-4.2$ pwd
> /data/hdds/hdds/CID-684c8137-b002-47d8-b134-7831df54e517/current/containerDir0/1
> bash-4.2$ ls -al chunks/
> total 12
> drwxr-xr-x 2 hadoop users 4096 Jun 13 10:56 .
> drwxr-xr-x 4 hadoop users 4096 Jun 13 10:56 ..
> -rw-r--r-- 1 hadoop users  175 Jun 13 10:56 111677748019200001.block
> {code}
> The container RocksDB was dumped with:
> {code}
> ozone debug ldb --db=container.db scan --column_family=block_data -l=-1
> {code}
> The block ID for the file above is not present, so the container is "empty" 
> according to its metadata but has an orphan block in it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
