[ 
https://issues.apache.org/jira/browse/HDDS-11121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDDS-11121:
------------------------------
    Summary: DeletedBlockLogImpl#onMessage Inter-process communication UUID 
inconsistency.  (was: Improve SCM deletion efficiency.)

> DeletedBlockLogImpl#onMessage Inter-process communication UUID inconsistency.
> -----------------------------------------------------------------------------
>
>                 Key: HDDS-11121
>                 URL: https://issues.apache.org/jira/browse/HDDS-11121
>             Project: Apache Ozone
>          Issue Type: Improvement
>          Components: SCM
>            Reporter: Shilun Fan
>            Assignee: Shilun Fan
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2024-07-12-09-37-23-618.png, screenshot-1.png
>
>
> Our Ozone cluster has recently encountered some issues with data deletion. We 
> found that the SCM was unable to automatically clean up the data in the 
> deletion queue, preventing the completion of the entire deletion process. 
> During our problem analysis, we discovered an issue with 
> {{DeletedBlockLogImpl#onMessage}}. The UUID transmitted from the DN via 
> RPC was not recognized by the SCM, which then logged "Unknown Datanode" 
> warnings. We attempted to fix this issue and made some progress.
> {code:java}
> 2024-07-08 12:08:19,606 
> [scm2-EventQueue-DeleteBlockStatusForDeletedBlockLogImpl] WARN 
> org.apache.hadoop.hdds.scm.block.SCMDeletedBlockTransactionStatusManager$SCMDeleteBlocksCommandStatusManager:
>  Unknown Datanode: 9df75b64-d0e4-44ae-9bc0-9355371c8a5b Scm Command ID: 
> 1720041450931 report status PENDING
> 2024-07-08 12:08:19,606 
> [scm2-EventQueue-DeleteBlockStatusForDeletedBlockLogImpl] WARN 
> org.apache.hadoop.hdds.scm.block.SCMDeletedBlockTransactionStatusManager$SCMDeleteBlocksCommandStatusManager:
>  Unknown Datanode: 9df75b64-d0e4-44ae-9bc0-9355371c8a5b Scm Command ID: 
> 1719241427194 report status PENDING
> 2024-07-08 12:08:19,606 
> [scm2-EventQueue-DeleteBlockStatusForDeletedBlockLogImpl] WARN 
> org.apache.hadoop.hdds.scm.block.SCMDeletedBlockTransactionStatusManager$SCMDeleteBlocksCommandStatusManager:
>  Unknown Datanode: 9df75b64-d0e4-44ae-9bc0-9355371c8a5b Scm Command ID: 
> 1720041450931 report status PENDING
> 2024-07-08 12:08:19,606 
> [scm2-EventQueue-DeleteBlockStatusForDeletedBlockLogImpl] WARN 
> org.apache.hadoop.hdds.scm.block.SCMDeletedBlockTransactionStatusManager$SCMDeleteBlocksCommandStatusManager:
>  Unknown Datanode: 9df75b64-d0e4-44ae-9bc0-9355371c8a5b Scm Command ID: 
> 1719241427194 report status PENDING
> 2024-07-08 12:08:19,617 
> [scm2-EventQueue-DeleteBlockStatusForDeletedBlockLogImpl] WARN 
> org.apache.hadoop.hdds.scm.block.SCMDeletedBlockTransactionStatusManager$SCMDeleteBlocksCommandStatusManager:
>  Unknown Datanode: efadefd7-4d25-42fd-a6ef-fabd64c97d7f Scm Command ID: 
> 1720041450023 report status PENDING
> 2024-07-08 12:08:19,664 
> [scm2-EventQueue-DeleteBlockStatusForDeletedBlockLogImpl] WARN 
> org.apache.hadoop.hdds.scm.block.SCMDeletedBlockTransactionStatusManager$SCMDeleteBlocksCommandStatusManager:
>  Unknown Datanode: 0c4b82eb-3856-4984-9b0d-d9670089921b Scm Command ID: 
> 1720106401909 report status PENDING
> 2024-07-08 12:08:19,664 
> [scm2-EventQueue-DeleteBlockStatusForDeletedBlockLogImpl] WARN 
> org.apache.hadoop.hdds.scm.block.SCMDeletedBlockTransactionStatusManager$SCMDeleteBlocksCommandStatusManager:
>  Unknown Datanode: 0c4b82eb-3856-4984-9b0d-d9670089921b Scm Command ID: 
> 1719241427294 report status PENDING {code}
> {code:java}
> 2024-07-12 08:35:37,032 
> [scm3-EventQueue-DeleteBlockStatusForDeletedBlockLogImpl] DEBUG 
> org.apache.hadoop.hdds.scm.block.DeletedBlockLogImpl: remoteDnId = 
> 888a550f-c59c-4dde-ba3e-3dcf8f9593e0, localDnId = 
> 888a550f-c59c-4dde-ba3e-3dcf8f9593e0, remoteDnId == localDnId[false]
> 2024-07-12 08:35:37,032 
> [scm3-EventQueue-DeleteBlockStatusForDeletedBlockLogImpl] DEBUG 
> org.apache.hadoop.hdds.scm.block.DeletedBlockLogImpl: remoteDnId = 
> c7919796-18fa-4f00-af94-9b7ebc21a572, localDnId = 
> c7919796-18fa-4f00-af94-9b7ebc21a572, remoteDnId == localDnId[false]
> 2024-07-12 08:35:37,032 
> [scm3-EventQueue-DeleteBlockStatusForDeletedBlockLogImpl] DEBUG 
> org.apache.hadoop.hdds.scm.block.DeletedBlockLogImpl: remoteDnId = 
> 596cd6c8-ecc7-48da-8039-75fe59d65846, localDnId = 
> 596cd6c8-ecc7-48da-8039-75fe59d65846, remoteDnId == localDnId[false]
> 2024-07-12 08:35:37,033 
> [scm3-EventQueue-DeleteBlockStatusForDeletedBlockLogImpl] DEBUG 
> org.apache.hadoop.hdds.scm.block.DeletedBlockLogImpl: remoteDnId = 
> de559349-fd76-4a5a-9acb-007432ba1876, localDnId = 
> de559349-fd76-4a5a-9acb-007432ba1876, remoteDnId == localDnId[false]
> 2024-07-12 08:35:37,033 
> [scm3-EventQueue-DeleteBlockStatusForDeletedBlockLogImpl] DEBUG 
> org.apache.hadoop.hdds.scm.block.DeletedBlockLogImpl: remoteDnId = 
> 6a750295-7e7c-4786-b28c-f78509c41a02, localDnId = 
> 6a750295-7e7c-4786-b28c-f78509c41a02, remoteDnId == localDnId[false] {code}
> On July 8th, we applied this PR in the production environment. Currently, SCM 
> deletion can proceed normally, as shown in the Grafana screenshot below.
> !image-2024-07-12-09-37-23-618.png!
>  !screenshot-1.png! 
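
A note on the DEBUG output quoted above: every line prints the same UUID string
for remoteDnId and localDnId, yet reports remoteDnId == localDnId[false]. One
way such a result can arise in Java is comparing two distinct java.util.UUID
objects by reference (==) instead of by value (equals). The issue text does not
confirm that this is the root cause, so the following is only a minimal sketch
of that general pitfall. The class and method names (UuidEqualityDemo,
sameReference, sameValue) are hypothetical and are not taken from the Ozone
code base.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class UuidEqualityDemo {

  // Reference comparison: true only if both variables point at the same object.
  static boolean sameReference(UUID remoteDnId, UUID localDnId) {
    return remoteDnId == localDnId;
  }

  // Value comparison: true if both UUIDs carry the same 128-bit value.
  static boolean sameValue(UUID remoteDnId, UUID localDnId) {
    return remoteDnId != null && remoteDnId.equals(localDnId);
  }

  public static void main(String[] args) {
    // Assumption: the "remote" ID is rebuilt from its string form after
    // crossing a process boundary, so it is a different object than the
    // locally held one even though the value is identical.
    String text = "888a550f-c59c-4dde-ba3e-3dcf8f9593e0";
    UUID remoteDnId = UUID.fromString(text);
    UUID localDnId = UUID.fromString(text);

    System.out.println("remoteDnId == localDnId["
        + sameReference(remoteDnId, localDnId) + "]");      // false
    System.out.println("remoteDnId.equals(localDnId)["
        + sameValue(remoteDnId, localDnId) + "]");          // true

    // Map lookups use equals/hashCode, so a value-equal key is still found.
    Map<UUID, String> statusByDatanode = new HashMap<>();
    statusByDatanode.put(localDnId, "PENDING");
    System.out.println(statusByDatanode.get(remoteDnId));   // PENDING
  }
}
```

In this sketch the reference check fails for two separately constructed UUIDs
with the same value, while equals and a HashMap lookup both succeed, which is
why value comparison is the usual choice for identifiers that are deserialized
from RPC messages.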


