[ 
https://issues.apache.org/jira/browse/HDDS-11121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDDS-11121:
------------------------------
    Description: 
Our Ozone cluster recently ran into problems with data deletion. The SCM was 
unable to automatically clean up the data in the deletion queue, so the 
deletion process could never complete. While analyzing the problem, we found an 
issue in {{DeletedBlockLogImpl#onMessage}}: the UUID sent by the DN via RPC was 
not recognized by the SCM, which reported it as an "Unknown Datanode". We 
attempted to fix this issue and made some progress.
{code:java}
2024-07-08 12:08:19,606 
[scm2-EventQueue-DeleteBlockStatusForDeletedBlockLogImpl] WARN 
org.apache.hadoop.hdds.scm.block.SCMDeletedBlockTransactionStatusManager$SCMDeleteBlocksCommandStatusManager:
 Unknown Datanode: 9df75b64-d0e4-44ae-9bc0-9355371c8a5b Scm Command ID: 
1720041450931 report status PENDING
2024-07-08 12:08:19,606 
[scm2-EventQueue-DeleteBlockStatusForDeletedBlockLogImpl] WARN 
org.apache.hadoop.hdds.scm.block.SCMDeletedBlockTransactionStatusManager$SCMDeleteBlocksCommandStatusManager:
 Unknown Datanode: 9df75b64-d0e4-44ae-9bc0-9355371c8a5b Scm Command ID: 
1719241427194 report status PENDING
2024-07-08 12:08:19,606 
[scm2-EventQueue-DeleteBlockStatusForDeletedBlockLogImpl] WARN 
org.apache.hadoop.hdds.scm.block.SCMDeletedBlockTransactionStatusManager$SCMDeleteBlocksCommandStatusManager:
 Unknown Datanode: 9df75b64-d0e4-44ae-9bc0-9355371c8a5b Scm Command ID: 
1720041450931 report status PENDING
2024-07-08 12:08:19,606 
[scm2-EventQueue-DeleteBlockStatusForDeletedBlockLogImpl] WARN 
org.apache.hadoop.hdds.scm.block.SCMDeletedBlockTransactionStatusManager$SCMDeleteBlocksCommandStatusManager:
 Unknown Datanode: 9df75b64-d0e4-44ae-9bc0-9355371c8a5b Scm Command ID: 
1719241427194 report status PENDING
2024-07-08 12:08:19,617 
[scm2-EventQueue-DeleteBlockStatusForDeletedBlockLogImpl] WARN 
org.apache.hadoop.hdds.scm.block.SCMDeletedBlockTransactionStatusManager$SCMDeleteBlocksCommandStatusManager:
 Unknown Datanode: efadefd7-4d25-42fd-a6ef-fabd64c97d7f Scm Command ID: 
1720041450023 report status PENDING
2024-07-08 12:08:19,664 
[scm2-EventQueue-DeleteBlockStatusForDeletedBlockLogImpl] WARN 
org.apache.hadoop.hdds.scm.block.SCMDeletedBlockTransactionStatusManager$SCMDeleteBlocksCommandStatusManager:
 Unknown Datanode: 0c4b82eb-3856-4984-9b0d-d9670089921b Scm Command ID: 
1720106401909 report status PENDING
2024-07-08 12:08:19,664 
[scm2-EventQueue-DeleteBlockStatusForDeletedBlockLogImpl] WARN 
org.apache.hadoop.hdds.scm.block.SCMDeletedBlockTransactionStatusManager$SCMDeleteBlocksCommandStatusManager:
 Unknown Datanode: 0c4b82eb-3856-4984-9b0d-d9670089921b Scm Command ID: 
1719241427294 report status PENDING {code}

{code:java}
2024-07-12 08:35:37,032 
[scm3-EventQueue-DeleteBlockStatusForDeletedBlockLogImpl] DEBUG 
org.apache.hadoop.hdds.scm.block.DeletedBlockLogImpl: remoteDnId = 
888a550f-c59c-4dde-ba3e-3dcf8f9593e0, localDnId = 
888a550f-c59c-4dde-ba3e-3dcf8f9593e0, remoteDnId == localDnId[false]
2024-07-12 08:35:37,032 
[scm3-EventQueue-DeleteBlockStatusForDeletedBlockLogImpl] DEBUG 
org.apache.hadoop.hdds.scm.block.DeletedBlockLogImpl: remoteDnId = 
c7919796-18fa-4f00-af94-9b7ebc21a572, localDnId = 
c7919796-18fa-4f00-af94-9b7ebc21a572, remoteDnId == localDnId[false]
2024-07-12 08:35:37,032 
[scm3-EventQueue-DeleteBlockStatusForDeletedBlockLogImpl] DEBUG 
org.apache.hadoop.hdds.scm.block.DeletedBlockLogImpl: remoteDnId = 
596cd6c8-ecc7-48da-8039-75fe59d65846, localDnId = 
596cd6c8-ecc7-48da-8039-75fe59d65846, remoteDnId == localDnId[false]
2024-07-12 08:35:37,033 
[scm3-EventQueue-DeleteBlockStatusForDeletedBlockLogImpl] DEBUG 
org.apache.hadoop.hdds.scm.block.DeletedBlockLogImpl: remoteDnId = 
de559349-fd76-4a5a-9acb-007432ba1876, localDnId = 
de559349-fd76-4a5a-9acb-007432ba1876, remoteDnId == localDnId[false]
2024-07-12 08:35:37,033 
[scm3-EventQueue-DeleteBlockStatusForDeletedBlockLogImpl] DEBUG 
org.apache.hadoop.hdds.scm.block.DeletedBlockLogImpl: remoteDnId = 
6a750295-7e7c-4786-b28c-f78509c41a02, localDnId = 
6a750295-7e7c-4786-b28c-f78509c41a02, remoteDnId == localDnId[false] {code}
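
The DEBUG output above prints identical strings for remoteDnId and localDnId, yet the comparison yields false. One reading consistent with that output, offered here only as an assumption and not as the confirmed root cause, is that two distinct {{java.util.UUID}} instances were compared by reference ({{==}}) rather than by value ({{equals}}). The class and method names below are hypothetical and not taken from the Ozone code base; the sketch only illustrates the difference between the two comparisons.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical illustration; not the Ozone implementation.
class UuidComparisonSketch {

  /** Reference comparison: true only if both variables point to the same object. */
  static boolean sameReference(UUID a, UUID b) {
    return a == b;
  }

  /** Value comparison: true if both UUIDs carry the same 128-bit value. */
  static boolean sameValue(UUID a, UUID b) {
    return a.equals(b);
  }

  public static void main(String[] args) {
    // Same textual ID, parsed twice, as happens when one side is
    // deserialized from an RPC message and the other is held locally.
    String id = "888a550f-c59c-4dde-ba3e-3dcf8f9593e0";
    UUID remoteDnId = UUID.fromString(id);
    UUID localDnId = UUID.fromString(id);

    System.out.println("==     : " + sameReference(remoteDnId, localDnId)); // false
    System.out.println("equals : " + sameValue(remoteDnId, localDnId));     // true

    // Hash-based lookups use equals/hashCode, so a value-equal key is found.
    Map<UUID, String> statusByDn = new HashMap<>();
    statusByDn.put(localDnId, "PENDING");
    System.out.println("lookup : " + statusByDn.containsKey(remoteDnId));   // true
  }
}
```

If this assumption holds, comparing the IDs with {{equals}} (or comparing their string forms) would make the check return true for the pairs logged above.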

!image-2024-07-12-09-37-23-618.png!

> DeletedBlockLogImpl#onMessage Inter-process communication UUID inconsistency
> ----------------------------------------------------------------------------
>
>                 Key: HDDS-11121
>                 URL: https://issues.apache.org/jira/browse/HDDS-11121
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: SCM
>            Reporter: Shilun Fan
>            Assignee: Shilun Fan
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2024-07-12-09-37-23-618.png
>
>
> Our Ozone cluster recently ran into problems with data deletion. The SCM was 
> unable to automatically clean up the data in the deletion queue, so the 
> deletion process could never complete. While analyzing the problem, we found 
> an issue in {{DeletedBlockLogImpl#onMessage}}: the UUID sent by the DN via 
> RPC was not recognized by the SCM, which reported it as an "Unknown 
> Datanode". We attempted to fix this issue and made some progress.
> {code:java}
> 2024-07-08 12:08:19,606 
> [scm2-EventQueue-DeleteBlockStatusForDeletedBlockLogImpl] WARN 
> org.apache.hadoop.hdds.scm.block.SCMDeletedBlockTransactionStatusManager$SCMDeleteBlocksCommandStatusManager:
>  Unknown Datanode: 9df75b64-d0e4-44ae-9bc0-9355371c8a5b Scm Command ID: 
> 1720041450931 report status PENDING
> 2024-07-08 12:08:19,606 
> [scm2-EventQueue-DeleteBlockStatusForDeletedBlockLogImpl] WARN 
> org.apache.hadoop.hdds.scm.block.SCMDeletedBlockTransactionStatusManager$SCMDeleteBlocksCommandStatusManager:
>  Unknown Datanode: 9df75b64-d0e4-44ae-9bc0-9355371c8a5b Scm Command ID: 
> 1719241427194 report status PENDING
> 2024-07-08 12:08:19,606 
> [scm2-EventQueue-DeleteBlockStatusForDeletedBlockLogImpl] WARN 
> org.apache.hadoop.hdds.scm.block.SCMDeletedBlockTransactionStatusManager$SCMDeleteBlocksCommandStatusManager:
>  Unknown Datanode: 9df75b64-d0e4-44ae-9bc0-9355371c8a5b Scm Command ID: 
> 1720041450931 report status PENDING
> 2024-07-08 12:08:19,606 
> [scm2-EventQueue-DeleteBlockStatusForDeletedBlockLogImpl] WARN 
> org.apache.hadoop.hdds.scm.block.SCMDeletedBlockTransactionStatusManager$SCMDeleteBlocksCommandStatusManager:
>  Unknown Datanode: 9df75b64-d0e4-44ae-9bc0-9355371c8a5b Scm Command ID: 
> 1719241427194 report status PENDING
> 2024-07-08 12:08:19,617 
> [scm2-EventQueue-DeleteBlockStatusForDeletedBlockLogImpl] WARN 
> org.apache.hadoop.hdds.scm.block.SCMDeletedBlockTransactionStatusManager$SCMDeleteBlocksCommandStatusManager:
>  Unknown Datanode: efadefd7-4d25-42fd-a6ef-fabd64c97d7f Scm Command ID: 
> 1720041450023 report status PENDING
> 2024-07-08 12:08:19,664 
> [scm2-EventQueue-DeleteBlockStatusForDeletedBlockLogImpl] WARN 
> org.apache.hadoop.hdds.scm.block.SCMDeletedBlockTransactionStatusManager$SCMDeleteBlocksCommandStatusManager:
>  Unknown Datanode: 0c4b82eb-3856-4984-9b0d-d9670089921b Scm Command ID: 
> 1720106401909 report status PENDING
> 2024-07-08 12:08:19,664 
> [scm2-EventQueue-DeleteBlockStatusForDeletedBlockLogImpl] WARN 
> org.apache.hadoop.hdds.scm.block.SCMDeletedBlockTransactionStatusManager$SCMDeleteBlocksCommandStatusManager:
>  Unknown Datanode: 0c4b82eb-3856-4984-9b0d-d9670089921b Scm Command ID: 
> 1719241427294 report status PENDING {code}
> {code:java}
> 2024-07-12 08:35:37,032 
> [scm3-EventQueue-DeleteBlockStatusForDeletedBlockLogImpl] DEBUG 
> org.apache.hadoop.hdds.scm.block.DeletedBlockLogImpl: remoteDnId = 
> 888a550f-c59c-4dde-ba3e-3dcf8f9593e0, localDnId = 
> 888a550f-c59c-4dde-ba3e-3dcf8f9593e0, remoteDnId == localDnId[false]
> 2024-07-12 08:35:37,032 
> [scm3-EventQueue-DeleteBlockStatusForDeletedBlockLogImpl] DEBUG 
> org.apache.hadoop.hdds.scm.block.DeletedBlockLogImpl: remoteDnId = 
> c7919796-18fa-4f00-af94-9b7ebc21a572, localDnId = 
> c7919796-18fa-4f00-af94-9b7ebc21a572, remoteDnId == localDnId[false]
> 2024-07-12 08:35:37,032 
> [scm3-EventQueue-DeleteBlockStatusForDeletedBlockLogImpl] DEBUG 
> org.apache.hadoop.hdds.scm.block.DeletedBlockLogImpl: remoteDnId = 
> 596cd6c8-ecc7-48da-8039-75fe59d65846, localDnId = 
> 596cd6c8-ecc7-48da-8039-75fe59d65846, remoteDnId == localDnId[false]
> 2024-07-12 08:35:37,033 
> [scm3-EventQueue-DeleteBlockStatusForDeletedBlockLogImpl] DEBUG 
> org.apache.hadoop.hdds.scm.block.DeletedBlockLogImpl: remoteDnId = 
> de559349-fd76-4a5a-9acb-007432ba1876, localDnId = 
> de559349-fd76-4a5a-9acb-007432ba1876, remoteDnId == localDnId[false]
> 2024-07-12 08:35:37,033 
> [scm3-EventQueue-DeleteBlockStatusForDeletedBlockLogImpl] DEBUG 
> org.apache.hadoop.hdds.scm.block.DeletedBlockLogImpl: remoteDnId = 
> 6a750295-7e7c-4786-b28c-f78509c41a02, localDnId = 
> 6a750295-7e7c-4786-b28c-f78509c41a02, remoteDnId == localDnId[false] {code}
> !image-2024-07-12-09-37-23-618.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
