[
https://issues.apache.org/jira/browse/HDDS-10985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17853882#comment-17853882
]
GuoHao commented on HDDS-10985:
-------------------------------
Currently, if a single block in a container cannot be repaired, repair of the
entire container fails. In the EC write path, the client decides whether the
write of a whole stripe has succeeded. A block write can fail because several
replica indexes failed, and until OM triggers the open key cleanup that block
remains orphaned on the datanodes. If fewer replica indexes hold the block
than the minimum needed to reconstruct the stripe, the block cannot be
repaired, which in turn affects EC reconstruction of the container.
The same situation can also arise during block deletion, when the replica
indexes have made different progress in deleting a block.
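To make the recoverability condition above concrete, here is a minimal sketch, not Ozone code. It assumes the rs-10-3 policy from this issue (10 data + 3 parity), so a block group can be rebuilt only if at least 10 of its 13 replica indexes still hold it. The class `EcBlockRecoverability` and both method names are hypothetical, as is the idea of listing unrecoverable block groups separately rather than failing the whole container.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical helper for illustration; not part of Ozone. */
final class EcBlockRecoverability {
  private EcBlockRecoverability() { }

  /**
   * A block group is recoverable only if at least dataNum replica
   * indexes still hold it (dataNum is 10 for rs-10-3).
   */
  static boolean isRecoverable(Set<Integer> indexesHoldingBlock, int dataNum) {
    return indexesHoldingBlock.size() >= dataNum;
  }

  /**
   * Returns the local IDs, in ascending order, of block groups that have
   * too few surviving replica indexes to be reconstructed.
   */
  static List<Long> findUnrecoverable(
      Map<Long, Set<Integer>> blockToIndexes, int dataNum) {
    List<Long> unrecoverable = new ArrayList<>();
    // TreeMap copy gives a deterministic, sorted iteration order.
    for (Map.Entry<Long, Set<Integer>> entry
        : new TreeMap<>(blockToIndexes).entrySet()) {
      if (!isRecoverable(entry.getValue(), dataNum)) {
        unrecoverable.add(entry.getKey());
      }
    }
    return unrecoverable;
  }
}
```

A reconstruction task could, for example, use such a list to report or skip orphaned block groups; whether skipping is safe before the open key cleanup runs is a design question this sketch does not answer.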
cc [~sammichen] [~adoroszlai] [~sodonnell] [~siddhant]
> EC Reconstruction failed because the size of currentChunks was not equal to
> checksumBlockDataChunks
> ---------------------------------------------------------------------------------------------------
>
> Key: HDDS-10985
> URL: https://issues.apache.org/jira/browse/HDDS-10985
> Project: Apache Ozone
> Issue Type: Bug
> Components: EC
> Reporter: LiMinyu
> Priority: Critical
>
> EC reconstruction failed with the exception
> *java.lang.IllegalArgumentException: The chunk list has 9 entries, but the
> checksum chunks has 10 entries. They should be equal in size*. The DN hit
> this problem while reconstructing EC data. I found that it can occur whether
> the missing block is a data block or a parity block.
> *EC Policy:* rs-10-3-2048k
> *DN.log:*
> {code:java}
> 2024-06-06 18:20:17,837 [ContainerReplicationThread-12] WARN
> org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinatorTask:
> FAILED reconstructECContainersCommand: containerID=876481,
> replication=rs-10-3-2048k, missingIndexes=[11],
> sources={1=5919f690-3871-45d2-b414-004292b3e2d3(10.175.134.153/10.175.134.153),
> 2=718b671b-66ae-46eb-96fb-71411da7849d(10.175.134.172/10.175.134.172),
> 3=e0ce60b3-75d5-4d00-bcb9-7781ef61e827(10.175.134.135/10.175.134.135),
> 4=e9871cb6-44b0-4f39-ac8d-b04122dbd439(10.175.134.201/10.175.134.201),
> 5=b9319384-2f73-4610-9e03-c6b67bbfab0b(10.175.134.217/10.175.134.217),
> 6=9a0f6ff9-0772-4a1d-828e-96d3be50778c(10.175.134.199/10.175.134.199),
> 7=8c0800ad-0026-4fdd-bd6e-6d866e166e49(10.175.137.25/10.175.137.25),
> 8=24628bc9-5d7b-4310-a21f-9a35e2634fb4(10.175.134.200/10.175.134.200),
> 9=c23a4a3c-183a-4baf-ada4-e30800faa907(10.175.134.219/10.175.134.219),
> 10=c02658fa-898a-4406-a778-87653c2723c2(10.175.137.27/10.175.137.27),
> 12=2a598049-6f33-4f18-a32a-f9d1f2ad399d(10.175.137.43/10.175.137.43),
> 13=70cfa62e-5a7c-489e-bdf3-5527f9bb1679(10.175.134.203/10.175.134.203)},
> targets={11=099a12a7-e276-4ce0-bb3d-d915879ba4d9(10.175.138.92/10.175.138.92)}
> after 316099 ms
> java.lang.IllegalArgumentException: The chunk list has 9 entries, but the
> checksum chunks has 10 entries. They should be equal in size.
> at
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:143)
> at
> org.apache.hadoop.hdds.scm.storage.ECBlockOutputStream.executePutBlock(ECBlockOutputStream.java:144)
> at
> org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator.reconstructECBlockGroup(ECReconstructionCoordinator.java:340)
> at
> org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator.reconstructECContainerGroup(ECReconstructionCoordinator.java:180)
> at
> org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinatorTask.runTask(ECReconstructionCoordinatorTask.java:68)
> at
> org.apache.hadoop.ozone.container.replication.ReplicationSupervisor$TaskRunner.run(ReplicationSupervisor.java:359)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:750) {code}
>
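The stack trace shows the failure coming from a Guava `Preconditions.checkArgument` call in `ECBlockOutputStream.executePutBlock`. The sketch below reproduces that kind of size check in plain Java so it is self-contained. It is an illustration only: the class `ChunkListSizeCheck` and method `checkSameSize` are hypothetical, the parameter names are borrowed from the issue title, and the real Ozone code may be structured differently.

```java
import java.util.List;

/** Hypothetical stand-in for the size check seen in the stack trace. */
final class ChunkListSizeCheck {
  private ChunkListSizeCheck() { }

  /**
   * Throws IllegalArgumentException when the two chunk lists differ in
   * size, using the same message wording as the DN log above.
   */
  static void checkSameSize(List<?> currentChunks,
      List<?> checksumBlockDataChunks) {
    if (currentChunks.size() != checksumBlockDataChunks.size()) {
      throw new IllegalArgumentException(String.format(
          "The chunk list has %d entries, but the checksum chunks has %d "
              + "entries. They should be equal in size.",
          currentChunks.size(), checksumBlockDataChunks.size()));
    }
  }
}
```

Calling `checkSameSize` with a 9-element list and a 10-element list throws an exception with the message reported in this issue.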