[
https://issues.apache.org/jira/browse/HDDS-10985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17853173#comment-17853173
]
Stephen O'Donnell commented on HDDS-10985:
------------------------------------------
Aside from using 10-3, which we never tested, the chunk size is 2048k, which is
not the default 1024k. There is a chance that this is causing an issue during
the checksum calculation.
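For reference, the message in the report reads like a plain size comparison
between two chunk lists. The sketch below is only an illustration of that kind
of check, not the Ozone source: the class and method names are hypothetical,
and the parameter names are borrowed from the issue title.

```java
import java.util.Collections;
import java.util.List;

public class ChunkListCheck {

    // Hypothetical helper (not Ozone code): rejects two chunk lists whose
    // sizes differ, using the same wording as the reported exception.
    public static void checkSameSize(List<?> currentChunks,
                                     List<?> checksumBlockDataChunks) {
        if (currentChunks.size() != checksumBlockDataChunks.size()) {
            throw new IllegalArgumentException(
                "The chunk list has " + currentChunks.size()
                + " entries, but the checksum chunks has "
                + checksumBlockDataChunks.size()
                + " entries. They should be equal in size.");
        }
    }

    public static void main(String[] args) {
        // Placeholder lists with the sizes seen in the DN log (9 vs 10).
        List<String> current = Collections.nCopies(9, "chunk");
        List<String> checksum = Collections.nCopies(10, "chunk");
        try {
            checkSameSize(current, checksum);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If the real check works this way, the open question is why one list ends up one
entry shorter than the other for this block.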
Are you able to look at the key / block in question? How big is it? Is this
reproducible if you write another key of the same size and the same EC
replication scheme? Without steps to reproduce, it is going to be difficult to
find the issue which is causing this.
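When picking key sizes for a reproduction attempt, it may help to know how many
chunks each data replica would hold. The sketch below assumes plain round-robin
striping with one chunk per replica per stripe and ignores block boundaries; it
has not been checked against the Ozone implementation, and the class and method
names are hypothetical.

```java
public class EcChunkCount {

    // Hypothetical helper (not Ozone code): number of chunks that land on
    // data replica `index` (1-based) when a key of `keyLength` bytes is
    // striped round-robin across `dataNum` replicas in `chunkSize` cells.
    public static long chunksOnDataReplica(long keyLength, int dataNum,
                                           long chunkSize, int index) {
        long stripeSize = dataNum * chunkSize;
        long fullStripes = keyLength / stripeSize;
        long remainder = keyLength % stripeSize;
        // Cells touched by the trailing partial stripe: ceil(remainder / chunkSize).
        long cellsInPartial = (remainder + chunkSize - 1) / chunkSize;
        return fullStripes + (index <= cellsInPartial ? 1 : 0);
    }

    public static void main(String[] args) {
        long chunkSize = 2048L * 1024;          // rs-10-3-2048k
        int dataNum = 10;
        long keyLength = 9L * dataNum * chunkSize + 1; // nine full stripes plus one byte
        for (int i = 1; i <= dataNum; i++) {
            System.out.println("replica " + i + ": "
                + chunksOnDataReplica(keyLength, dataNum, chunkSize, i));
        }
    }
}
```

Under these assumptions, a key one byte longer than nine full stripes gives
replica 1 ten chunks and replicas 2 to 10 nine chunks each. This is not a
diagnosis, only a way to choose sizes around a stripe boundary to test.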
> EC Reconstruction failed because the size of currentChunks was not equal to
> checksumBlockDataChunks
> ---------------------------------------------------------------------------------------------------
>
> Key: HDDS-10985
> URL: https://issues.apache.org/jira/browse/HDDS-10985
> Project: Apache Ozone
> Issue Type: Bug
> Components: EC
> Reporter: LiMinyu
> Priority: Critical
>
> EC reconstruction failed with *java.lang.IllegalArgumentException: The chunk
> list has 9 entries, but the checksum chunks has 10 entries. They should be
> equal in size*. The DN hit this problem while reconstructing the EC data,
> and I found that it can occur whether the missing block is a data block or
> a parity block.
> *EC Policy:* rs-10-3-2048k
> *DN.log:*
> {code:java}
> 2024-06-06 18:20:17,837 [ContainerReplicationThread-12] WARN
> org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinatorTask:
> FAILED reconstructECContainersCommand: containerID=876481,
> replication=rs-10-3-2048k, missingIndexes=[11],
> sources={1=5919f690-3871-45d2-b414-004292b3e2d3(10.175.134.153/10.175.134.153),
> 2=718b671b-66ae-46eb-96fb-71411da7849d(10.175.134.172/10.175.134.172),
> 3=e0ce60b3-75d5-4d00-bcb9-7781ef61e827(10.175.134.135/10.175.134.135),
> 4=e9871cb6-44b0-4f39-ac8d-b04122dbd439(10.175.134.201/10.175.134.201),
> 5=b9319384-2f73-4610-9e03-c6b67bbfab0b(10.175.134.217/10.175.134.217),
> 6=9a0f6ff9-0772-4a1d-828e-96d3be50778c(10.175.134.199/10.175.134.199),
> 7=8c0800ad-0026-4fdd-bd6e-6d866e166e49(10.175.137.25/10.175.137.25),
> 8=24628bc9-5d7b-4310-a21f-9a35e2634fb4(10.175.134.200/10.175.134.200),
> 9=c23a4a3c-183a-4baf-ada4-e30800faa907(10.175.134.219/10.175.134.219),
> 10=c02658fa-898a-4406-a778-87653c2723c2(10.175.137.27/10.175.137.27),
> 12=2a598049-6f33-4f18-a32a-f9d1f2ad399d(10.175.137.43/10.175.137.43),
> 13=70cfa62e-5a7c-489e-bdf3-5527f9bb1679(10.175.134.203/10.175.134.203)},
> targets={11=099a12a7-e276-4ce0-bb3d-d915879ba4d9(10.175.138.92/10.175.138.92)}
> after 316099 ms
> java.lang.IllegalArgumentException: The chunk list has 9 entries, but the
> checksum chunks has 10 entries. They should be equal in size.
> at
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:143)
> at
> org.apache.hadoop.hdds.scm.storage.ECBlockOutputStream.executePutBlock(ECBlockOutputStream.java:144)
> at
> org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator.reconstructECBlockGroup(ECReconstructionCoordinator.java:340)
> at
> org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator.reconstructECContainerGroup(ECReconstructionCoordinator.java:180)
> at
> org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinatorTask.runTask(ECReconstructionCoordinatorTask.java:68)
> at
> org.apache.hadoop.ozone.container.replication.ReplicationSupervisor$TaskRunner.run(ReplicationSupervisor.java:359)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:750) {code}
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)