[
https://issues.apache.org/jira/browse/HDDS-10985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17854615#comment-17854615
]
Shilun Fan commented on HDDS-10985:
-----------------------------------
We are using EC-6-3-1024K and encountered an EC data reconstruction failure. The
error message is as follows:
{code:java}
java.lang.IllegalArgumentException: The chunk list has 2 entries, but the checksum chunks has 3 entries. They should be equal in size.
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:143)
    at org.apache.hadoop.hdds.scm.storage.ECBlockOutputStream.executePutBlock(ECBlockOutputStream.java:147)
    at org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator.reconstructECBlockGroup(ECReconstructionCoordinator.java:338)
    at org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator.reconstructECContainerGroup(ECReconstructionCoordinator.java:181)
    at org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinatorTask.runTask(ECReconstructionCoordinatorTask.java:68)
    at org.apache.hadoop.ozone.container.replication.ReplicationSupervisor$TaskRunner.run(ReplicationSupervisor.java:369)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
{code}
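The exception comes from a Guava `Preconditions.checkArgument` call in `ECBlockOutputStream.executePutBlock`, which requires the number of chunks written for the block to equal the number of chunks in the checksum block data. A minimal stand-alone sketch of that comparison follows. It is not the Ozone implementation: the class and method names are hypothetical, plain lists of chunk lengths stand in for Ozone's chunk metadata, the parameter names are borrowed from the issue title, and a plain `IllegalArgumentException` replaces the Guava call.

```java
import java.util.List;

/**
 * Hypothetical sketch of the size check that fails in
 * ECBlockOutputStream#executePutBlock. Lists of chunk lengths stand in
 * for Ozone's chunk metadata; this is not Ozone code.
 */
class ChunkCountCheck {

  /** Throws if the two chunk lists differ in size, mirroring the logged message. */
  static void checkChunkCounts(List<Long> currentChunks, List<Long> checksumBlockDataChunks) {
    if (currentChunks.size() != checksumBlockDataChunks.size()) {
      throw new IllegalArgumentException(String.format(
          "The chunk list has %d entries, but the checksum chunks has %d entries."
              + " They should be equal in size.",
          currentChunks.size(), checksumBlockDataChunks.size()));
    }
  }

  public static void main(String[] args) {
    long mib = 1L << 20;
    // Illustrative counts only: 2 chunks written vs. 3 checksum chunks, as in the exception above.
    List<Long> currentChunks = List.of(mib, mib);
    List<Long> checksumBlockDataChunks = List.of(mib, mib, mib);
    try {
      checkChunkCounts(currentChunks, checksumBlockDataChunks);
    } catch (IllegalArgumentException e) {
      // Prints the same message as the DN log.
      System.out.println(e.getMessage());
    }
  }
}
```

The check only compares list sizes, so any disagreement between replicas about how many chunks the block group holds is enough to trigger it, regardless of the chunk contents.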
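The block group details logged by ECReconstructionCoordinator#logBlockGroupDetails
are as follows. Replica index 7 is absent, and replica index 4 reports only two
chunks and a block group length of 12582912, while every other replica reports
three chunks and a block group length of 18874368: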
{code:java}
2024-06-13 07:59:00,718 [ContainerReplicationThread-6] INFO org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator: Block Data for: conID: 1188979 locID: 113750155360100435 bcsId: 0 replica Index: 1 block length: 3145728 block group length: 18874368 chunk list: chunkNum: 1 length: 1048576 offset: 0 chunkNum: 2 length: 1048576 offset: 1048576 chunkNum: 3 length: 1048576 offset: 2097152
2024-06-13 07:59:00,718 [ContainerReplicationThread-6] INFO org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator: Block Data for: conID: 1188979 locID: 113750155360100435 bcsId: 0 replica Index: 2 block length: 3145728 block group length: 18874368 chunk list: chunkNum: 1 length: 1048576 offset: 0 chunkNum: 2 length: 1048576 offset: 1048576 chunkNum: 3 length: 1048576 offset: 2097152
2024-06-13 07:59:00,718 [ContainerReplicationThread-6] INFO org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator: Block Data for: conID: 1188979 locID: 113750155360100435 bcsId: 0 replica Index: 3 block length: 3145728 block group length: 18874368 chunk list: chunkNum: 1 length: 1048576 offset: 0 chunkNum: 2 length: 1048576 offset: 1048576 chunkNum: 3 length: 1048576 offset: 2097152
2024-06-13 07:59:00,718 [ContainerReplicationThread-6] INFO org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator: Block Data for: conID: 1188979 locID: 113750155360100435 bcsId: 0 replica Index: 4 block length: 2097152 block group length: 12582912 chunk list: chunkNum: 1 length: 1048576 offset: 0 chunkNum: 2 length: 1048576 offset: 1048576
2024-06-13 07:59:00,718 [ContainerReplicationThread-6] INFO org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator: Block Data for: conID: 1188979 locID: 113750155360100435 bcsId: 0 replica Index: 5 block length: 3145728 block group length: 18874368 chunk list: chunkNum: 1 length: 1048576 offset: 0 chunkNum: 2 length: 1048576 offset: 1048576 chunkNum: 3 length: 1048576 offset: 2097152
2024-06-13 07:59:00,718 [ContainerReplicationThread-6] INFO org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator: Block Data for: conID: 1188979 locID: 113750155360100435 bcsId: 0 replica Index: 6 block length: 3145728 block group length: 18874368 chunk list: chunkNum: 1 length: 1048576 offset: 0 chunkNum: 2 length: 1048576 offset: 1048576 chunkNum: 3 length: 1048576 offset: 2097152
2024-06-13 07:59:00,718 [ContainerReplicationThread-6] INFO org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator: Block Data for: conID: 1188979 locID: 113750155360100435 bcsId: 0 replica Index: 8 block length: 3145728 block group length: 18874368 chunk list: chunkNum: 1 length: 1048576 offset: 0 chunkNum: 2 length: 1048576 offset: 1048576 chunkNum: 3 length: 1048576 offset: 2097152
2024-06-13 07:59:00,718 [ContainerReplicationThread-6] INFO org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator: Block Data for: conID: 1188979 locID: 113750155360100435 bcsId: 0 replica Index: 9 block length: 3145728 block group length: 18874368 chunk list: chunkNum: 1 length: 1048576 offset: 0 chunkNum: 2 length: 1048576 offset: 1048576 chunkNum: 3 length: 1048576 offset: 2097152
{code}
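The logged lengths can be cross-checked with stripe arithmetic. For EC-6-3-1024K there are 6 data units and a 1 MiB (1048576-byte) cell, so one full stripe covers 6291456 bytes of data. The sketch below computes the expected length of each replica from a block group length. It assumes round-robin striping of fixed-size cells across the data units, with parity replicas (indexes above the data count) as long as replica 1. The class and method names are hypothetical and not part of Ozone.

```java
/**
 * Hypothetical sketch: expected per-replica length of a striped EC block
 * group, assuming cells are laid out round-robin across the data units and
 * parity replicas are as long as replica 1. Not Ozone code.
 */
class EcReplicaLength {

  /**
   * @param groupLength  total block group length in bytes
   * @param dataUnits    number of data units (6 for EC-6-3-1024K)
   * @param cellSize     cell size in bytes (1048576 for EC-6-3-1024K)
   * @param replicaIndex 1-based replica index; indexes above dataUnits are parity
   */
  static long expectedReplicaLength(long groupLength, int dataUnits, long cellSize, int replicaIndex) {
    long stripeSize = dataUnits * cellSize;
    long fullStripes = groupLength / stripeSize;
    long remainder = groupLength % stripeSize;
    // Parity cells are as long as the first (longest) data cell of each stripe.
    int dataIndex = replicaIndex > dataUnits ? 1 : replicaIndex;
    // Bytes of the final, partial stripe that fall into this replica's cell.
    long partial = Math.max(0, Math.min(cellSize, remainder - (dataIndex - 1) * cellSize));
    return fullStripes * cellSize + partial;
  }

  public static void main(String[] args) {
    long mib = 1L << 20;
    // Block group lengths taken from the logBlockGroupDetails output above.
    System.out.println(expectedReplicaLength(18874368L, 6, mib, 4)); // 3145728 (3 stripes, 3 chunks)
    System.out.println(expectedReplicaLength(12582912L, 6, mib, 4)); // 2097152 (2 stripes, 2 chunks)
    System.out.println(expectedReplicaLength(18874368L, 6, mib, 8)); // 3145728 (parity)
  }
}
```

With these inputs, each replica's block length is consistent with its own block group length: 18874368 bytes is exactly three full stripes (three 1 MiB chunks per replica) and 12582912 bytes is exactly two (two chunks). The disagreement is therefore between replicas: replica index 4 reports one stripe fewer than the others, which matches the 2-versus-3 chunk counts in the exception.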
> EC Reconstruction failed because the size of currentChunks was not equal to
> checksumBlockDataChunks
> ---------------------------------------------------------------------------------------------------
>
> Key: HDDS-10985
> URL: https://issues.apache.org/jira/browse/HDDS-10985
> Project: Apache Ozone
> Issue Type: Bug
> Components: EC
> Reporter: LiMinyu
> Priority: Critical
>
> EC reconstruction failed with a *java.lang.IllegalArgumentException: The chunk
> list has 9 entries, but the checksum chunks has 10 entries. They should be
> equal in size* exception. The DN hit this problem while reconstructing EC
> data, and I found that it can occur whether a data block or a parity block is
> missing.
> *EC Policy:* rs-10-3-2048k
> *DN.log:*
> {code:java}
> 2024-06-06 18:20:17,837 [ContainerReplicationThread-12] WARN org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinatorTask:
> FAILED reconstructECContainersCommand: containerID=876481,
> replication=rs-10-3-2048k, missingIndexes=[11], sources={1=5919f690-3871-45d2-b414-004292b3e2d3(10.175.134.153/10.175.134.153),
> 2=718b671b-66ae-46eb-96fb-71411da7849d(10.175.134.172/10.175.134.172),
> 3=e0ce60b3-75d5-4d00-bcb9-7781ef61e827(10.175.134.135/10.175.134.135),
> 4=e9871cb6-44b0-4f39-ac8d-b04122dbd439(10.175.134.201/10.175.134.201),
> 5=b9319384-2f73-4610-9e03-c6b67bbfab0b(10.175.134.217/10.175.134.217),
> 6=9a0f6ff9-0772-4a1d-828e-96d3be50778c(10.175.134.199/10.175.134.199),
> 7=8c0800ad-0026-4fdd-bd6e-6d866e166e49(10.175.137.25/10.175.137.25),
> 8=24628bc9-5d7b-4310-a21f-9a35e2634fb4(10.175.134.200/10.175.134.200),
> 9=c23a4a3c-183a-4baf-ada4-e30800faa907(10.175.134.219/10.175.134.219),
> 10=c02658fa-898a-4406-a778-87653c2723c2(10.175.137.27/10.175.137.27),
> 12=2a598049-6f33-4f18-a32a-f9d1f2ad399d(10.175.137.43/10.175.137.43),
> 13=70cfa62e-5a7c-489e-bdf3-5527f9bb1679(10.175.134.203/10.175.134.203)},
> targets={11=099a12a7-e276-4ce0-bb3d-d915879ba4d9(10.175.138.92/10.175.138.92)} after 316099 ms
> java.lang.IllegalArgumentException: The chunk list has 9 entries, but the checksum chunks has 10 entries. They should be equal in size.
>     at com.google.common.base.Preconditions.checkArgument(Preconditions.java:143)
>     at org.apache.hadoop.hdds.scm.storage.ECBlockOutputStream.executePutBlock(ECBlockOutputStream.java:144)
>     at org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator.reconstructECBlockGroup(ECReconstructionCoordinator.java:340)
>     at org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator.reconstructECContainerGroup(ECReconstructionCoordinator.java:180)
>     at org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinatorTask.runTask(ECReconstructionCoordinatorTask.java:68)
>     at org.apache.hadoop.ozone.container.replication.ReplicationSupervisor$TaskRunner.run(ReplicationSupervisor.java:359)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:750)
> {code}
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)