[ 
https://issues.apache.org/jira/browse/HDDS-10985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17854615#comment-17854615
 ] 

Shilun Fan commented on HDDS-10985:
-----------------------------------

We use EC-6-3-1024K and encountered the same kind of failure during EC data 
recovery. The error message is as follows:
{code:java}
java.lang.IllegalArgumentException: The chunk list has 2 entries, but the checksum chunks has 3 entries. They should be equal in size.
  at com.google.common.base.Preconditions.checkArgument(Preconditions.java:143)
  at org.apache.hadoop.hdds.scm.storage.ECBlockOutputStream.executePutBlock(ECBlockOutputStream.java:147)
  at org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator.reconstructECBlockGroup(ECReconstructionCoordinator.java:338)
  at org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator.reconstructECContainerGroup(ECReconstructionCoordinator.java:181)
  at org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinatorTask.runTask(ECReconstructionCoordinatorTask.java:68)
  at org.apache.hadoop.ozone.container.replication.ReplicationSupervisor$TaskRunner.run(ReplicationSupervisor.java:369)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
{code}
The log output from ECReconstructionCoordinator#logBlockGroupDetails is as 
follows:
{code:java}
2024-06-13 07:59:00,718 [ContainerReplicationThread-6] INFO org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator:
Block Data for: conID: 1188979 locID: 113750155360100435 bcsId: 0 replica Index: 1 block length: 3145728 block group length: 18874368
chunk list:  chunkNum: 1 length: 1048576 offset: 0  chunkNum: 2 length: 1048576 offset: 1048576  chunkNum: 3 length: 1048576 offset: 2097152

2024-06-13 07:59:00,718 [ContainerReplicationThread-6] INFO org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator:
Block Data for: conID: 1188979 locID: 113750155360100435 bcsId: 0 replica Index: 2 block length: 3145728 block group length: 18874368
chunk list:  chunkNum: 1 length: 1048576 offset: 0  chunkNum: 2 length: 1048576 offset: 1048576  chunkNum: 3 length: 1048576 offset: 2097152

2024-06-13 07:59:00,718 [ContainerReplicationThread-6] INFO org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator:
Block Data for: conID: 1188979 locID: 113750155360100435 bcsId: 0 replica Index: 3 block length: 3145728 block group length: 18874368
chunk list:  chunkNum: 1 length: 1048576 offset: 0  chunkNum: 2 length: 1048576 offset: 1048576  chunkNum: 3 length: 1048576 offset: 2097152

2024-06-13 07:59:00,718 [ContainerReplicationThread-6] INFO org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator:
Block Data for: conID: 1188979 locID: 113750155360100435 bcsId: 0 replica Index: 4 block length: 2097152 block group length: 12582912
chunk list:  chunkNum: 1 length: 1048576 offset: 0  chunkNum: 2 length: 1048576 offset: 1048576

2024-06-13 07:59:00,718 [ContainerReplicationThread-6] INFO org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator:
Block Data for: conID: 1188979 locID: 113750155360100435 bcsId: 0 replica Index: 5 block length: 3145728 block group length: 18874368
chunk list:  chunkNum: 1 length: 1048576 offset: 0  chunkNum: 2 length: 1048576 offset: 1048576  chunkNum: 3 length: 1048576 offset: 2097152

2024-06-13 07:59:00,718 [ContainerReplicationThread-6] INFO org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator:
Block Data for: conID: 1188979 locID: 113750155360100435 bcsId: 0 replica Index: 6 block length: 3145728 block group length: 18874368
chunk list:  chunkNum: 1 length: 1048576 offset: 0  chunkNum: 2 length: 1048576 offset: 1048576  chunkNum: 3 length: 1048576 offset: 2097152

2024-06-13 07:59:00,718 [ContainerReplicationThread-6] INFO org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator:
Block Data for: conID: 1188979 locID: 113750155360100435 bcsId: 0 replica Index: 8 block length: 3145728 block group length: 18874368
chunk list:  chunkNum: 1 length: 1048576 offset: 0  chunkNum: 2 length: 1048576 offset: 1048576  chunkNum: 3 length: 1048576 offset: 2097152

2024-06-13 07:59:00,718 [ContainerReplicationThread-6] INFO org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator:
Block Data for: conID: 1188979 locID: 113750155360100435 bcsId: 0 replica Index: 9 block length: 3145728 block group length: 18874368
chunk list:  chunkNum: 1 length: 1048576 offset: 0  chunkNum: 2 length: 1048576 offset: 1048576  chunkNum: 3 length: 1048576 offset: 2097152
{code}
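
In the log output above, replica index 7 does not appear, and replica index 4 reports 2 chunks (block length 2097152, block group length 12582912) while every other listed replica reports 3 chunks (block length 3145728, block group length 18874368). That 2-versus-3 difference matches the numbers in the exception, although the log alone does not establish the cause. The following is a minimal sketch, not Ozone code, of how such an outlier could be spotted mechanically; the class name ReplicaChunkCounts and its methods are hypothetical, and the chunk counts are copied by hand from the log.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; it is not part of Apache Ozone.
public class ReplicaChunkCounts {

    /** Returns the replica indexes whose chunk count differs from the most common count. */
    static List<Integer> findOutliers(Map<Integer, Integer> chunksByReplica) {
        // Tally how many replicas report each chunk count.
        Map<Integer, Integer> frequency = new LinkedHashMap<>();
        for (int count : chunksByReplica.values()) {
            frequency.merge(count, 1, Integer::sum);
        }
        // Pick the chunk count reported by the largest number of replicas.
        int majority = -1;
        int best = 0;
        for (Map.Entry<Integer, Integer> e : frequency.entrySet()) {
            if (e.getValue() > best) {
                best = e.getValue();
                majority = e.getKey();
            }
        }
        // Collect every replica that disagrees with the majority count.
        List<Integer> outliers = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : chunksByReplica.entrySet()) {
            if (e.getValue() != majority) {
                outliers.add(e.getKey());
            }
        }
        return outliers;
    }

    /** Chunk counts per replica index, transcribed from the log (index 7 is absent there). */
    static Map<Integer, Integer> sampleFromLog() {
        Map<Integer, Integer> m = new LinkedHashMap<>();
        m.put(1, 3);
        m.put(2, 3);
        m.put(3, 3);
        m.put(4, 2);
        m.put(5, 3);
        m.put(6, 3);
        m.put(8, 3);
        m.put(9, 3);
        return m;
    }

    public static void main(String[] args) {
        // Prints: Replicas with a differing chunk count: [4]
        System.out.println("Replicas with a differing chunk count: "
            + findOutliers(sampleFromLog()));
    }
}
```

A majority vote is only a heuristic here: it identifies which replica disagrees with the rest, not which side holds the correct block metadata.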

> EC Reconstruction failed because the size of currentChunks was not equal to 
> checksumBlockDataChunks
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HDDS-10985
>                 URL: https://issues.apache.org/jira/browse/HDDS-10985
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: EC
>            Reporter: LiMinyu
>            Priority: Critical
>
> EC reconstruction failed with the exception *java.lang.IllegalArgumentException: 
> The chunk list has 9 entries, but the checksum chunks has 10 entries. They 
> should be equal in size*. The DN hit this problem while reconstructing EC 
> data, and I found that it can occur whether the missing block is a data block 
> or a check (parity) block.
> *EC Policy:* rs-10-3-2048k
> *DN.log:* 
> {code:java}
> 2024-06-06 18:20:17,837 [ContainerReplicationThread-12] WARN 
> org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinatorTask:
>  FAILED reconstructECContainersCommand: containerID=876481, 
> replication=rs-10-3-2048k, missingIndexes=[11], 
> sources={1=5919f690-3871-45d2-b414-004292b3e2d3(10.175.134.153/10.175.134.153), 
> 2=718b671b-66ae-46eb-96fb-71411da7849d(10.175.134.172/10.175.134.172), 
> 3=e0ce60b3-75d5-4d00-bcb9-7781ef61e827(10.175.134.135/10.175.134.135), 
> 4=e9871cb6-44b0-4f39-ac8d-b04122dbd439(10.175.134.201/10.175.134.201), 
> 5=b9319384-2f73-4610-9e03-c6b67bbfab0b(10.175.134.217/10.175.134.217), 
> 6=9a0f6ff9-0772-4a1d-828e-96d3be50778c(10.175.134.199/10.175.134.199), 
> 7=8c0800ad-0026-4fdd-bd6e-6d866e166e49(10.175.137.25/10.175.137.25), 
> 8=24628bc9-5d7b-4310-a21f-9a35e2634fb4(10.175.134.200/10.175.134.200), 
> 9=c23a4a3c-183a-4baf-ada4-e30800faa907(10.175.134.219/10.175.134.219), 
> 10=c02658fa-898a-4406-a778-87653c2723c2(10.175.137.27/10.175.137.27), 
> 12=2a598049-6f33-4f18-a32a-f9d1f2ad399d(10.175.137.43/10.175.137.43), 
> 13=70cfa62e-5a7c-489e-bdf3-5527f9bb1679(10.175.134.203/10.175.134.203)}, 
> targets={11=099a12a7-e276-4ce0-bb3d-d915879ba4d9(10.175.138.92/10.175.138.92)}
>  after 316099 ms
> java.lang.IllegalArgumentException: The chunk list has 9 entries, but the checksum chunks has 10 entries. They should be equal in size.
>         at com.google.common.base.Preconditions.checkArgument(Preconditions.java:143)
>         at org.apache.hadoop.hdds.scm.storage.ECBlockOutputStream.executePutBlock(ECBlockOutputStream.java:144)
>         at org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator.reconstructECBlockGroup(ECReconstructionCoordinator.java:340)
>         at org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator.reconstructECContainerGroup(ECReconstructionCoordinator.java:180)
>         at org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinatorTask.runTask(ECReconstructionCoordinatorTask.java:68)
>         at org.apache.hadoop.ozone.container.replication.ReplicationSupervisor$TaskRunner.run(ReplicationSupervisor.java:359)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:750)
> {code}
>  
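
The stack trace in the quoted description shows the exception being raised by a Guava Preconditions.checkArgument call inside ECBlockOutputStream.executePutBlock. As a rough illustration of what a size precondition of this kind does, here is a plain-Java sketch that produces the same message text. It is an assumption-laden stand-in, not the actual Ozone implementation: the class ChunkListSizeCheck and the method requireSameSize are hypothetical, and the lists hold placeholder strings rather than real chunk metadata.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for illustration only; it is not the Ozone source.
public class ChunkListSizeCheck {

    /** Throws IllegalArgumentException when the two lists differ in size. */
    static void requireSameSize(List<?> chunkList, List<?> checksumChunks) {
        if (chunkList.size() != checksumChunks.size()) {
            throw new IllegalArgumentException(String.format(
                "The chunk list has %d entries, but the checksum chunks has %d entries. "
                    + "They should be equal in size.",
                chunkList.size(), checksumChunks.size()));
        }
    }

    public static void main(String[] args) {
        // Placeholder lists sized like the counts reported in the issue (9 versus 10).
        List<String> chunkList = Collections.nCopies(9, "chunk");
        List<String> checksumChunks = Collections.nCopies(10, "chunk");
        try {
            requireSameSize(chunkList, checksumChunks);
        } catch (IllegalArgumentException e) {
            // The message carries both sizes, as in the DN log.
            System.out.println(e.getMessage());
        }
    }
}
```

Because a check like this fails the whole operation on any mismatch, a single replica with a shorter or longer chunk list is enough to abort reconstruction of the block group.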



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
