[ 
https://issues.apache.org/jira/browse/HDDS-10985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17854615#comment-17854615
 ] 

Shilun Fan edited comment on HDDS-10985 at 6/13/24 5:57 AM:
------------------------------------------------------------

We are using EC-6-3-1024K and encountered an EC data recovery issue. The error
message is as follows:
{code:java}
java.lang.IllegalArgumentException: The chunk list has 2 entries, but the 
checksum chunks has 3 entries. They should be equal in size.  at   
com.google.common.base.Preconditions.checkArgument(Preconditions.java:143)  at 
org.apache.hadoop.hdds.scm.storage.ECBlockOutputStream.executePutBlock(ECBlockOutputStream.java:147)
  at 
org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator.reconstructECBlockGroup(ECReconstructionCoordinator.java:338)
  at 
org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator.reconstructECContainerGroup(ECReconstructionCoordinator.java:181)
  at 
org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinatorTask.runTask(ECReconstructionCoordinatorTask.java:68)
  at 
org.apache.hadoop.ozone.container.replication.ReplicationSupervisor$TaskRunner.run(ReplicationSupervisor.java:369)
  at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
 at java.lang.Thread.run(Thread.java:745) {code}
The block group details logged by ECReconstructionCoordinator#logBlockGroupDetails
are as follows:
{code:java}
2024-06-13 07:59:00,718 [ContainerReplicationThread-6] INFO 
org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator:
 
Block Data for: conID: 1188979 locID: 113750155360100435 bcsId: 0 
replica Index: 1 block length: 3145728 block group length: 18874368 
chunk list:  
chunkNum: 1 length: 1048576 offset: 0  
chunkNum: 2 length: 1048576 offset: 1048576  
chunkNum: 3 length: 1048576 offset: 2097152

2024-06-13 07:59:00,718 [ContainerReplicationThread-6] INFO 
org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator:
 
Block Data for: conID: 1188979 locID: 113750155360100435 bcsId: 0 
replica Index: 2 block length: 3145728 block group length: 18874368 
chunk list:  
chunkNum: 1 length: 1048576 offset: 0  
chunkNum: 2 length: 1048576 offset: 1048576  
chunkNum: 3 length: 1048576 offset: 2097152

2024-06-13 07:59:00,718 [ContainerReplicationThread-6] INFO 
org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator:
 
Block Data for: conID: 1188979 locID: 113750155360100435 bcsId: 0 
replica Index: 3 block length: 3145728 block group length: 18874368 
chunk list:  
chunkNum: 1 length: 1048576 offset: 0  
chunkNum: 2 length: 1048576 offset: 1048576  
chunkNum: 3 length: 1048576 offset: 2097152

2024-06-13 07:59:00,718 [ContainerReplicationThread-6] INFO 
org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator:
 
Block Data for: conID: 1188979 locID: 113750155360100435 bcsId: 0 
replica Index: 4 block length: 2097152 block group length: 12582912 
chunk list:  
chunkNum: 1 length: 1048576 offset: 0  
chunkNum: 2 length: 1048576 offset: 1048576

2024-06-13 07:59:00,718 [ContainerReplicationThread-6] INFO 
org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator:
 
Block Data for: conID: 1188979 locID: 113750155360100435 bcsId: 0 
replica Index: 5 block length: 3145728 block group length: 18874368 
chunk list:  
chunkNum: 1 length: 1048576 offset: 0  
chunkNum: 2 length: 1048576 offset: 1048576  
chunkNum: 3 length: 1048576 offset: 2097152

2024-06-13 07:59:00,718 [ContainerReplicationThread-6] INFO 
org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator:
 
Block Data for: conID: 1188979 locID: 113750155360100435 bcsId: 0 
replica Index: 6 block length: 3145728 block group length: 18874368 
chunk list:  
chunkNum: 1 length: 1048576 offset: 0  
chunkNum: 2 length: 1048576 offset: 1048576  
chunkNum: 3 length: 1048576 offset: 2097152

2024-06-13 07:59:00,718 [ContainerReplicationThread-6] INFO 
org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator:
 
Block Data for: conID: 1188979 locID: 113750155360100435 bcsId: 0 
replica Index: 8 block length: 3145728 block group length: 18874368 
chunk list:  
chunkNum: 1 length: 1048576 offset: 0  
chunkNum: 2 length: 1048576 offset: 1048576  
chunkNum: 3 length: 1048576 offset: 2097152

2024-06-13 07:59:00,718 [ContainerReplicationThread-6] INFO 
org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator:
 
Block Data for: conID: 1188979 locID: 113750155360100435 bcsId: 0 
replica Index: 9 block length: 3145728 block group length: 18874368 
chunk list:  
chunkNum: 1 length: 1048576 offset: 0  
chunkNum: 2 length: 1048576 offset: 1048576  
chunkNum: 3 length: 1048576 offset: 2097152 {code}
From the block data we can see that replica 7 is lost, so the EC block needs to
be reconstructed. However, the reconstruction cannot proceed because the chunk
list for replica index 4 is incomplete (2 chunks where 3 are expected).
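For context, the failure is a size-equality invariant: the chunk list reported in the replica's block data must have as many entries as the checksum chunk list carried by the block group metadata. The following is a minimal, hypothetical sketch of that invariant (class and method names are ours, not the actual ECBlockOutputStream code):

```java
import java.util.List;

public class ChunkListCheck {
  // Hypothetical sketch of the invariant behind the stack trace above:
  // the replica's chunk list and the checksum chunk list must match in size.
  static void checkChunkLists(List<Long> chunkList, List<Long> checksumChunks) {
    if (chunkList.size() != checksumChunks.size()) {
      throw new IllegalArgumentException(
          "The chunk list has " + chunkList.size()
          + " entries, but the checksum chunks has " + checksumChunks.size()
          + " entries. They should be equal in size.");
    }
  }

  public static void main(String[] args) {
    // Replica index 4 reported only 2 chunks while the checksum metadata
    // carries 3, which trips the check exactly as in the log above.
    try {
      checkChunkLists(List.of(1048576L, 1048576L),
          List.of(1048576L, 1048576L, 1048576L));
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```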

We analyzed the code of the data write path and found that the chunk list on a
given DataNode can end up inaccurate because of the order in which stripe data
is written.

Data writing process overview:
{code:java}
OutputStream#write
  \-- OzoneOutputStream#write
       \-- ECKeyOutputStream#write
            \-- ECKeyOutputStream#handleWrite
                 |-- when 3 data chunks are filled, compute the parity chunks
                     and flush the stripe to the DataNodes
            \-- ECKeyOutputStream#close
                 |-- if fewer than 3 chunks are buffered, write them during
                     close and flush the stripe to the DataNodes{code}
1. ECKeyOutputStream#handleWrite uses a BlockingQueue: each full stripe is put
into ecStripeQueue, and a separate thread pool takes stripes off the queue and
sends them, chunk by chunk, to the DataNodes in the pipeline.
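The queue hand-off above can be sketched with a plain BlockingQueue and a small worker pool. This is a simplified illustration, not the actual ECKeyOutputStream code; the Stripe record, the poison-pill shutdown, and the pool size are our assumptions. The point is that once multiple workers drain the queue concurrently, the order in which stripes reach a given DataNode is no longer guaranteed:

```java
import java.util.Set;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class StripeQueueSketch {
  // Hypothetical stand-in for an EC stripe.
  record Stripe(int number) {}

  // The writer enqueues full stripes in order; a small pool drains the queue
  // and "sends" each stripe. Returns the set of flushed stripe numbers --
  // the *set* is deterministic, but the send order across workers is not.
  static Set<Integer> writeStripes(int count) throws InterruptedException {
    BlockingQueue<Stripe> ecStripeQueue = new ArrayBlockingQueue<>(2);
    Set<Integer> flushed = ConcurrentHashMap.newKeySet();
    ExecutorService flushPool = Executors.newFixedThreadPool(2);
    for (int i = 0; i < 2; i++) {
      flushPool.submit(() -> {
        try {
          while (true) {
            Stripe s = ecStripeQueue.take();
            if (s.number() < 0) {
              return;               // poison pill: stop this worker
            }
            flushed.add(s.number()); // "send the stripe's chunks to the DNs"
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
    }
    for (int n = 1; n <= count; n++) {
      ecStripeQueue.put(new Stripe(n)); // handleWrite: enqueue a full stripe
    }
    ecStripeQueue.put(new Stripe(-1));  // one pill per worker
    ecStripeQueue.put(new Stripe(-1));
    flushPool.shutdown();
    flushPool.awaitTermination(5, TimeUnit.SECONDS);
    return flushed;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("flushed stripes: " + writeStripes(4));
  }
}
```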

2. Writes to the DataNode are issued asynchronously, which improves throughput;
see BlockOutputStream#writeChunkToContainer:
{code:java}
CompletableFuture<ContainerCommandResponseProto> writeChunkToContainer(
      ChunkBuffer chunk) throws IOException {
    // ...

    try {
      XceiverClientReply asyncReply = writeChunkAsync(xceiverClient, chunkInfo,
          blockID.get(), data, token, replicationIndex);
      CompletableFuture<ContainerProtos.ContainerCommandResponseProto>
          respFuture = asyncReply.getResponse();
      CompletableFuture<ContainerProtos.ContainerCommandResponseProto>
          validateFuture = respFuture.thenApplyAsync(e -> {
            try {
              validateResponse(e);
            } catch (IOException sce) {
              respFuture.completeExceptionally(sce);
            }
            return e;
          }, responseExecutor).exceptionally(e -> {
            String msg = "Failed to write chunk " + chunkInfo.getChunkName() +
                " into block " + blockID;
            LOG.debug("{}, exception: {}", msg, e.getLocalizedMessage());
            CompletionException ce = new CompletionException(msg, e);
            setIoException(ce);
            throw ce;
          });
      // ...
      return validateFuture;
    } catch (IOException | ExecutionException e) {
      throw new IOException(EXCEPTION_MSG + e.toString(), e);
    } catch (InterruptedException ex) {
      Thread.currentThread().interrupt();
      handleInterruptedException(ex, false);
    }
    return null;
  }
{code}

For a given DataNode, EC stripes are also written out in parallel, and in a
real production environment the timing can be complex. The DataNode might
receive the data for EC stripe 3 first and the data for EC stripe 2 only
afterwards. In that case, the data for EC stripe 2 overwrites the data for EC
stripe 3, leaving the block group length and chunk list of that replica
inaccurate.
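The overwrite effect can be shown with a toy model of the replica's metadata. This is a deliberately simplified, hypothetical sketch (the real DataNode keeps block data in RocksDB and the update path is more involved): if a later-arriving PutBlock for an earlier stripe blindly replaces the stored chunk list, the replica ends up reporting fewer chunks than it holds:

```java
import java.util.ArrayList;
import java.util.List;

public class OutOfOrderOverwrite {
  static final int CHUNK = 1048576; // 1 MiB chunks, as in EC-6-3-1024K

  // Hypothetical, simplified replica metadata: the latest PutBlock wins,
  // with no check that it extends the previously stored chunk list.
  static List<Integer> chunkList = new ArrayList<>();

  static void applyPutBlock(int stripesSeen) {
    List<Integer> chunks = new ArrayList<>();
    for (int i = 0; i < stripesSeen; i++) {
      chunks.add(CHUNK);
    }
    chunkList = chunks; // blind overwrite, as hypothesized above
  }

  public static void main(String[] args) {
    applyPutBlock(3); // stripe 3's PutBlock arrives first: 3 chunks recorded
    applyPutBlock(2); // stripe 2's PutBlock arrives late and overwrites it
    // The replica now reports 2 chunks and block length 2097152, matching
    // the replica-index-4 entry in the log above.
    System.out.println("chunks: " + chunkList.size()
        + ", block length: " + (long) chunkList.size() * CHUNK);
  }
}
```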

[~sodonnell] If you have time, could you please take a look at this? Thank you 
very much!



> EC Reconstruction failed because the size of currentChunks was not equal to 
> checksumBlockDataChunks
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HDDS-10985
>                 URL: https://issues.apache.org/jira/browse/HDDS-10985
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: EC
>            Reporter: LiMinyu
>            Priority: Critical
>
> EC reconstruction failed with *java.lang.IllegalArgumentException: The chunk 
> list has 9 entries, but the checksum chunks has 10 entries. They should be 
> equal in size* exception. The DN had this problem when the EC data was 
> reconstructed, and it can occur whether a data block or a parity block is 
> missing.
> *EC Policy:* rs-10-3-2048k
> *DN.log:* 
> {code:java}
> 2024-06-06 18:20:17,837 [ContainerReplicationThread-12] WARN 
> org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinatorTask:
>  FAILED reconstructECContainersCommand: containerID=876481, 
> replication=rs-10-3-2048k, missingIndexes=[11], sources={1=5919f690
> -3871-45d2-b414-004292b3e2d3(10.175.134.153/10.175.134.153), 
> 2=718b671b-66ae-46eb-96fb-71411da7849d(10.175.134.172/10.175.134.172), 
> 3=e0ce60b3-75d5-4d00-bcb9-7781ef61e827(10.175.134.135/10.175.134.135), 
> 4=e9871cb6-44b0-4f39-ac8d-b04122dbd439(10.175.134.201/10.175.134.201), 
> 5=b9319384-2f73-4610-9e03-c6b67bbfab0b(10.175.134.217/10.175.134.217), 
> 6=9a0f6ff9-0772-4a1d-828e-96d3be50778c(10.175.134.199/10.175.134.199), 
> 7=8c0800ad-0026-4fdd-bd6e-6d866e166e49(10.175.137.25/10.175.137.25), 
> 8=24628bc9-5d7b-4310-a21f-9a35e2634fb4(10.175.134.200/10.175.134.200), 
> 9=c23a4a3c-183a-4baf-ada4-e30800faa907(10.175.134.219/10.175.134.219), 
> 10=c02658fa-898a-4406-a778-87653c2723c2(10.175.137.27/10.175.137.27), 
> 12=2a598049-6f33-4f18-a32a-f9d1f2ad399d(10.175.137.43/10.175.137.43), 
> 13=70cfa62e-5a7c-489e-bdf3-5527f9bb1679(10.175.134.203/10.175.134.203)}, 
> targets={11=099a12a7-e276-4ce0-bb3d-d915879ba4d9(10.175.138.92/10.175.138.92)}
>  after 316099 ms
> java.lang.IllegalArgumentException: The chunk list has 9 entries, but the 
> checksum chunks has 10 entries. They should be equal in size.
>         at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:143)
>         at 
> org.apache.hadoop.hdds.scm.storage.ECBlockOutputStream.executePutBlock(ECBlockOutputStream.java:144)
>         at 
> org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator.reconstructECBlockGroup(ECReconstructionCoordinator.java:340)
>         at 
> org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator.reconstructECContainerGroup(ECReconstructionCoordinator.java:180)
>         at 
> org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinatorTask.runTask(ECReconstructionCoordinatorTask.java:68)
>         at 
> org.apache.hadoop.ozone.container.replication.ReplicationSupervisor$TaskRunner.run(ReplicationSupervisor.java:359)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:750) {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
