Varsha Ravi created HDDS-9319:
---------------------------------

             Summary: EC Reconstruction fails when chunk length is 0 bytes
                 Key: HDDS-9319
                 URL: https://issues.apache.org/jira/browse/HDDS-9319
             Project: Apache Ozone
          Issue Type: Bug
          Components: EC
            Reporter: Varsha Ravi


EC offline reconstruction fails with a *java.io.IOException: Failed to get 
chunkInfo[123]* exception. The DN log reports that there are insufficient datanodes 
to read the EC block, even though 3 of the 5 DNs are up.

This appears to happen because the chunk length reported in the block's chunkInfo is zero bytes, which the read path rejects.
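A minimal sketch of the suspected failure mode (this is an illustration, not the actual Ozone source; the method name {{validateChunk}} is hypothetical): if the per-chunk validation in the GetBlock path rejects any chunk whose reported length is 0, then a legitimately empty chunk makes the location look unreadable, and once too few locations remain the reconstruction aborts with InsufficientLocationsException.

{noformat}
import java.io.IOException;

public class ChunkLenCheck {

    // Hypothetical stand-in for the per-chunk validation seen in the
    // stack trace (BlockInputStream.validate). A chunk whose reported
    // length is 0 is rejected outright, mirroring the observed message
    // "Failed to get chunkInfo[...]: len == 0".
    static void validateChunk(String chunkName, long len) throws IOException {
        if (len == 0) {
            throw new IOException("Failed to get " + chunkName + ": len == 0");
        }
    }

    public static void main(String[] args) {
        try {
            // A zero-length chunk trips the check and the location is
            // then marked as failed, even though the DN itself is up.
            validateChunk("chunkInfo[123]", 0);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
{noformat}

If this is the cause, a fix would likely need the validation (or the reconstruction reader) to treat a zero-length chunk as valid empty data rather than as a failed location.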

*EC Policy:* rs-3-2-1024k

*DN.log:* 
{noformat}
2023-09-15 13:47:57,844 ERROR 
[ec-reconstruct-reader-TID-0]-org.apache.hadoop.hdds.scm.XceiverClientGrpc: 
Failed to execute command GetBlock on the pipeline Pipeline[ Id: 
b3a4dcf4-916d-4a97-adde-daaf9785b237, Nodes: 
b3a4dcf4-916d-4a97-adde-daaf9785b237, ReplicationConfig: STANDALONE/ONE, 
State:CLOSED, leaderId:, CreationTimestamp2023-09-15T13:47:57.796097Z[UTC]].
2023-09-15 13:47:57,845 INFO 
[ContainerReplicationThread-0]-org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream:
 ECBlockReconstructedStripeInputStream{conID: 1012 locID: 
111677748019201071}@740ac7aa: error reading [2], marked as failed
org.apache.hadoop.ozone.client.io.BadDataLocationException: 
java.io.IOException: Failed to get chunkInfo[123]: len == 0
        at 
org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.readIntoBuffer(ECBlockReconstructedStripeInputStream.java:633)
        at 
org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.lambda$loadDataBuffersFromStream$2(ECBlockReconstructedStripeInputStream.java:566)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.io.IOException: Failed to get chunkInfo[123]: len == 0
        at 
org.apache.hadoop.hdds.scm.storage.BlockInputStream.validate(BlockInputStream.java:278)
        at 
org.apache.hadoop.hdds.scm.storage.BlockInputStream.lambda$static$0(BlockInputStream.java:265)
        at 
org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:407)
        at 
org.apache.hadoop.hdds.scm.XceiverClientGrpc.lambda$sendCommandWithTraceIDAndRetry$0(XceiverClientGrpc.java:347)
        at 
org.apache.hadoop.hdds.tracing.TracingUtil.executeInSpan(TracingUtil.java:169)
        at 
org.apache.hadoop.hdds.tracing.TracingUtil.executeInNewSpan(TracingUtil.java:149)
        at 
org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithTraceIDAndRetry(XceiverClientGrpc.java:342)
        at 
org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:323)
        at 
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.getBlock(ContainerProtocolCalls.java:208)
        at 
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.lambda$getBlock$0(ContainerProtocolCalls.java:186)
        at 
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.tryEachDatanode(ContainerProtocolCalls.java:146)
        at 
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.getBlock(ContainerProtocolCalls.java:185)
        at 
org.apache.hadoop.hdds.scm.storage.BlockInputStream.getChunkInfos(BlockInputStream.java:255)
        at 
org.apache.hadoop.hdds.scm.storage.BlockInputStream.initialize(BlockInputStream.java:146)
        at 
org.apache.hadoop.hdds.scm.storage.BlockInputStream.readWithStrategy(BlockInputStream.java:308)
        at 
org.apache.hadoop.hdds.scm.storage.ExtendedInputStream.read(ExtendedInputStream.java:66)
        at 
org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.readFromCurrentLocation(ECBlockReconstructedStripeInputStream.java:644)
        at 
org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.readIntoBuffer(ECBlockReconstructedStripeInputStream.java:620)
        ... 5 more
2023-09-15 13:47:57,864 WARN 
[ContainerReplicationThread-0]-org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator:
 Exception while reconstructing the container 1012. Cleaning up all the 
recovering containers in the reconstruction process.
org.apache.hadoop.ozone.client.io.InsufficientLocationsException: There are 
insufficient datanodes to read the EC block
        at 
org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.init(ECBlockReconstructedStripeInputStream.java:224)
        at 
org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.read(ECBlockReconstructedStripeInputStream.java:382)
        at 
org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.recoverChunks(ECBlockReconstructedStripeInputStream.java:331)
        at 
org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator.reconstructECBlockGroup(ECReconstructionCoordinator.java:288)
        at 
org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator.reconstructECContainerGroup(ECReconstructionCoordinator.java:170)
        at 
org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinatorTask.runTask(ECReconstructionCoordinatorTask.java:68)
        at 
org.apache.hadoop.ozone.container.replication.ReplicationSupervisor$TaskRunner.run(ReplicationSupervisor.java:359)
        at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)
2023-09-15 13:47:57,916 INFO 
[ChunkReader-5]-org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer: 
Moving container 
/hadoop-ozone/datanode/data/hdds/CID-2e752843-8947-454d-992f-013b3824468b/current/containerDir1/1012
 to state DELETED from state:RECOVERING
2023-09-15 13:47:57,920 WARN 
[ContainerReplicationThread-0]-org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinatorTask:
 FAILED reconstructECContainersCommand: containerID=1012, 
replication=rs-3-2-1024k, missingIndexes=[1, 2], 
sources={3=b3a4dcf4-916d-4a97-adde-daaf9785b237, 
4=4641d611-0a8b-4d23-b47a-d25d71403aaf, 
5=eb073807-4edc-4753-a64a-a323a317ea2f}, 
targets={1=bc497718-d822-4592-b406-ef32d3173cd9, 
2=2f307ebf-3bf3-412c-aaac-675718da3beb} after 372 ms
org.apache.hadoop.ozone.client.io.InsufficientLocationsException: There are 
insufficient datanodes to read the EC block
        at 
org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.init(ECBlockReconstructedStripeInputStream.java:224)
        at 
org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.read(ECBlockReconstructedStripeInputStream.java:382)
        at 
org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.recoverChunks(ECBlockReconstructedStripeInputStream.java:331)
        at 
org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator.reconstructECBlockGroup(ECReconstructionCoordinator.java:288)
        at 
org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator.reconstructECContainerGroup(ECReconstructionCoordinator.java:170)
        at 
org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinatorTask.runTask(ECReconstructionCoordinatorTask.java:68)
        at 
org.apache.hadoop.ozone.container.replication.ReplicationSupervisor$TaskRunner.run(ReplicationSupervisor.java:359)
        at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834){noformat}
 


