[ https://issues.apache.org/jira/browse/HDDS-9319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephen O'Donnell resolved HDDS-9319.
-------------------------------------
    Resolution: Duplicate

> EC Reconstruction fails when chunk length is 0 bytes
> ----------------------------------------------------
>
>                 Key: HDDS-9319
>                 URL: https://issues.apache.org/jira/browse/HDDS-9319
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: EC
>            Reporter: Varsha Ravi
>            Assignee: Stephen O'Donnell
>            Priority: Major
>
> EC offline reconstruction fails with a *java.io.IOException: Failed to get 
> chunkInfo[123]* exception. The DN log reports that there are insufficient 
> datanodes to read the EC blocks, even though 3 of the 5 DNs are up. This is 
> likely because the chunk length is zero bytes: the reader rejects the 
> GetBlock response, marks that location as failed, and the stripe is then 
> left with too few readable locations.
> *EC Policy:* rs-3-2-1024k
> *DN.log:* 
> {noformat}
> 2023-09-15 13:47:57,844 ERROR 
> [ec-reconstruct-reader-TID-0]-org.apache.hadoop.hdds.scm.XceiverClientGrpc: 
> Failed to execute command GetBlock on the pipeline Pipeline[ Id: 
> b3a4dcf4-916d-4a97-adde-daaf9785b237, Nodes: 
> b3a4dcf4-916d-4a97-adde-daaf9785b237, ReplicationConfig: STANDALONE/ONE, 
> State:CLOSED, leaderId:, CreationTimestamp2023-09-15T13:47:57.796097Z[UTC]].
> 2023-09-15 13:47:57,845 INFO 
> [ContainerReplicationThread-0]-org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream:
>  ECBlockReconstructedStripeInputStream{conID: 1012 locID: 
> 111677748019201071}@740ac7aa: error reading [2], marked as failed
> org.apache.hadoop.ozone.client.io.BadDataLocationException: 
> java.io.IOException: Failed to get chunkInfo[123]: len == 0
>       at 
> org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.readIntoBuffer(ECBlockReconstructedStripeInputStream.java:633)
>       at 
> org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.lambda$loadDataBuffersFromStream$2(ECBlockReconstructedStripeInputStream.java:566)
>       at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>       at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>       at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>       at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.io.IOException: Failed to get chunkInfo[123]: len == 0
>       at 
> org.apache.hadoop.hdds.scm.storage.BlockInputStream.validate(BlockInputStream.java:278)
>       at 
> org.apache.hadoop.hdds.scm.storage.BlockInputStream.lambda$static$0(BlockInputStream.java:265)
>       at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:407)
>       at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.lambda$sendCommandWithTraceIDAndRetry$0(XceiverClientGrpc.java:347)
>       at 
> org.apache.hadoop.hdds.tracing.TracingUtil.executeInSpan(TracingUtil.java:169)
>       at 
> org.apache.hadoop.hdds.tracing.TracingUtil.executeInNewSpan(TracingUtil.java:149)
>       at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithTraceIDAndRetry(XceiverClientGrpc.java:342)
>       at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:323)
>       at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.getBlock(ContainerProtocolCalls.java:208)
>       at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.lambda$getBlock$0(ContainerProtocolCalls.java:186)
>       at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.tryEachDatanode(ContainerProtocolCalls.java:146)
>       at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.getBlock(ContainerProtocolCalls.java:185)
>       at 
> org.apache.hadoop.hdds.scm.storage.BlockInputStream.getChunkInfos(BlockInputStream.java:255)
>       at 
> org.apache.hadoop.hdds.scm.storage.BlockInputStream.initialize(BlockInputStream.java:146)
>       at 
> org.apache.hadoop.hdds.scm.storage.BlockInputStream.readWithStrategy(BlockInputStream.java:308)
>       at 
> org.apache.hadoop.hdds.scm.storage.ExtendedInputStream.read(ExtendedInputStream.java:66)
>       at 
> org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.readFromCurrentLocation(ECBlockReconstructedStripeInputStream.java:644)
>       at 
> org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.readIntoBuffer(ECBlockReconstructedStripeInputStream.java:620)
>       ... 5 more
> 2023-09-15 13:47:57,864 WARN 
> [ContainerReplicationThread-0]-org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator:
>  Exception while reconstructing the container 1012. Cleaning up all the 
> recovering containers in the reconstruction process.
> org.apache.hadoop.ozone.client.io.InsufficientLocationsException: There are 
> insufficient datanodes to read the EC block
>       at 
> org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.init(ECBlockReconstructedStripeInputStream.java:224)
>       at 
> org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.read(ECBlockReconstructedStripeInputStream.java:382)
>       at 
> org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.recoverChunks(ECBlockReconstructedStripeInputStream.java:331)
>       at 
> org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator.reconstructECBlockGroup(ECReconstructionCoordinator.java:288)
>       at 
> org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator.reconstructECContainerGroup(ECReconstructionCoordinator.java:170)
>       at 
> org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinatorTask.runTask(ECReconstructionCoordinatorTask.java:68)
>       at 
> org.apache.hadoop.ozone.container.replication.ReplicationSupervisor$TaskRunner.run(ReplicationSupervisor.java:359)
>       at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>       at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>       at java.base/java.lang.Thread.run(Thread.java:834)
> 2023-09-15 13:47:57,916 INFO 
> [ChunkReader-5]-org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer: 
> Moving container 
> /hadoop-ozone/datanode/data/hdds/CID-2e752843-8947-454d-992f-013b3824468b/current/containerDir1/1012
>  to state DELETED from state:RECOVERING
> 2023-09-15 13:47:57,920 WARN 
> [ContainerReplicationThread-0]-org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinatorTask:
>  FAILED reconstructECContainersCommand: containerID=1012, 
> replication=rs-3-2-1024k, missingIndexes=[1, 2], 
> sources={3=b3a4dcf4-916d-4a97-adde-daaf9785b237, 
> 4=4641d611-0a8b-4d23-b47a-d25d71403aaf, 
> 5=eb073807-4edc-4753-a64a-a323a317ea2f}, 
> targets={1=bc497718-d822-4592-b406-ef32d3173cd9, 
> 2=2f307ebf-3bf3-412c-aaac-675718da3beb} after 372 ms
> org.apache.hadoop.ozone.client.io.InsufficientLocationsException: There are 
> insufficient datanodes to read the EC block
>       at 
> org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.init(ECBlockReconstructedStripeInputStream.java:224)
>       at 
> org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.read(ECBlockReconstructedStripeInputStream.java:382)
>       at 
> org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.recoverChunks(ECBlockReconstructedStripeInputStream.java:331)
>       at 
> org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator.reconstructECBlockGroup(ECReconstructionCoordinator.java:288)
>       at 
> org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator.reconstructECContainerGroup(ECReconstructionCoordinator.java:170)
>       at 
> org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinatorTask.runTask(ECReconstructionCoordinatorTask.java:68)
>       at 
> org.apache.hadoop.ozone.container.replication.ReplicationSupervisor$TaskRunner.run(ReplicationSupervisor.java:359)
>       at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>       at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>       at java.base/java.lang.Thread.run(Thread.java:834){noformat}
>  
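
For illustration, the failure mode in the trace above (BlockInputStream.validate throwing on a zero-length chunk) can be sketched as a response validator that rejects any chunk reporting len == 0. The class and method names below are simplified stand-ins, not the actual Ozone API:

```java
import java.io.IOException;
import java.util.List;

// Minimal sketch of the validation seen in the stack trace: a GetBlock
// response is rejected when any returned chunk reports a length of zero,
// which makes the whole location look unreadable to the reconstruction
// reader. ChunkInfo and validateChunks are hypothetical stand-ins, not
// the real Ozone classes.
public class ZeroLengthChunkSketch {

  record ChunkInfo(String name, long len) {}

  // Throw if any chunk has len == 0, mirroring the "len == 0" check
  // reported in BlockInputStream.validate.
  static void validateChunks(List<ChunkInfo> chunks) throws IOException {
    for (ChunkInfo c : chunks) {
      if (c.len() == 0) {
        throw new IOException(
            "Failed to get chunkInfo[" + c.name() + "]: len == 0");
      }
    }
  }

  public static void main(String[] args) {
    try {
      // A single zero-length chunk is enough to fail the whole response.
      validateChunks(List.of(new ChunkInfo("123", 0)));
      System.out.println("accepted");
    } catch (IOException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

With rs-3-2 EC, three readable data/parity locations are the minimum; once one of the three surviving DNs is marked failed by a check like this, the reader cannot assemble a stripe and raises InsufficientLocationsException.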



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]