[ 
https://issues.apache.org/jira/browse/HDDS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDDS-10682:
-------------------------------------
    Fix Version/s: 1.4.1

> EC Reconstruction creates empty chunks at the end of blocks with partial 
> stripes
> --------------------------------------------------------------------------------
>
>                 Key: HDDS-10682
>                 URL: https://issues.apache.org/jira/browse/HDDS-10682
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.5.0, 1.4.1
>
>
> Given an EC block that is larger than one full stripe, but whose last 
> stripe is partial, so that it does not use all of the data indexes.
> If one of the replicas that is reconstructed has no data in that final 
> position, an empty chunk is written to the end of the block's chunk 
> list.
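> To illustrate, here is a minimal, self-contained sketch (assumed layout and 
> names, not Ozone's actual reconstruction code) of how the data indexes used 
> by a partial final stripe can be derived; any index beyond that point should 
> end up with no trailing chunk at all:
> {code}
> // Hypothetical example: EC rs-3-2 with 1 MiB chunks. Work out which data
> // indexes hold bytes in the final (partial) stripe of a block.
> public final class PartialStripeExample {
> 
>   public static void main(String[] args) {
>     final int dataUnits = 3;               // rs-3-2: 3 data units, 2 parity
>     final long chunkSize = 1024L * 1024L;  // 1 MiB chunks
>     final long stripeSize = dataUnits * chunkSize;
> 
>     // One full stripe plus half a chunk: the last stripe only touches
>     // data index 1.
>     final long blockLength = stripeSize + chunkSize / 2;
> 
>     long lastStripeLen = blockLength % stripeSize;
>     if (lastStripeLen == 0) {
>       lastStripeLen = stripeSize;          // block ends on a stripe boundary
>     }
>     // Number of data indexes that actually hold bytes in the last stripe.
>     int usedDataIndexes = (int) ((lastStripeLen + chunkSize - 1) / chunkSize);
> 
>     for (int index = 1; index <= dataUnits; index++) {
>       System.out.println("data index " + index
>           + (index <= usedDataIndexes
>               ? ": has data in the last stripe"
>               : ": should have NO chunk for the last stripe"));
>     }
>   }
> }
> {code}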
> While this does not cause any immediate problem, it can prevent further 
> reconstructions that attempt to use this block; they will fail with an 
> error like:
> {code}
> 2024-04-09 01:06:21,855 ERROR 
> [ec-reconstruct-reader-TID-4]-org.apache.hadoop.hdds.scm.XceiverClientGrpc: 
> Failed to execute command GetBlock on the pipeline Pipeline[ Id: 
> 7f6f1fc9-ed26-4e19-86b6-47435b027f6a, Nodes: 
> 7f6f1fc9-ed26-4e19-86b6-47435b027f6a(ccycloud-4.quasar-jyswng.root.comops.site/10.140.150.0),
>  ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, 
> CreationTimestamp2024-04-09T01:06:21.724509Z[UTC]].
> 2024-04-09 01:06:21,859 INFO 
> [ContainerReplicationThread-1]-org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream:
>  ECBlockReconstructedStripeInputStream{conID: 10007 locID: 
> 113750153625610009}@756a3998: error reading [1], marked as failed
> org.apache.hadoop.ozone.client.io.BadDataLocationException: 
> java.io.IOException: Failed to get chunkInfo[77]: len == 0
>         at 
> org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.readIntoBuffer(ECBlockReconstructedStripeInputStream.java:644)
>         at 
> org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.lambda$loadDataBuffersFromStream$2(ECBlockReconstructedStripeInputStream.java:577)
>         at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>         at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>         at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>         at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.io.IOException: Failed to get chunkInfo[77]: len == 0
>         at 
> org.apache.hadoop.hdds.scm.storage.BlockInputStream.validate(BlockInputStream.java:278)
>         at 
> org.apache.hadoop.hdds.scm.storage.BlockInputStream.lambda$static$0(BlockInputStream.java:265)
>         at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:407)
>         at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.lambda$sendCommandWithTraceIDAndRetry$0(XceiverClientGrpc.java:347)
>         at 
> org.apache.hadoop.hdds.tracing.TracingUtil.executeInSpan(TracingUtil.java:169)
>         at 
> org.apache.hadoop.hdds.tracing.TracingUtil.executeInNewSpan(TracingUtil.java:149)
>         at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithTraceIDAndRetry(XceiverClientGrpc.java:342)
>         at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:323)
>         at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.getBlock(ContainerProtocolCalls.java:208)
>         at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.lambda$getBlock$0(ContainerProtocolCalls.java:186)
>         at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.tryEachDatanode(ContainerProtocolCalls.java:146)
>         at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.getBlock(ContainerProtocolCalls.java:185)
>         at 
> org.apache.hadoop.hdds.scm.storage.BlockInputStream.getChunkInfos(BlockInputStream.java:255)
>         at 
> org.apache.hadoop.hdds.scm.storage.BlockInputStream.initialize(BlockInputStream.java:146)
>         at 
> org.apache.hadoop.hdds.scm.storage.BlockInputStream.readWithStrategy(BlockInputStream.java:308)
>         at 
> org.apache.hadoop.hdds.scm.storage.ExtendedInputStream.read(ExtendedInputStream.java:66)
>         at 
> org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.readFromCurrentLocation(ECBlockReconstructedStripeInputStream.java:655)
>         at 
> org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.readIntoBuffer(ECBlockReconstructedStripeInputStream.java:631)
>         ... 5 more
> {code}
> If there are other spare replicas that can be used, reconstruction will 
> continue; otherwise it will not be able to complete.
> At this stage, I am not sure if this can affect reading a block via the 
> normal read path.
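> As a purely illustrative sketch of the kind of guard that would avoid the 
> symptom (hypothetical names, not the actual Ozone fix), whatever assembles 
> a reconstructed replica's chunk list could drop zero-length entries before 
> they are persisted, so later readers never see a chunkInfo with len == 0:
> {code}
> import java.util.ArrayList;
> import java.util.List;
> 
> // Hypothetical stand-ins, not Ozone's real reconstruction classes.
> final class ChunkListAssembler {
> 
>   /** Minimal stand-in for the chunk metadata that gets persisted. */
>   record Chunk(String name, long offset, long length) { }
> 
>   /**
>    * Drops zero-length entries so a replica whose index has no data in the
>    * partial last stripe does not end up with an empty trailing chunk that
>    * later fails the "Failed to get chunkInfo[N]: len == 0" validation.
>    */
>   static List<Chunk> withoutEmptyChunks(List<Chunk> chunks) {
>     List<Chunk> filtered = new ArrayList<>();
>     for (Chunk c : chunks) {
>       if (c.length() > 0) {
>         filtered.add(c);
>       }
>     }
>     return filtered;
>   }
> }
> {code}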



