[ 
https://issues.apache.org/jira/browse/HDDS-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-6342:
---------------------------------
    Labels: pull-request-available  (was: )

> EC: Fix large write with multiple stripes upon stripe failure.
> --------------------------------------------------------------
>
>                 Key: HDDS-6342
>                 URL: https://issues.apache.org/jira/browse/HDDS-6342
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Mark Gui
>            Assignee: Mark Gui
>            Priority: Major
>              Labels: pull-request-available
>
> Reproduced with the freon ockg test (OzoneClientKeyGenerator):
> ./bin/ozone freon ockg -p test -n 50 -t 8 -s $((500*1024*1024)) --type=EC --replication=rs-10-4-1024k
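
For context, the stripe arithmetic implied by this command can be sketched as below. This is only an illustration: `StripeMath` and its methods are hypothetical names, not Ozone code. It assumes that `rs-10-4-1024k` means 10 data blocks, 4 parity blocks, and a 1024 KiB chunk size, so one full stripe carries 10 MiB of user data.

```java
/** Hypothetical helper (not part of Ozone): stripe arithmetic for an EC key. */
final class StripeMath {
  private StripeMath() { }

  /** Bytes of user data carried by one full stripe. */
  static long stripeDataSize(int numDataBlks, long ecChunkSize) {
    return numDataBlks * ecChunkSize;
  }

  /** Number of full stripes needed to hold keySize bytes. */
  static long fullStripes(long keySize, int numDataBlks, long ecChunkSize) {
    return keySize / stripeDataSize(numDataBlks, ecChunkSize);
  }

  /** Bytes left over for a final, partial stripe. */
  static long partialStripeBytes(long keySize, int numDataBlks,
      long ecChunkSize) {
    return keySize % stripeDataSize(numDataBlks, ecChunkSize);
  }

  public static void main(String[] args) {
    long keySize = 500L * 1024 * 1024;   // -s $((500*1024*1024))
    int numDataBlks = 10;                // assumed from rs-10-4-1024k
    long ecChunkSize = 1024L * 1024;     // assumed from rs-10-4-1024k

    System.out.println("fullStripes="
        + fullStripes(keySize, numDataBlks, ecChunkSize)
        + " partialBytes="
        + partialStripeBytes(keySize, numDataBlks, ecChunkSize));
    // prints "fullStripes=50 partialBytes=0"
  }
}
```

Each key in this test therefore spans many stripes. How many of them land in a single block group depends on the configured block size, which the report does not state.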
> {code:java}
> 2022-02-15 12:43:11,295 [pool-2-thread-7] ERROR freon.BaseFreonGenerator: Error on executing task 46
> java.lang.IllegalArgumentException
>         at com.google.common.base.Preconditions.checkArgument(Preconditions.java:130)
>         at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntryPool.commitKey(BlockOutputStreamEntryPool.java:327)
>         at org.apache.hadoop.ozone.client.io.ECKeyOutputStream.close(ECKeyOutputStream.java:536)
>         at org.apache.hadoop.ozone.client.io.OzoneOutputStream.close(OzoneOutputStream.java:61)
>         at org.apache.hadoop.ozone.freon.OzoneClientKeyGenerator.lambda$createKey$36(OzoneClientKeyGenerator.java:150)
>         at com.codahale.metrics.Timer.time(Timer.java:101)
>         at org.apache.hadoop.ozone.freon.OzoneClientKeyGenerator.createKey(OzoneClientKeyGenerator.java:142)
>         at org.apache.hadoop.ozone.freon.BaseFreonGenerator.tryNextTask(BaseFreonGenerator.java:183)
>         at org.apache.hadoop.ozone.freon.BaseFreonGenerator.taskLoop(BaseFreonGenerator.java:163)
>         at org.apache.hadoop.ozone.freon.BaseFreonGenerator.lambda$startTaskRunners$1(BaseFreonGenerator.java:146)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748) {code}
> This happens only when a failure occurs during the parity write and more
> than one stripe has already been written to the current block group.
> In that case a new block group is picked to retry the current stripe write,
> and the current block group should roll back its current position. The bug
> lies in the calculation of the acked length of the block group.
> Code references:
> {code:java}
> if (handleParityWrites(ecChunkSize, allocateBlockIfFull,
>     shouldClose) == StripeWriteStatus.FAILED) {
>   handleStripeFailure(numDataBlks * ecChunkSize, allocateBlockIfFull,
>       shouldClose);
> } else {
>   // At this stage stripe write is successful.
>   currentStreamEntry.updateBlockGroupToAckedPosition(
>       currentStreamEntry.getCurrentPosition());
> } {code}
> {code:java}
> private StripeWriteStatus rewriteStripeToNewBlockGroup(
>     int failedStripeDataSize, boolean allocateBlockIfFull, boolean close)
>     throws IOException {
>   long[] failedDataStripeChunkLens = new long[numDataBlks];
>   long[] failedParityStripeChunkLens = new long[numParityBlks];
>   final ByteBuffer[] dataBuffers = ecChunkBufferCache.getDataBuffers();
>   for (int i = 0; i < numDataBlks; i++) {
>     failedDataStripeChunkLens[i] = dataBuffers[i].limit();
>   }
>   final ByteBuffer[] parityBuffers = ecChunkBufferCache.getParityBuffers();
>   for (int i = 0; i < numParityBlks; i++) {
>     failedParityStripeChunkLens[i] = parityBuffers[i].limit();
>   }
>   blockOutputStreamEntryPool.getCurrentStreamEntry().resetToFirstEntry();
>   // Rollback the length/offset updated as part of this failed stripe write.
>   offset -= failedStripeDataSize;
>   blockOutputStreamEntryPool.getCurrentStreamEntry()
>       .resetToAckedPosition(); // <-- wrong position detected
> ...
> } {code}
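
The rollback invariant described above can be illustrated with a small standalone model. This is a sketch under stated assumptions, not the Ozone implementation or the actual fix: `BlockGroupPositionModel` is a hypothetical class, and it assumes the acked position of a block group is advanced only after a stripe (data and parity) completes, so that a failed stripe rolls the position back to the end of the last successful stripe.

```java
/** Hypothetical model (not Ozone code): position bookkeeping for one block group. */
final class BlockGroupPositionModel {
  private final long stripeDataSize;
  private long currentPosition; // data bytes written, including unacked ones
  private long ackedPosition;   // data bytes covered by fully written stripes

  BlockGroupPositionModel(long stripeDataSize) {
    this.stripeDataSize = stripeDataSize;
  }

  /** Data chunks of one stripe have been written to this block group. */
  void writeStripeData() {
    currentPosition += stripeDataSize;
  }

  /** The whole stripe, parity included, succeeded. */
  void ackStripe() {
    ackedPosition = currentPosition;
  }

  /** The stripe failed: discard everything after the last acked stripe. */
  void resetToAckedPosition() {
    currentPosition = ackedPosition;
  }

  long getCurrentPosition() {
    return currentPosition;
  }

  long getAckedPosition() {
    return ackedPosition;
  }

  public static void main(String[] args) {
    long stripe = 10L * 1024 * 1024; // assumed: 10 data blocks x 1 MiB chunks
    BlockGroupPositionModel group = new BlockGroupPositionModel(stripe);

    // Three stripes complete successfully in this block group.
    for (int i = 0; i < 3; i++) {
      group.writeStripeData();
      group.ackStripe();
    }

    // Fourth stripe: data is written, then the parity write fails.
    group.writeStripeData();
    group.resetToAckedPosition();

    // After rollback the group must account for exactly three stripes.
    if (group.getCurrentPosition() != 3 * stripe) {
      throw new IllegalStateException("rollback left a wrong position");
    }
    System.out.println("position after rollback = "
        + group.getCurrentPosition());
    // prints "position after rollback = 31457280"
  }
}
```

In this model the failed stripe's bytes are accounted for only by the new block group that receives the retry. If the acked position were computed per stream instead of per block group, the rollback could land on a wrong offset once several stripes had been written, which matches the symptom reported here.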
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
