[ 
https://issues.apache.org/jira/browse/HDDS-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G resolved HDDS-6373.
---------------------------------------
    Fix Version/s: EC-Branch
       Resolution: Fixed

> EC: Exclude pipeline upon container close instead of exclude DNs.
> -----------------------------------------------------------------
>
>                 Key: HDDS-6373
>                 URL: https://issues.apache.org/jira/browse/HDDS-6373
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Mark Gui
>            Assignee: Mark Gui
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: EC-Branch
>
>
> When a container is closed because it is full, the DN replies to the client 
> with a ContainerNotOpenException. This does not mean that the DN has failed, 
> so the DN should not be excluded from new block group allocation. Otherwise 
> many HEALTHY DNs may end up excluded, and new block group allocation may 
> fail in a small cluster.
> For example, with 45 DNs (Docker simulated) and the following settings in 
> ozone-site.xml:
>   <property>
>     <name>ozone.scm.container.size</name>
>     <value>256MB</value>
>   </property>
>   <property>
>     <name>ozone.scm.block.size</name>
>     <value>16MB</value>
>   </property>
> a test with Freon ockg:
> ./bin/ozone freon ockg --type=EC --replication=rs-10-4-1024k -p test -n 10 -t 10 -s $((4 * 1024 * 1024 * 1024))
> results in 5-8 failures with HDDS-6364 patched.
> {code:java}
> INTERNAL_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: Allocated 0 blocks. Requested 1 blocks
>         at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:660)
>         at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.allocateBlock(OzoneManagerProtocolClientSideTranslatorPB.java:695)
>         at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntryPool.allocateNewBlock(BlockOutputStreamEntryPool.java:309)
>         at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntryPool.allocateBlockIfNeeded(BlockOutputStreamEntryPool.java:371)
>         at org.apache.hadoop.ozone.client.io.ECKeyOutputStream.rewriteStripeToNewBlockGroup(ECKeyOutputStream.java:244)
>         at org.apache.hadoop.ozone.client.io.ECKeyOutputStream.handleStripeFailure(ECKeyOutputStream.java:586)
>         at org.apache.hadoop.ozone.client.io.ECKeyOutputStream.checkAndWriteParityCells(ECKeyOutputStream.java:306)
>         at org.apache.hadoop.ozone.client.io.ECKeyOutputStream.write(ECKeyOutputStream.java:192)
>         at org.apache.hadoop.ozone.client.io.OzoneOutputStream.write(OzoneOutputStream.java:50)
>         at org.apache.hadoop.ozone.freon.ContentGenerator.write(ContentGenerator.java:76)
>         at org.apache.hadoop.ozone.freon.OzoneClientKeyGenerator.lambda$createKey$36(OzoneClientKeyGenerator.java:146)
>         at com.codahale.metrics.Timer.time(Timer.java:101)
>         at org.apache.hadoop.ozone.freon.OzoneClientKeyGenerator.createKey(OzoneClientKeyGenerator.java:143)
>         at org.apache.hadoop.ozone.freon.BaseFreonGenerator.tryNextTask(BaseFreonGenerator.java:183)
>         at org.apache.hadoop.ozone.freon.BaseFreonGenerator.taskLoop(BaseFreonGenerator.java:163)
>         at org.apache.hadoop.ozone.freon.BaseFreonGenerator.lambda$startTaskRunners$1(BaseFreonGenerator.java:146)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
>         Suppressed: java.lang.IllegalArgumentException: Expected writeOffset= 1069543424 Expected offset=1059061760
>                 at com.google.common.base.Preconditions.checkArgument(Preconditions.java:144)
>                 at org.apache.hadoop.ozone.client.io.ECKeyOutputStream.close(ECKeyOutputStream.java:564)
>                 at org.apache.hadoop.ozone.client.io.OzoneOutputStream.close(OzoneOutputStream.java:61)
>                 at org.apache.hadoop.ozone.freon.OzoneClientKeyGenerator.lambda$createKey$36(OzoneClientKeyGenerator.java:151)
>                 ... 8 more
> One ore more freon test is failed.
> 2022-02-24 08:41:44,272 [shutdown-hook-0] INFO metrics: type=TIMER, name=key-create, count=10, min=313491.661668, max=577254.304029, mean=563762.9508485134, stddev=44787.24799551536, median=575542.093982, p75=577254.304029, p95=577254.304029, p98=577254.304029, p99=577254.304029, p999=577254.304029, mean_rate=0.017322637056902915, m1=0.029562618662863496, m5=0.014855802773079099, m15=0.007191674083204336, rate_unit=events/second, duration_unit=milliseconds
> 2022-02-24 08:41:44,273 [shutdown-hook-0] INFO freon.BaseFreonGenerator: Total execution time (sec): 578
> 2022-02-24 08:41:44,273 [shutdown-hook-0] INFO freon.BaseFreonGenerator: Failures: 6
> 2022-02-24 08:41:44,273 [shutdown-hook-0] INFO freon.BaseFreonGenerator: Successful executions: 4
> {code}
> But with this fix and HDDS-6364 applied together, all 10 key writes succeed, 
> and this holds over many rounds.
> {code:java}
> 2022-02-24 10:56:45,013 [Thread-4] INFO freon.ProgressBar: Progress: 90.00 % (9 out of 10)
> 2022-02-24 10:56:46,013 [Thread-4] INFO freon.ProgressBar: Progress: 100.00 % (10 out of 10)
> 2022-02-24 10:56:46,257 [shutdown-hook-0] INFO metrics: type=TIMER, name=key-create, count=10, min=958022.893372, max=1038271.448129, mean=1018238.201558835, stddev=22083.604143242464, median=1029968.020144, p75=1034239.403617, p95=1038271.448129, p98=1038271.448129, p99=1038271.448129, p999=1038271.448129, mean_rate=0.009623163938983789, m1=0.09995782091693355, m5=0.02731461121892791, m15=0.009684867189776935, rate_unit=events/second, duration_unit=milliseconds
> 2022-02-24 10:56:46,258 [shutdown-hook-0] INFO freon.BaseFreonGenerator: Total execution time (sec): 1040
> 2022-02-24 10:56:46,258 [shutdown-hook-0] INFO freon.BaseFreonGenerator: Failures: 0
> 2022-02-24 10:56:46,258 [shutdown-hook-0] INFO freon.BaseFreonGenerator: Successful executions: 10
> {code}
>  
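
The sizes used in the reproduction can be worked through with a little arithmetic, which helps explain why containers fill up and close so often during the test. The sketch below is only an illustration. It assumes that "MB" means 1024 * 1024 bytes, that an rs-10-4 block group stores key data in 10 blocks of ozone.scm.block.size each, and that a container is considered full once it holds ozone.scm.container.size worth of blocks. The class and method names are hypothetical and are not part of Ozone.

```java
public class EcSizingSketch {
    // Assumption: "MB" in the configuration is a binary megabyte.
    static final long MB = 1024L * 1024L;

    /** Key data one block group can hold: one full block per data node. */
    static long blockGroupDataBytes(int dataBlocks, long blockSize) {
        return dataBlocks * blockSize;
    }

    /** Block groups needed for one key, rounded up. */
    static long blockGroupsPerKey(long keySize, long groupDataBytes) {
        return (keySize + groupDataBytes - 1) / groupDataBytes;
    }

    /** Full blocks that fit in one container (simplified: ignores metadata). */
    static long blocksPerContainer(long containerSize, long blockSize) {
        return containerSize / blockSize;
    }

    public static void main(String[] args) {
        long blockSize = 16 * MB;                 // ozone.scm.block.size
        long containerSize = 256 * MB;            // ozone.scm.container.size
        long keySize = 4L * 1024 * 1024 * 1024;   // freon -s argument

        long groupBytes = blockGroupDataBytes(10, blockSize);
        System.out.println("data per block group (MB): " + groupBytes / MB);
        System.out.println("block groups per key: "
            + blockGroupsPerKey(keySize, groupBytes));
        System.out.println("blocks per container: "
            + blocksPerContainer(containerSize, blockSize));
    }
}
```

Under these assumptions each 4 GiB key needs 26 block groups and a container fills after 16 blocks, so with 10 writer threads a "container full" close is a routine event rather than a sign of a failing DN.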

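The idea in the description, excluding the pipeline rather than its DNs when a container is closed, can be sketched roughly as below. This is not the actual patch. `Excludes`, `FailureCause`, `recordFailure` and `canAllocate` are hypothetical stand-ins for the client-side exclude handling, and the sketch assumes, as a simplification, that the earlier behaviour excluded every DN of the pipeline that reported the closed container.

```java
import java.util.HashSet;
import java.util.Set;

public class ExcludeSketch {
    /** Simplified, hypothetical classification of a write failure. */
    enum FailureCause { CONTAINER_NOT_OPEN, NODE_UNREACHABLE }

    /** Hypothetical stand-in for a client-side exclude list. */
    static final class Excludes {
        final Set<String> datanodes = new HashSet<>();
        final Set<String> pipelines = new HashSet<>();
    }

    /**
     * Records a failure. With pipelineOnClose set, a closed container only
     * excludes the pipeline; the DNs stay eligible for new block groups.
     */
    static void recordFailure(Excludes ex, FailureCause cause, String pipelineId,
                              Set<String> pipelineDns, boolean pipelineOnClose) {
        if (cause == FailureCause.CONTAINER_NOT_OPEN && pipelineOnClose) {
            ex.pipelines.add(pipelineId);
        } else {
            ex.datanodes.addAll(pipelineDns);
        }
    }

    /** True if enough non-excluded DNs remain for a new block group. */
    static boolean canAllocate(int clusterSize, int nodesNeeded, Excludes ex) {
        return clusterSize - ex.datanodes.size() >= nodesNeeded;
    }

    /** Builds DN names dn-from .. dn-(to-1) for the demo. */
    static Set<String> nodes(int from, int to) {
        Set<String> result = new HashSet<>();
        for (int i = from; i < to; i++) {
            result.add("dn-" + i);
        }
        return result;
    }

    public static void main(String[] args) {
        Excludes excludeDns = new Excludes();
        Excludes excludePipelines = new Excludes();
        // Three containers on disjoint 14-node pipelines fill up and close.
        for (int i = 0; i < 3; i++) {
            Set<String> dns = nodes(i * 14, (i + 1) * 14);
            recordFailure(excludeDns, FailureCause.CONTAINER_NOT_OPEN,
                "pipeline-" + i, dns, false);
            recordFailure(excludePipelines, FailureCause.CONTAINER_NOT_OPEN,
                "pipeline-" + i, dns, true);
        }
        // rs-10-4 needs 14 DNs per block group; the cluster has 45.
        System.out.println("exclude DNs, can allocate: "
            + canAllocate(45, 14, excludeDns));
        System.out.println("exclude pipelines, can allocate: "
            + canAllocate(45, 14, excludePipelines));
    }
}
```

In this toy model three closed containers are enough to leave only 3 of 45 DNs eligible when DNs are excluded, fewer than the 14 an rs-10-4 block group needs, whereas excluding pipelines leaves all 45 DNs available.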


