[ https://issues.apache.org/jira/browse/HDDS-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mark Gui updated HDDS-6373:
---------------------------
Description:
When a container is closed because it is full, the DN replies to the client
with a ContainerNotOpenException. This does not mean that the DN has failed,
so the DN should not be excluded from new block group allocation. Otherwise
many HEALTHY DNs may end up excluded, and allocating a new block group may
fail in a small cluster.
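
The distinction above can be sketched in a few lines. This is only an illustrative model in Python, not Ozone's client code: `ExcludeList`, `record_write_failure`, `usable_datanodes`, and both error classes are hypothetical stand-ins, and the real client and SCM carry far more state.

```python
from dataclasses import dataclass, field


class ContainerNotOpenError(Exception):
    """Stand-in for ContainerNotOpenException: container closed, DN still healthy."""


class DatanodeUnreachableError(Exception):
    """Hypothetical stand-in for a genuine datanode failure."""


@dataclass
class ExcludeList:
    """Simplified, hypothetical model of what the client asks the allocator to avoid."""
    datanodes: set = field(default_factory=set)
    pipelines: set = field(default_factory=set)


def record_write_failure(exclude, error, datanode_id, pipeline_id):
    """Update the exclude list after a failed write to one datanode."""
    if isinstance(error, ContainerNotOpenError):
        # The container is full or closed, so only its pipeline is avoided.
        exclude.pipelines.add(pipeline_id)
    else:
        # Any other error is treated here as a real datanode failure.
        exclude.datanodes.add(datanode_id)


def usable_datanodes(all_datanodes, exclude):
    """Datanodes that remain eligible for a new block group."""
    return [dn for dn in all_datanodes if dn not in exclude.datanodes]
```

With this rule, a closed container never shrinks the pool of eligible DNs, which matters when the cluster is only a few times larger than one EC block group.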
For example, with 45 DNs (Docker-simulated) and the following in ozone-site.xml:
<property>
  <name>ozone.scm.container.size</name>
  <value>256MB</value>
</property>
<property>
  <name>ozone.scm.block.size</name>
  <value>16MB</value>
</property>
a test with Freon ockg:
./bin/ozone freon ockg --type=EC --replication=rs-10-4-1024k -p test -n 10 -t 10 -s $((4 * 1024 * 1024 * 1024))
results in 5-8 failures with only HDDS-6364 applied.
With this fix and HDDS-6364 applied together, all 10 succeed over many rounds.
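
A rough back-of-the-envelope calculation suggests why this setup closes containers so often. It is a sketch under assumptions not stated in the report: that MB means MiB in these settings, that `-s` is the key size in bytes and `-n 10` means ten keys, and that one rs-10-4 block group stores up to 10 data blocks of `ozone.scm.block.size` each.

```python
import math

MIB = 1024 * 1024

container_size = 256 * MIB          # ozone.scm.container.size
block_size = 16 * MIB               # ozone.scm.block.size
data_blocks, parity_blocks = 10, 4  # rs-10-4
key_size = 4 * 1024 * 1024 * 1024   # -s $((4 * 1024 * 1024 * 1024))
keys = 10                           # -n 10

# Full blocks that fit in one container before it has to close.
blocks_per_container = container_size // block_size

# User data carried by one block group (parity blocks hold no user data).
data_per_block_group = data_blocks * block_size

# Block groups needed for one key, and for the whole run.
block_groups_per_key = math.ceil(key_size / data_per_block_group)
total_block_groups = keys * block_groups_per_key

# Distinct DNs needed to place one block group.
datanodes_per_block_group = data_blocks + parity_blocks

print(blocks_per_container, block_groups_per_key,
      total_block_groups, datanodes_per_block_group)
```

Under these assumptions each container fills after 16 blocks while the run needs 260 block groups of 14 DNs each, so in a 45-DN cluster excluding every DN that reports a closed container would exhaust the eligible nodes after only a few closures.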
was:
Container close due to container full will make DN reply a
ContainerNotOpenException to the Client, but it doesn't mean that this DN is
failed and should be excluded for new block group allocation. Otherwise we may
get many HEALTHY DNs to be excluded and new block group may fail to be
allocated in a small cluster.
E.g.
> EC: Exclude pipeline upon container close instead of exclude DNs.
> -----------------------------------------------------------------
>
> Key: HDDS-6373
> URL: https://issues.apache.org/jira/browse/HDDS-6373
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Mark Gui
> Assignee: Mark Gui
> Priority: Major
>
> When a container is closed because it is full, the DN replies to the client
> with a ContainerNotOpenException. This does not mean that the DN has failed,
> so the DN should not be excluded from new block group allocation. Otherwise
> many HEALTHY DNs may end up excluded, and allocating a new block group may
> fail in a small cluster.
> For example, with 45 DNs (Docker-simulated) and the following in ozone-site.xml:
> <property>
>   <name>ozone.scm.container.size</name>
>   <value>256MB</value>
> </property>
> <property>
>   <name>ozone.scm.block.size</name>
>   <value>16MB</value>
> </property>
> a test with Freon ockg:
> ./bin/ozone freon ockg --type=EC --replication=rs-10-4-1024k -p test -n 10 -t 10 -s $((4 * 1024 * 1024 * 1024))
> results in 5-8 failures with only HDDS-6364 applied.
> With this fix and HDDS-6364 applied together, all 10 succeed over many rounds.
>
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)