[ https://issues.apache.org/jira/browse/HDDS-12468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Siddhant Sangwan reassigned HDDS-12468:
---------------------------------------
Assignee: Siddhant Sangwan (was: Chu Cheng Li)
> Check for space availability for all dns while container creation in pipeline
> -----------------------------------------------------------------------------
>
> Key: HDDS-12468
> URL: https://issues.apache.org/jira/browse/HDDS-12468
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Sumit Agrawal
> Assignee: Siddhant Sangwan
> Priority: Major
>
> At SCM, for Ratis, during allocateBlock:
> # A pipeline is chosen randomly.
> # A container is chosen round robin, matching the required size.
> # If no matching container is found:
> ## A new container is created and returned.
> # The block is assigned to the container and returned in the response.
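The allocation steps above can be sketched as a simplified Java model. This is a hypothetical illustration, not Ozone's SCM code: the classes `SimpleContainer`, `SimplePipeline` and `BlockAllocator` are invented here, and the model assumes each pipeline keeps its own list of containers of a fixed size. Step 3 shows the gap this issue is about: a new container is created without asking whether the pipeline's datanodes have disk space for it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical container model: fixed capacity, tracks bytes already allocated.
class SimpleContainer {
    final long id;
    final long capacity;
    long used;

    SimpleContainer(long id, long capacity) {
        this.id = id;
        this.capacity = capacity;
    }

    boolean fits(long size) {
        return capacity - used >= size;
    }
}

// Hypothetical pipeline model: its open containers plus a round-robin cursor.
class SimplePipeline {
    final String id;
    final List<SimpleContainer> containers = new ArrayList<>();
    int nextIndex = 0;

    SimplePipeline(String id) {
        this.id = id;
    }
}

// Hypothetical allocator following the four steps in the issue description.
class BlockAllocator {
    private final Random random;
    private final long containerSize;
    private long nextContainerId = 1;

    BlockAllocator(Random random, long containerSize) {
        this.random = random;
        this.containerSize = containerSize;
    }

    /** Returns the id of the container the block was assigned to. */
    long allocateBlock(List<SimplePipeline> pipelines, long blockSize) {
        // 1. Choose a pipeline at random.
        SimplePipeline p = pipelines.get(random.nextInt(pipelines.size()));
        // 2. Walk the containers round robin, looking for one with enough room.
        int n = p.containers.size();
        for (int i = 0; i < n; i++) {
            SimpleContainer c = p.containers.get((p.nextIndex + i) % n);
            if (c.fits(blockSize)) {
                p.nextIndex = (p.nextIndex + i + 1) % n;
                c.used += blockSize;
                return c.id; // 4. Block assigned to an existing container.
            }
        }
        // 3. No match: create a new container. Nothing here checks whether
        //    the datanodes in the pipeline have disk space for it.
        SimpleContainer created = new SimpleContainer(nextContainerId++, containerSize);
        created.used += blockSize;
        p.containers.add(created);
        // 4. Block assigned to the newly created container.
        return created.id;
    }
}
```

With a 256-byte container size and 100-byte blocks, the first two blocks land in container 1 and the third forces creation of container 2.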
>
> Later, container creation can fail at the DN, with the negative impacts
> described below.
> The issues here are:
> * If the leader node in the pipeline does not have capacity to create a new
> container, it returns a container creation failure.
> * If a follower node does not have capacity to create a new container, it
> fails and keeps retrying (if another follower succeeds).
> * Disks can fill up while blocks are written in parallel via the state
> machine, which slows down writes and delays the failure response.
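The failure modes above arise when a container can be created although some datanode in the pipeline lacks room for it. The following hypothetical Java sketch contrasts checking only the leader with checking every datanode, as the issue title proposes. `DatanodeSpace` and `SpaceCheck` are invented names, and the free-space figures are assumed to come from datanode reports to SCM; this is not Ozone's actual API.

```java
import java.util.List;

// Hypothetical stand-in for a datanode's reported free space (not an Ozone class).
final class DatanodeSpace {
    final String name;
    final long freeBytes;
    final boolean leader;

    DatanodeSpace(String name, long freeBytes, boolean leader) {
        this.name = name;
        this.freeBytes = freeBytes;
        this.leader = leader;
    }
}

final class SpaceCheck {
    // Checks only the Ratis leader; a follower without space goes unnoticed.
    static boolean leaderHasSpace(List<DatanodeSpace> pipeline, long required) {
        for (DatanodeSpace dn : pipeline) {
            if (dn.leader) {
                return dn.freeBytes >= required;
            }
        }
        return false; // no leader known
    }

    // Checks every datanode in the pipeline before a container is created.
    static boolean allHaveSpace(List<DatanodeSpace> pipeline, long required) {
        return pipeline.stream().allMatch(dn -> dn.freeBytes >= required);
    }
}
```

For a pipeline where the leader has 10 GiB free but one follower has only 1 GiB, a 5 GiB container passes the leader-only check yet fails the all-datanode check, which is the follower case described above.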
>
> It has been observed that writes on a follower node get stuck due to a full
> disk or a volume failure.
>
> As a solution:
> * In this situation, SCM should trigger pipeline closure (including
> container closure) with a cool-down time.
> * SCM should choose another pipeline for block allocation.
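The proposed solution can be sketched as follows. This is a hypothetical Java illustration under stated assumptions, not the design adopted in Ozone: `CandidatePipeline` and `PipelineChooser` are invented names, "closing" a pipeline is modeled only as excluding it for a cool-down period, and candidates are scanned in list order rather than randomly so the behavior is deterministic.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical pipeline candidate: free bytes reported by each of its datanodes.
final class CandidatePipeline {
    final String id;
    final long[] dnFreeBytes;

    CandidatePipeline(String id, long... dnFreeBytes) {
        this.id = id;
        this.dnFreeBytes = dnFreeBytes;
    }
}

// Hypothetical chooser: skips pipelines in cool-down and "closes" (excludes)
// any pipeline where some datanode cannot hold a new container.
final class PipelineChooser {
    private final long coolDownMillis;
    // pipeline id -> time (ms) until which it must not be chosen again
    private final Map<String, Long> coolingUntil = new HashMap<>();

    PipelineChooser(long coolDownMillis) {
        this.coolDownMillis = coolDownMillis;
    }

    Optional<CandidatePipeline> choose(List<CandidatePipeline> pipelines,
                                       long containerSize, long nowMillis) {
        for (CandidatePipeline p : pipelines) {
            if (isCoolingDown(p.id, nowMillis)) {
                continue; // closed recently; still in its cool-down window
            }
            boolean allFit = true;
            for (long free : p.dnFreeBytes) {
                if (free < containerSize) {
                    allFit = false;
                    break;
                }
            }
            if (allFit) {
                return Optional.of(p);
            }
            // Some datanode lacks space: close the pipeline for the
            // cool-down period and fall through to the next candidate.
            coolingUntil.put(p.id, nowMillis + coolDownMillis);
        }
        return Optional.empty(); // no pipeline can host a new container
    }

    boolean isCoolingDown(String id, long nowMillis) {
        Long until = coolingUntil.get(id);
        return until != null && nowMillis < until;
    }
}
```

Here a pipeline with one full datanode is put into cool-down and allocation moves on to another pipeline, rather than creating a container that a follower cannot store.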
--
This message was sent by Atlassian Jira
(v8.20.10#820010)