[ 
https://issues.apache.org/jira/browse/HDDS-9551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790992#comment-17790992
 ] 

Sumit Agrawal commented on HDDS-9551:
-------------------------------------

In a 7-node cluster:
 # The client gets a pipeline with nodes 1,2,3. 
 # The write fails, and the client adds 1,2,3 to the exclusion list.
 # The client requests a new pipeline with exclusion (1,2,3) and gets a pipeline with 4,5,6; 
retry = 1
 # The write fails again, and the exclusion list becomes (1,2,3,4,5,6), leaving only 
node "7" available.
 # The client requests a new pipeline with exclusion (1,2,3,4,5,6); it returns one of 
the available pipelines on nodes (1-7) - {*}fallback mechanism 
available{*}; retry = 2
 # Step 5 repeats with the fallback mechanism until retry reaches "5" (the configured 
default), and then the operation fails.
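
The sequence above can be sketched as a small simulation. This is only an illustrative model, not the Ozone client or SCM code: the class and method names are hypothetical, pipelines are assumed to have 3 nodes, nodes are picked in id order rather than randomly, every write is assumed to fail, and the way retries are counted is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative model only; none of these names come from the Ozone code base.
class ExcludeListFallbackSketch {

    /** Summary of one simulated write in which every pipeline fails. */
    static final class Outcome {
        final int retriesUsed;        // retry counter when the operation gave up
        final int firstFallbackRetry; // retry at which fallback was first needed, -1 if never
        final int excludedNodes;      // size of the exclusion list at the end

        Outcome(int retriesUsed, int firstFallbackRetry, int excludedNodes) {
            this.retriesUsed = retriesUsed;
            this.firstFallbackRetry = firstFallbackRetry;
            this.excludedNodes = excludedNodes;
        }
    }

    /** Returns up to pipelineSize node ids (1..nodeCount) that are not excluded. */
    static List<Integer> pickNodes(int nodeCount, int pipelineSize, Set<Integer> excluded) {
        List<Integer> picked = new ArrayList<>();
        for (int node = 1; node <= nodeCount && picked.size() < pipelineSize; node++) {
            if (!excluded.contains(node)) {
                picked.add(node);
            }
        }
        return picked;
    }

    /** Simulates a write where every pipeline fails until the retry limit is reached. */
    static Outcome simulateAllWritesFail(int nodeCount, int pipelineSize, int maxRetries) {
        Set<Integer> excluded = new HashSet<>();
        int firstFallbackRetry = -1;
        int retry = 0;
        while (true) {
            List<Integer> pipeline = pickNodes(nodeCount, pipelineSize, excluded);
            if (pipeline.size() < pipelineSize) {
                // Fallback: too few non-excluded nodes, so ignore the exclusion list.
                if (firstFallbackRetry < 0) {
                    firstFallbackRetry = retry;
                }
                pipeline = pickNodes(nodeCount, pipelineSize, Collections.<Integer>emptySet());
            }
            // Assumed worst case: the write fails and all pipeline nodes are excluded.
            excluded.addAll(pipeline);
            if (retry == maxRetries) {
                return new Outcome(retry, firstFallbackRetry, excluded.size());
            }
            retry++;
        }
    }

    public static void main(String[] args) {
        Outcome small = simulateAllWritesFail(7, 3, 5);
        System.out.println("7 nodes: first fallback at retry " + small.firstFallbackRetry);
        Outcome large = simulateAllWritesFail(100, 3, 5);
        System.out.println("100 nodes: first fallback at retry " + large.firstFallbackRetry);
    }
}
```

Under these assumptions the 7-node run first needs the fallback at retry 2, which matches step 5, while a 100-node run never needs it within 5 retries.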

 

There is one case of *immediate failure* (allocateBlock fails):

- if {+}no pipeline is active{+}, it tries to create a new pipeline, and if 
that pipeline is not allocated within the timeframe, allocateBlock fails and 
the client fails immediately. 

We have observed this failure log when the client fails, where no pipeline is 
available and creating a new pipeline also fails due to unhealthy nodes.

 

If we have 100 nodes in the cluster, continuous failure is *likely rare* 
because pipelines are allocated on a random basis. This problem is observed mostly on 
+small clusters.+  If SCM itself has all nodes unhealthy and a new pipeline cannot 
be created, there seems to be no way to handle that case.
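
As a rough back-of-the-envelope check of why small clusters are affected, the following sketch counts how many pipeline requests can be served purely from non-excluded nodes. It assumes 3-node pipelines that never overlap and that every failure excludes all nodes of the failed pipeline; the class and method names are hypothetical and not part of Ozone.

```java
// Back-of-the-envelope model; names and assumptions are illustrative only.
class ExcludeListBudgetSketch {

    /** Requests that can be served from non-excluded nodes before the list runs dry. */
    static int attemptsWithoutFallback(int nodeCount, int pipelineSize) {
        return nodeCount / pipelineSize;
    }

    /** True if the retry budget can outlast the supply of non-excluded nodes. */
    static boolean needsFallback(int nodeCount, int pipelineSize, int maxRetries) {
        int attempts = maxRetries + 1; // the first attempt plus all retries
        return attemptsWithoutFallback(nodeCount, pipelineSize) < attempts;
    }

    public static void main(String[] args) {
        System.out.println("7 nodes need fallback: " + needsFallback(7, 3, 5));
        System.out.println("100 nodes need fallback: " + needsFallback(100, 3, 5));
    }
}
```

With 7 nodes only 2 requests can avoid the exclusion list, fewer than the 6 attempts allowed by 5 retries; with 100 nodes there are 33, so in this simplified model the exclusion list alone never exhausts the cluster.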

 

> Allow the client write to fall back to nodes in the exclude list if that is 
> all that is available
> -------------------------------------------------------------------------------------------------
>
>                 Key: HDDS-9551
>                 URL: https://issues.apache.org/jira/browse/HDDS-9551
>             Project: Apache Ozone
>          Issue Type: Task
>          Components: Ozone Client
>            Reporter: Dave Teng
>            Assignee: Dave Teng
>            Priority: Major
>              Labels: pull-request-available
>
> Allow the client write to fall back to nodes in the exclude list if that is 
> all that is available



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

