[
https://issues.apache.org/jira/browse/HDDS-9551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790821#comment-17790821
]
Uma Maheswara Rao G edited comment on HDDS-9551 at 11/29/23 3:37 AM:
---------------------------------------------------------------------
So, does this retry count make the write fail early even if there are more
nodes in the cluster, when the nodes it picked are all involved in some restarts?
Let's say we have a 100-node cluster.
Assumption: there are some random restarts happening in the cluster.
1. Client got a pipeline with 1,2,3.
2. It failed on node 3, and node 3 was added to the exclude list.
3. Client got a new pipeline with 1,2,4. Exclude list: 3, retry = 1.
4. Client failed again due to node 4, so it got a new pipeline with 1,2,5.
Exclude list: 3,4, retry = 2.
5. Client failed again due to node 5, so it got a new pipeline with 1,2,6.
Exclude list: 3,4,5, retry = 3.
6. Client failed again due to node 6, so it got a new pipeline with 1,2,7.
Exclude list: 3,4,5,6, retry = 4.
7. Client failed again due to node 7, so it got a new pipeline with 1,2,8.
Exclude list: 3,4,5,6,7, retry = 5.
8. Now the client gives up, as it has already tried 5 times?
But we still have so many good nodes (1, 2 and 9-100) in the cluster, right?
Can you figure out what happens in this case?
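
To make the scenario above concrete, here is a rough Python sketch of it. This
is only a toy model, not the actual Ozone client code: the function name
simulate_write, the "lowest-numbered non-excluded nodes" pipeline selection,
and the rule that the client gives up once the retry count reaches the limit
are all assumptions made for illustration.

```python
def simulate_write(num_nodes, failing_nodes, max_retries, pipeline_size=3):
    """Toy model of the retry scenario (hypothetical, not Ozone's real logic)."""
    exclude = []   # nodes the client has excluded so far, in order
    retries = 0    # number of times the client asked for a new pipeline

    while True:
        # Assumed allocation rule: take the lowest-numbered nodes that are
        # not in the exclude list.
        pipeline = [n for n in range(1, num_nodes + 1)
                    if n not in exclude][:pipeline_size]
        if len(pipeline) < pipeline_size:
            # Not enough non-excluded nodes left to build a pipeline.
            return {"ok": False, "exclude": exclude, "retries": retries}

        # First node in the pipeline that is currently restarting, if any.
        bad = next((n for n in pipeline if n in failing_nodes), None)
        if bad is None:
            return {"ok": True, "exclude": exclude, "retries": retries}

        exclude.append(bad)
        retries += 1
        # Assumed give-up rule: stop as soon as the retry limit is reached.
        if retries >= max_retries:
            return {"ok": False, "exclude": exclude, "retries": retries}


# 100-node cluster, nodes 3-7 restarting, retry limit of 5.
result = simulate_write(100, {3, 4, 5, 6, 7}, max_retries=5)
print(result)
print("non-excluded nodes left:", 100 - len(result["exclude"]))
```

Under these assumptions the write fails with exclude list 3,4,5,6,7 and
retry = 5 while 95 nodes are still not excluded. With a limit of 6 the same
model succeeds on the pipeline 1,2,8.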
cc: [~sumitagrawal] [~arp] [~erose]
> Allow the client write to fall back to nodes in the exclude list if that is
> all that is available
> -------------------------------------------------------------------------------------------------
>
> Key: HDDS-9551
> URL: https://issues.apache.org/jira/browse/HDDS-9551
> Project: Apache Ozone
> Issue Type: Task
> Components: Ozone Client
> Reporter: Dave Teng
> Assignee: Dave Teng
> Priority: Major
> Labels: pull-request-available
>
> Allow the client write to fall back to nodes in the exclude list if that is
> all that is available
--
This message was sent by Atlassian Jira
(v8.20.10#820010)