[ 
https://issues.apache.org/jira/browse/HDFS-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17238886#comment-17238886
 ] 

Ye Ni edited comment on HDFS-14937 at 11/25/20, 7:21 PM:
---------------------------------------------------------

[~xuzq_zander], [~xkrogen], [~vagarychen], could you share the background of 
this change, or the issue it was meant to address?

We recently found an issue: when an Observer is down (machine taken offline 
for maintenance), the client gets an InterruptedIOException, for example a 
ConnectTimeoutException. However, the request sent to the Observer is not 
failed over to the Active NN; the exception is thrown directly. This causes 
all of the requests to fail. Note that a client created after the Observer 
went down does not hit this issue, because it does not treat the down machine 
as an Observer, so this code path is never reached.

I believe that in this case we should go on to try the Active NN.
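
To make the difference concrete, here is a minimal sketch of the two 
behaviors. It is not the real ObserverReadProxyProvider code: the class 
ObserverFailoverSketch, the interface NameNodeCall, the method read and the 
flag rethrowInterrupted are all hypothetical names made up for illustration. 
It uses java.net.SocketTimeoutException, which is a subclass of 
InterruptedIOException, as a stand-in for the connect timeout we see.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.net.SocketTimeoutException;
import java.util.Collections;
import java.util.List;

public class ObserverFailoverSketch {

  /** Hypothetical stand-in for one RPC attempt against a NameNode. */
  interface NameNodeCall {
    String invoke() throws IOException;
  }

  /**
   * Tries each Observer in turn, then the Active.
   * rethrowInterrupted = true models the behavior described in this comment:
   * an InterruptedIOException from an Observer is thrown to the caller.
   * rethrowInterrupted = false models the suggestion: keep going and let the
   * Active serve the request.
   */
  static String read(List<NameNodeCall> observers, NameNodeCall active,
                     boolean rethrowInterrupted) throws IOException {
    for (NameNodeCall observer : observers) {
      try {
        return observer.invoke();
      } catch (InterruptedIOException e) {
        if (rethrowInterrupted) {
          throw e; // request fails without ever reaching the Active
        }
        // otherwise fall through to the next Observer, then the Active
      } catch (IOException e) {
        // any other failure: try the next Observer
      }
    }
    return active.invoke();
  }

  public static void main(String[] args) throws IOException {
    // An Observer whose machine is offline: every call times out.
    NameNodeCall downObserver = () -> {
      throw new SocketTimeoutException("connect timed out");
    };
    NameNodeCall active = () -> "answer from active";
    List<NameNodeCall> observers = Collections.singletonList(downObserver);

    // Suggested behavior: the Active serves the request.
    System.out.println(read(observers, active, false));

    // Behavior described above: the exception reaches the caller.
    try {
      read(observers, active, true);
    } catch (InterruptedIOException e) {
      System.out.println("thrown directly: " + e.getMessage());
    }
  }
}
```

A real fix would probably also need to tell a genuine thread interrupt apart 
from a plain timeout, which this sketch does not attempt.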

This change is in OSS trunk and 3+ but not in 2.10+; I don't know whether 
that is intentional.

[~inigoiri]


> [SBN read] ObserverReadProxyProvider should throw InterruptException
> --------------------------------------------------------------------
>
>                 Key: HDFS-14937
>                 URL: https://issues.apache.org/jira/browse/HDFS-14937
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: xuzq
>            Assignee: xuzq
>            Priority: Major
>             Fix For: 3.3.0
>
>         Attachments: HDFS-14937-trunk-001.patch, HDFS-14937-trunk-002.patch
>
>
> ObserverReadProxyProvider should throw InterruptException immediately if an 
> Observer catches InterruptException while invoking.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
