[ 
https://issues.apache.org/jira/browse/HDFS-15112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013824#comment-17013824
 ] 

Ayush Saxena commented on HDFS-15112:
-------------------------------------

In {{invokeConcurrent}} there is logic that requires a response from all
nameservices if {{requireResponse}} is true.

{code:java}
    for (final RemoteResult<T, R> result : results) {
      // Response from all servers required, use this error.
      if (requireResponse && result.hasException()) {
        throw result.getException();
      }
{code}

It rethrows the same exception that it got from the namespace. If a
nameservice is down and {{invokeConcurrent}} is called with
{{requireResponse}} set to true, the caller gets the same exception that was
received from the namenode.

Maybe we can do the same here too: if one of the exceptions satisfies
{{isUnavailableException()}}, we give that exception priority over the first
one received. That way the client sees the same exception it would have
encountered when connecting to the namenode directly, so if it would have
retried or failed over there, it can do the same here. We also avoid wrongly
concluding that the file doesn't exist. With a retry we may end up with a
response too, if the problem was temporary or limited to one router.
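
A rough, self-contained sketch of the prioritization idea, not the actual
router code. The names {{ExceptionPriority}}, {{pickException}} and
{{isUnavailable}} are hypothetical, and treating {{ConnectException}} as
"unavailable" is only an assumption standing in for whatever
{{isUnavailableException()}} really checks:

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.net.ConnectException;
import java.util.Arrays;
import java.util.List;

public class ExceptionPriority {

  /** Hypothetical stand-in for the router's isUnavailableException() check. */
  static boolean isUnavailable(IOException e) {
    return e instanceof ConnectException;
  }

  /**
   * Picks the exception to surface to the client: the first "unavailable"
   * one if there is any, otherwise the first exception received, or null
   * if the list is empty.
   */
  static IOException pickException(List<IOException> exceptions) {
    IOException first = null;
    for (IOException e : exceptions) {
      if (isUnavailable(e)) {
        return e; // an unavailable subcluster wins over anything else
      }
      if (first == null) {
        first = e; // remember the first one as the fallback
      }
    }
    return first;
  }

  public static void main(String[] args) {
    // One subcluster says "not found", the other one is down.
    List<IOException> received = Arrays.asList(
        new FileNotFoundException("/data/file"),
        new ConnectException("ns1 is down"));
    // prints "ConnectException"
    System.out.println(pickException(received).getClass().getSimpleName());
  }
}
```

With only a {{FileNotFoundException}} in the list, the sketch falls back to
returning it, so the behavior when all subclusters are up would not change.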

Another solution could be a new exception for the purpose, or maybe the same
NoNamenodeException, but these won't be unwrapped at the client side; they
would all surface as RemoteException only.

Whatever fits your use case is fine with me. If none of these does, let me
know and I will try to come up with some other idea. :)

> RBF: do not return FileNotFoundException when a subcluster is unavailable 
> --------------------------------------------------------------------------
>
>                 Key: HDFS-15112
>                 URL: https://issues.apache.org/jira/browse/HDFS-15112
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Íñigo Goiri
>            Assignee: Íñigo Goiri
>            Priority: Major
>         Attachments: HDFS-15112.000.patch, HDFS-15112.patch
>
>
> If we have a mount point using HASH_ALL across two subclusters and one of 
> them is down, we may return FileNotFoundException while the file is just in 
> the unavailable subcluster.
> We should not return FileNotFoundException but something that shows that the 
> subcluster is unavailable.



