[
https://issues.apache.org/jira/browse/HDFS-2713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174862#comment-13174862
]
Uma Maheswara Rao G commented on HDFS-2713:
-------------------------------------------
Thanks a lot, Aaron, for your patience in the review. :-)
{quote}
still don't see what benefit the background thread has. In the case you
describe, with the current implementation, the second client request (after the
failed one which had timed out retrying/failing over) would just simply
succeed, or fail over immediately and then succeed. So, the background thread
won't have saved much if any work, and instead may indefinitely be doing
(potentially unnecessary) work in the background.
{quote}
After any DFSClient operation fails due to Namenode unavailability, the most
important thing to do is to detect when the Active Namenode becomes available
again.
So the background thread is not doing unnecessary work; it is doing the
high-priority work.
The difference between our approaches is the importance given to completing
the failover.
In the approach I described, failover is considered important enough that one
thread is dedicated to finding the Active Namenode, and it exits only after
finding it.
In the current implementation, failover is done only if the RetryDecision is
FAILOVER_AND_RETRY.
If many calls are issued to the Namenode whose RetryDecision is FAIL, failover
won't happen.
My point is this: when one client call finds that failover is required and
cannot complete it within the wait time, why should the client wait for the
next call to arrive before trying again, and then fail over only after the
minimum delay wait?
Even though the first call fails, the background thread will keep going until
it finds the active proxy instance. If the next call (on a user thread) comes
after that, it need not wait to connect and fail over again. It can
immediately use that proxy instance and go ahead.
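To make the idea concrete, here is a minimal sketch of the background-thread
approach described above. It is not the patch and not the HDFS client API:
the class name BackgroundActiveFinder, the isActive probe (a stand-in for an
RPC to a Namenode), and the probeIntervalMs and maxWaitMs parameters are all
hypothetical, chosen only for illustration.

```java
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

/** Hypothetical sketch; names and structure are assumptions, not HDFS code. */
class BackgroundActiveFinder {
    private final List<String> addresses;     // addresses of both Namenodes
    private final Predicate<String> isActive; // stand-in for a real RPC probe
    private final long probeIntervalMs;
    private final Object lock = new Object();
    private String active;                    // guarded by lock; null if unknown
    private boolean searching;                // guarded by lock

    BackgroundActiveFinder(List<String> addresses, Predicate<String> isActive,
                           long probeIntervalMs, String initialActive) {
        this.addresses = addresses;
        this.isActive = isActive;
        this.probeIntervalMs = probeIntervalMs;
        this.active = initialActive;
    }

    /** A user thread reports a failed call; at most one search thread runs. */
    void reportFailure(String failedAddress) {
        synchronized (lock) {
            if (failedAddress.equals(active)) {
                active = null;                // forget the stale address
            }
            if (active == null && !searching) {
                searching = true;
                Thread t = new Thread(this::searchUntilFound, "active-nn-finder");
                t.setDaemon(true);
                t.start();
            }
        }
    }

    /** Background loop: exits only after an active address has been found. */
    private void searchUntilFound() {
        try {
            while (true) {
                for (String addr : addresses) {
                    if (isActive.test(addr)) { // probe outside the lock
                        synchronized (lock) {
                            active = addr;
                            searching = false;
                            lock.notifyAll();  // wake any waiting user threads
                        }
                        return;
                    }
                }
                Thread.sleep(probeIntervalMs);
            }
        } catch (InterruptedException e) {
            synchronized (lock) {
                searching = false;
            }
            Thread.currentThread().interrupt();
        }
    }

    /** User threads wait at most maxWaitMs; null means the wait timed out. */
    String getActive(long maxWaitMs) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMs);
        synchronized (lock) {
            while (active == null) {
                long remainingMs =
                    TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMs <= 0) {
                    return null;               // caller would throw to the user
                }
                lock.wait(remainingMs);
            }
            return active;
        }
    }
}
```

In this sketch a user thread whose call fails invokes reportFailure and then
getActive with its own wait limit. If the wait expires it fails the
operation, while the search thread keeps running, so a later call can pick up
the active address without probing again.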
I will try to integrate the logic with *ConfiguredFailoverProxyProvider* and
upload a temporary patch to make this clearer.
Thanks
Uma
> HA : An alternative approach to clients handling Namenode failover.
> --------------------------------------------------------------------
>
> Key: HDFS-2713
> URL: https://issues.apache.org/jira/browse/HDFS-2713
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ha, hdfs client
> Affects Versions: HA branch (HDFS-1623)
> Reporter: Uma Maheswara Rao G
> Assignee: Uma Maheswara Rao G
>
> This is the approach for client failover which we adopted when we developed
> HA for Hadoop. I would like to propose this approach for others to review &
> include in the HA implementation, if found useful.
> This is similar to the ConfiguredProxyProvider in the sense that it takes
> the addresses of both the Namenodes as input. The major differences I can
> see from the current implementation are:
> 1) During failover, *the time user threads wait for the active namenode* to
> become available, awaiting the retry, can be controlled very accurately.
> Beyond this, the threads will not be made to wait; DFS Client will throw an
> Exception indicating that the operation has failed.
> 2) Failover happens in a separate thread, not in the client application
> threads. The thread will keep trying to find the Active Namenode until it
> succeeds.
> 3) This also means that irrespective of whether the operation's RetryAction
> is RETRY_FAILOVER or FAIL, the user thread can trigger the client's failover.