[ 
https://issues.apache.org/jira/browse/HBASE-8449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647905#comment-13647905
 ] 

stack commented on HBASE-8449:
------------------------------

Thinking on it, we could test the recoverLease result.  If false, wait a second 
or two and then retry (for the case where the primary node is up, just taking 
its time).  If it comes back false on the second invocation, then we wait what 
we think is the hdfs-side read timeout (dfs.socket.timeout, 'public static int 
READ_TIMEOUT = 60 * 1000;'), or some good portion of it, and then leave the loop 
w/o redoing recoverLease.  The read will likely fail but we have retrying going 
on around it (and Jimmy just improved it over in hbase-8314).
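
A rough sketch of that sequence, to make the flow concrete.  The class name, 
method name and pause parameters are made up for illustration, and the 
recoverLease call is passed in as a BooleanSupplier standing in for 
DistributedFileSystem#recoverLease(Path); this is not the patch, just the shape 
of it.

```java
import java.util.function.BooleanSupplier;

public class RecoverLeaseSketch {
  /**
   * Hypothetical helper: call recoverLease, pause briefly, call it once more,
   * then sit out the long pause and give up without a third call.
   *
   * @param recoverLease stand-in for the real recoverLease call; true means recovered
   * @param shortPauseMs the "second or two" between the two invocations
   * @param longPauseMs  our guess at the hdfs-side read timeout (or a portion of it)
   * @return true if either invocation returned true, false if we left after the long wait
   */
  public static boolean recoverWithPauses(BooleanSupplier recoverLease,
      long shortPauseMs, long longPauseMs) throws InterruptedException {
    if (recoverLease.getAsBoolean()) {
      return true;
    }
    // Primary node may be up and just taking its time.
    Thread.sleep(shortPauseMs);
    if (recoverLease.getAsBoolean()) {
      return true;
    }
    // Wait out the (assumed) hdfs-side timeout, then leave w/o redoing recoverLease.
    Thread.sleep(longPauseMs);
    return false;
  }
}
```

A false return here only means we stopped waiting; the caller still goes on to 
the read and leans on the retrying around it.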

The amount of time to wait the second time should probably be configurable 
since there is no way for us to know the hdfs configs (talking w/ Elliott, we 
should have the master ask the NN and then have it publish the important 
configs in zk for regionservers to pick up: TODO).  We can reuse the config 
added by hbase-8389 and default it to 60 seconds rather than the 4 it is 
currently set to.

In another issue, we'd add a check of isFileClosed during the long wait: if it 
reports the file closed before the 60 seconds expire, stop waiting and retry 
recoverLease.
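
Something like the following could cover the isFileClosed part.  Again only a 
sketch: the class, method and parameter names are invented, the poll interval is 
an assumption (nothing above says how often to check), and isFileClosed is 
passed in as a BooleanSupplier rather than called on the filesystem.

```java
import java.util.function.BooleanSupplier;

public class FileClosedWaitSketch {
  /**
   * Hypothetical helper: wait up to maxWaitMs, checking isFileClosed every pollMs.
   *
   * @return true if the file was seen closed before the wait ran out (the caller
   *         would then retry recoverLease), false if the full wait elapsed
   */
  public static boolean waitForClose(BooleanSupplier isFileClosed,
      long maxWaitMs, long pollMs) throws InterruptedException {
    long deadline = System.nanoTime() + maxWaitMs * 1_000_000L;
    // Compare via subtraction so the check stays correct if nanoTime wraps.
    while (deadline - System.nanoTime() > 0) {
      if (isFileClosed.getAsBoolean()) {
        return true; // stop waiting early
      }
      Thread.sleep(pollMs);
    }
    return false;
  }
}
```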
                
> Refactor recoverLease retries and pauses informed by findings over in 
> hbase-8389
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-8449
>                 URL: https://issues.apache.org/jira/browse/HBASE-8449
>             Project: HBase
>          Issue Type: Bug
>          Components: Filesystem Integration
>    Affects Versions: 0.94.7, 0.95.0
>            Reporter: stack
>            Priority: Critical
>             Fix For: 0.95.1
>
>
> HBASE-8359 is an interesting issue that roams near and far.  This issue is 
> about making use of the findings handily summarized on the end of hbase-8359 
> which have it that trunk needs refactor around how it does its recoverLease 
> handling (and that the patch committed against HBASE-8359 is not what we want 
> going forward).
> This issue is about making a patch that adds a lag between recoverLease 
> invocations where the lag is related to dfs timeouts -- the hdfs-side dfs 
> timeout -- and optionally makes use of the isFileClosed API if it is 
> available (a facility that is not yet committed to a branch near you and 
> unlikely to be within your locality with a good while to come).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
