[ 
https://issues.apache.org/jira/browse/HBASE-8449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13665670#comment-13665670
 ] 

stack commented on HBASE-8449:
------------------------------

bq. Increase hbase.lease.recovery.timeout default to 15 minutes, i.e. more than 
a standard hdfs recovery.

I do not follow, [~nkeywal]. It is 15 minutes at the moment (this is just a copy 
of what was there before).


bq. hbase.lease.recovery.dfs.timeout: it should not be less than 10s imho.

This is set to 61 seconds, which is how long we think it will take the NN to 
time out on the datanode (dfs.socket.timeout, hopefully).

bq. ....it's as well that it seems that the NN seems not to like multiple calls 
to the recoverLease. 

Yes, the aim w/ this patch is to not kill an ongoing lease recovery.
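
To make the intent concrete, here is a minimal sketch of the shape being discussed: call recoverLease, and if the file is not yet closed, wait roughly the dfs-side timeout before asking again, giving up once the overall timeout has passed. This is an illustration only, not the code in the attached patches. The class name, method name, and parameters are hypothetical, and the BooleanSupplier stands in for a call such as DistributedFileSystem#recoverLease(Path), which returns true once the file is closed.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of a paced recoverLease retry loop; not the patch itself. */
final class LeaseRecoverySketch {

  /**
   * Calls recoverLease until it reports success or the overall timeout elapses.
   *
   * @param recoverLease     stand-in for the real recoverLease call; true means recovered
   * @param overallTimeoutMs total time to keep trying (15 minutes in this thread)
   * @param pauseMs          lag between calls (61 seconds in this thread)
   * @return true if the lease was recovered, false on timeout or interrupt
   */
  static boolean recoverWithPauses(BooleanSupplier recoverLease,
                                   long overallTimeoutMs, long pauseMs) {
    long start = System.currentTimeMillis();
    while (true) {
      if (recoverLease.getAsBoolean()) {
        return true;
      }
      long remaining = overallTimeoutMs - (System.currentTimeMillis() - start);
      if (remaining <= 0) {
        return false;
      }
      try {
        // Wait before calling again so an ongoing recovery is not disturbed.
        Thread.sleep(Math.min(pauseMs, remaining));
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }
}
```

With the values from this thread, a caller would pass 15 * 60 * 1000L as the overall timeout and 61 * 1000L as the pause. The loop makes one last attempt at the deadline before giving up.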

Regarding your proposal, it is built on a patch not yet committed to hdfs.  I am 
trying to get something done now so that I can make a 0.95.1 release (what we 
have currently will do the scenario you mocked up where you were calling the 
namenode every second).

bq. The master calls recover lease as a part of the distributed split. We can 
enhance it in an other jira to give higher priority to closed wals vs. wals 
being recovered.

Yeah, that would be a good TODO for later.  All of your proposal seems for 
later rather than now.

Are you +1 on what I have here, [~nkeywal]?



                
> Refactor recoverLease retries and pauses informed by findings over in 
> hbase-8389
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-8449
>                 URL: https://issues.apache.org/jira/browse/HBASE-8449
>             Project: HBase
>          Issue Type: Bug
>          Components: Filesystem Integration
>    Affects Versions: 0.94.7, 0.95.0
>            Reporter: stack
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.95.1
>
>         Attachments: 8449.txt, 8449v2.txt, 8449v3.txt, 8449v4.txt
>
>
> HBASE-8359 is an interesting issue that roams near and far.  This issue is 
> about making use of the findings handily summarized on the end of hbase-8359 
> which have it that trunk needs refactor around how it does its recoverLease 
> handling (and that the patch committed against HBASE-8359 is not what we want 
> going forward).
> This issue is about making a patch that adds a lag between recoverLease 
> invocations where the lag is related to dfs timeouts -- the hdfs-side dfs 
> timeout -- and optionally makes use of the isFileClosed API if it is 
> available (a facility that is not yet committed to a branch near you and 
> unlikely to be within your locality with a good while to come).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
