[ 
https://issues.apache.org/jira/browse/HBASE-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13691053#comment-13691053
 ] 

Nicolas Liochon commented on HBASE-8776:
----------------------------------------

This two-line patch touches many different topics :-).
1) On ZooKeeper: any default between 30s and 90s is fine imho. Less can become 
an issue for some environments. More is a little bit ridiculous.
2) "by default we should be able to ride over a RS crash": I really think it's 
mandatory. I'm currently running tests on AWS. So far my stats say that a given 
machine will disappear for 5 minutes once per week. We must handle that well.
2.1) We can have rack-wide failures as well. Rack hardware will need around 5 
minutes to recover. We must support that too imho (at least in our timeouts; we 
would have a hard time recovering from such a failure today).
3) Cluster-wide fail fast vs. retry: I personally think that the HBase contract 
is 'any operation will eventually succeed', so I'm ok with more retries and 
longer timeouts, which let us ride over multiple failures in a row. So 40 
minutes is fine. 
4) The final backoff time of 128 seconds seems huge to me, but I'm not against 
it.
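
To make the arithmetic behind points 3 and 4 concrete, here is a rough sketch of how a capped backoff schedule adds up to a total retry budget. The multiplier table, the 1s base pause and the retry count are assumptions picked for illustration only; they are not the values from the HBASE-8723 patch, and the class and method names are hypothetical.

```java
public class RetryBudget {
    // Hypothetical backoff multipliers, doubling up to 128.
    // This is NOT the actual HBase backoff table.
    static final int[] BACKOFF = {1, 2, 4, 8, 16, 32, 64, 128};

    // Pause before the given retry (0-based). Once the table is exhausted,
    // the last multiplier is reused, so the pause is capped.
    static long pauseMillis(long basePauseMillis, int retry) {
        int idx = Math.min(retry, BACKOFF.length - 1);
        return basePauseMillis * BACKOFF[idx];
    }

    // Total time spent sleeping across the given number of retries.
    static long totalMillis(long basePauseMillis, int retries) {
        long total = 0;
        for (int i = 0; i < retries; i++) {
            total += pauseMillis(basePauseMillis, i);
        }
        return total;
    }

    public static void main(String[] args) {
        long basePauseMillis = 1000; // assumed 1s base pause
        int retries = 25;            // assumed retry count
        long total = totalMillis(basePauseMillis, retries);
        System.out.println("final pause (s): "
            + pauseMillis(basePauseMillis, retries - 1) / 1000);
        System.out.println("total sleep (min): " + total / 60000.0);
    }
}
```

With these assumed values the first 8 pauses sum to 255s and each further retry adds 128s, so 25 retries sleep for 2431s in total, a little over 40 minutes. This counts sleep time only, not the time spent in each failed call.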

So I'm totally +1 for the HBASE-8723 patch.
Then for 0.94... I think we could just do it, change all the settings like this 
one (i.e. zk timeout to 90s as in trunk), and write a nice release note. If we 
do that plus some communication when we release the next 0.94, we will be fine 
imho.

=> +1 if we write a release note and change the zk setting.
                
> port HBASE-8723 to 0.94
> -----------------------
>
>                 Key: HBASE-8776
>                 URL: https://issues.apache.org/jira/browse/HBASE-8776
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.8
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>             Fix For: 0.94.9
>
>         Attachments: HBASE-8776-v0.patch
>
>

