[
https://issues.apache.org/jira/browse/HBASE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427182#comment-13427182
]
nkeywal commented on HBASE-5843:
--------------------------------
Same tests as before, this time with datanodes.
1) Clean stop of one RS; wait for all regions to become online again:
Pseudo distributed without datanode:
0.92: ~800 seconds
0.96: ~13 seconds
With two datanodes, on Hadoop 1.0:
0.92: ~460 seconds
0.96: ~12 seconds
3) Start of the cluster after a clean stop; wait for all regions to become online:
Pseudo distributed without datanode:
0.92: ~1020 seconds
0.94: ~1023 seconds (tested once only)
0.96: ~31 seconds
With two datanodes, on Hadoop 1.0:
0.92: ~640 seconds
0.96: ~35 seconds
So 0.92 appears to be faster with the datanodes than without them, but 0.96 still shows a major improvement in both setups.
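
For a rough sense of scale, the ratios between the 0.92 and 0.96 timings can be computed with a small script. This is only a sketch: the numbers are the approximate figures reported above, and the names (timings, speedup, summarize) are made up for illustration and are not part of HBase or of the test setup.

```python
# Timings in seconds, copied from the approximate figures reported above.
# The dictionary layout and the function names are illustrative assumptions.
timings = {
    ("clean stop of one RS", "no datanode"): {"0.92": 800, "0.96": 13},
    ("clean stop of one RS", "two datanodes"): {"0.92": 460, "0.96": 12},
    ("cluster start", "no datanode"): {"0.92": 1020, "0.96": 31},
    ("cluster start", "two datanodes"): {"0.92": 640, "0.96": 35},
}


def speedup(old_seconds, new_seconds):
    """Return how many times faster the new timing is than the old one."""
    return old_seconds / new_seconds


def summarize(data):
    """Return (test, setup, speedup factor rounded to one decimal) rows."""
    rows = []
    for (test, setup), seconds in data.items():
        factor = round(speedup(seconds["0.92"], seconds["0.96"]), 1)
        rows.append((test, setup, factor))
    return rows


if __name__ == "__main__":
    for test, setup, factor in summarize(timings):
        print(f"{test:22s} {setup:14s} 0.96 is ~{factor}x faster than 0.92")
```

Since the inputs are themselves approximate, the resulting factors are order-of-magnitude indications only.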
> Improve HBase MTTR - Mean Time To Recover
> -----------------------------------------
>
> Key: HBASE-5843
> URL: https://issues.apache.org/jira/browse/HBASE-5843
> Project: HBase
> Issue Type: Umbrella
> Affects Versions: 0.96.0
> Reporter: nkeywal
> Assignee: nkeywal
>
> Part of the approach is described here:
> https://docs.google.com/document/d/1z03xRoZrIJmg7jsWuyKYl6zNournF_7ZHzdi0qz_B4c/edit
> The ideal target is:
> - failures impact client applications only by an added delay to execute a
> query, whatever the failure.
> - this delay is always less than 1 second.
> We're not going to achieve that immediately...
> Priority will be given to the most frequent issues.
> Short term:
> - software crash
> - standard administrative tasks such as stop/start of a cluster.