[
https://issues.apache.org/jira/browse/HBASE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560472#comment-13560472
]
nkeywal commented on HBASE-5843:
--------------------------------
bq. What is the application bug(AB) mentioned in your design doc? Do you mean
hbase bug? or hbase client application code bug?
Mainly HBase, but it could just as well be a coprocessor issue. HBase can be
configured to stop the regionserver if a coprocessor throws unexpected
exceptions, but it's quite easy to write buggy code, like a coprocessor that
takes resources without freeing them. In that case you may need to stop the
region server.
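To illustrate the kind of leak meant here, a minimal sketch in plain Java. It
does not use the HBase coprocessor API; `Handle`, `leakyHook`, `safeHook` and
`leakedBy` are hypothetical names, and `Handle` only stands in for whatever
resource a coprocessor might hold (a scanner, a connection, a file handle).
No exception escapes to the host in a way that reveals the problem, yet the
process keeps losing resources on every failed call:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class LeakSketch {
    /** Hypothetical resource; OPEN counts handles that were never closed. */
    static final class Handle implements AutoCloseable {
        static final AtomicInteger OPEN = new AtomicInteger();
        Handle() { OPEN.incrementAndGet(); }
        @Override public void close() { OPEN.decrementAndGet(); }
    }

    /** Buggy hook: close() is skipped whenever the body throws. */
    static void leakyHook(boolean fail) {
        Handle h = new Handle();
        if (fail) {
            throw new IllegalStateException("simulated failure");
        }
        h.close(); // only reached on the success path
    }

    /** Fixed hook: try-with-resources closes the handle on every path. */
    static void safeHook(boolean fail) {
        try (Handle h = new Handle()) {
            if (fail) {
                throw new IllegalStateException("simulated failure");
            }
        }
    }

    /** Runs the chosen hook `calls` times with failures; returns handles left open. */
    static int leakedBy(boolean safe, int calls) {
        int before = Handle.OPEN.get();
        for (int i = 0; i < calls; i++) {
            try {
                if (safe) safeHook(true); else leakyHook(true);
            } catch (IllegalStateException expected) {
                // The simulated failure is not the point; the open count is.
            }
        }
        return Handle.OPEN.get() - before;
    }

    public static void main(String[] args) {
        System.out.println("leaky hook left open: " + leakedBy(false, 3));
        System.out.println("safe hook left open:  " + leakedBy(true, 3));
    }
}
```

In a long-running process such as a regionserver, the leaky variant only
shows up once the resource is exhausted, and at that point a restart may be
the only remedy.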
bq. If it is hbase client application code bug, does that need stop/start
region server to fix the issue?
For a pure client (i.e. a user of the hbase.client package), it would be an
HBase bug imho: HBase/a regionserver should be resistant to any client behavior.
For a coprocessor, it's client code executed within the regionserver process.
Thanks to Java, many coprocessor bugs will have a limited effect, but as said
above there are some cases that cannot be handled simply.
bq. If it is hbase code bug, do you refer to hbase bug that cause region server
enter some bad state like deadlock, and so on? I think that could benefit from
restarting region server to fix the problem.
Yes.
> Improve HBase MTTR - Mean Time To Recover
> -----------------------------------------
>
> Key: HBASE-5843
> URL: https://issues.apache.org/jira/browse/HBASE-5843
> Project: HBase
> Issue Type: Umbrella
> Affects Versions: 0.96.0
> Reporter: nkeywal
> Assignee: nkeywal
>
> A part of the approach is described here:
> https://docs.google.com/document/d/1z03xRoZrIJmg7jsWuyKYl6zNournF_7ZHzdi0qz_B4c/edit
> The ideal target is:
> - failures impact client applications only by an added delay to execute a
> query, whatever the failure.
> - this delay is always less than 1 second.
> We're not going to achieve that immediately...
> Priority will be given to the most frequent issues.
> Short term:
> - software crash
> - standard administrative tasks such as stop/start of a cluster.