[ https://issues.apache.org/jira/browse/HBASE-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846681#action_12846681 ]
Todd Lipcon commented on HBASE-2342:
------------------------------------
One open question is whether _both_ nodes would be ZK clients, or just the watchdog.
If only the watchdog, the two processes would have to relay every ZK interaction
back and forth, which would be a big pain in my opinion.
Another idea worth considering is whether we could proactively do "rolling
restarts" of region servers to avoid heap fragmentation in the first place. It's
a bit of a pain since the restarted server comes back with a cold cache, but if we
could detect when the heap was getting fragmented and do a very fast RS restart,
the idea is worth thinking about.
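As a rough illustration of the "detect when the heap is getting fragmented" step: the JVM does not report fragmentation directly, so the sketch below assumes a proxy, namely old-generation occupancy measured immediately after a collection (via the standard `MemoryPoolMXBean.getCollectionUsage()`). The class name, the threshold, and the pool-name matching are hypothetical choices for illustration, not anything in HBase.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

/**
 * Hypothetical heuristic for flagging a region server for a proactive restart.
 * Fragmentation is not directly observable, so this uses old-generation
 * occupancy measured right after a collection as a rough proxy.
 */
final class HeapRestartHeuristic {

    /** True if the fraction of the pool still in use after GC exceeds the threshold. */
    static boolean shouldRestart(long usedAfterGc, long max, double threshold) {
        if (max <= 0) {
            return false; // getMax() returns -1 when undefined, so no fraction can be computed
        }
        return (double) usedAfterGc / max > threshold;
    }

    /** Checks the old-generation heap pool(s) of the current JVM. */
    static boolean oldGenOverThreshold(double threshold) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // Pool names depend on the collector ("PS Old Gen", "CMS Old Gen", "Tenured Gen", ...)
            String name = pool.getName();
            boolean oldGen = name.contains("Old") || name.contains("Tenured");
            MemoryUsage afterGc = pool.getCollectionUsage(); // null if the pool does not support it
            if (pool.getType() == MemoryType.HEAP && oldGen && afterGc != null
                    && shouldRestart(afterGc.getUsed(), afterGc.getMax(), threshold)) {
                return true;
            }
        }
        return false;
    }
}
```

A "rolling" policy built on this would presumably restart at most one flagged server at a time, so only one server's regions are served from a cold cache at any moment.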
> Consider adding a watchdog node next to region server
> -----------------------------------------------------
>
> Key: HBASE-2342
> URL: https://issues.apache.org/jira/browse/HBASE-2342
> Project: Hadoop HBase
> Issue Type: New Feature
> Components: regionserver
> Reporter: Todd Lipcon
>
> This idea has been bandied about a fair amount. The concept is to add a
> second java process that runs next to each region server to act as a
> watchdog. Several possible purposes:
> - monitor the RS for liveness - if it exhibits Juliet syndrome ("appears
> dead") then we kill it aggressively to prevent it from coming back to life
> - restart RS automatically in failure cases
> - potentially move the entire ZK session to the watchdog to decouple node
> liveness from the particular JVM liveness
> Let's discuss in this JIRA.
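The first two purposes listed above (kill an RS that "appears dead", restart it on failure) can be sketched as a small supervisor loop. This is a minimal sketch under stated assumptions: the RS is launched from a fixed command line, and a hypothetical `heartbeatMillis` supplier reports the time of the RS's most recent heartbeat (how that is transported, e.g. a local socket or file, is left open). Moving the ZK session into the watchdog is not covered here.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/**
 * Minimal sketch of a watchdog process supervising a region server JVM.
 * The command line and heartbeat source are hypothetical, not HBase APIs.
 */
final class RegionServerWatchdog {

    enum Action { NONE, KILL_AND_RESTART, RESTART }

    /** Pure decision step: what should the watchdog do on this tick? */
    static Action decide(boolean alive, long nowMillis, long lastHeartbeatMillis, long timeoutMillis) {
        if (!alive) {
            return Action.RESTART;          // RS exited: restart it automatically
        }
        if (nowMillis - lastHeartbeatMillis > timeoutMillis) {
            return Action.KILL_AND_RESTART; // "appears dead": kill it before it can come back
        }
        return Action.NONE;
    }

    private final List<String> command;
    private final LongSupplier heartbeatMillis;
    private final long timeoutMillis;

    RegionServerWatchdog(List<String> command, LongSupplier heartbeatMillis, long timeoutMillis) {
        this.command = command;
        this.heartbeatMillis = heartbeatMillis;
        this.timeoutMillis = timeoutMillis;
    }

    /** Supervises the RS forever, polling once per second. */
    void run() throws IOException, InterruptedException {
        Process rs = new ProcessBuilder(command).inheritIO().start();
        long startedAt = System.currentTimeMillis();
        while (true) {
            TimeUnit.SECONDS.sleep(1);
            // Treat the start time as a heartbeat so a fresh RS gets a full timeout to report in
            long lastBeat = Math.max(heartbeatMillis.getAsLong(), startedAt);
            Action action = decide(rs.isAlive(), System.currentTimeMillis(), lastBeat, timeoutMillis);
            if (action == Action.KILL_AND_RESTART) {
                // Forcible termination: a paused JVM gets no chance to resume and act on stale state
                rs.destroyForcibly().waitFor();
            }
            if (action != Action.NONE) {
                rs = new ProcessBuilder(command).inheritIO().start();
                startedAt = System.currentTimeMillis();
            }
        }
    }
}
```

Keeping `decide` free of I/O makes the kill/restart policy easy to test separately from process management.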
--
This message is automatically generated by JIRA.