Simple check on the master overview page if the number of currently running
regionservers is unchanged.
-------------------------------------------------------------------------------------------------------
Key: HBASE-2117
URL: https://issues.apache.org/jira/browse/HBASE-2117
Project: Hadoop HBase
Issue Type: New Feature
Components: master, regionserver
Affects Versions: 0.20.2
Reporter: Ferdy
Occasionally, some of our regionservers just stop working. The regionserver
logs show some sort of termination, and the affected regionserver is simply
removed from the master page. Apart from the termination itself, what I was
missing was some sort of warning (from either running client code or the
master page) that some regionservers are having trouble.
It seems like the Master is ok with the fact that a regionserver suddenly
decides to stop. The result is that clients depending on the data in HBase
will be presented with an incomplete data set, at least until the failing
regions are re-assigned. In order to have this monitored, I decided to
create a patch that exposes an extra piece of information on the master page.
An 'OK:' is presented if the current number of regionservers is unchanged since
the start of the processes. An 'ERROR:' is shown whenever the current number is
not the same. The master page reads the 'regionservers' file once and remembers
the number of slaves so that it can be used in the check. (So subsequent
changes to this file are not supported.)
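
The check described above could look roughly like the following. This is only
a minimal sketch, not the code from the attached patch: the class
RegionServerCountCheck and its methods are hypothetical names, the live count
is assumed to be supplied by the caller, and skipping blank lines and
'#' comment lines in the 'regionservers' file is an assumption.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical helper illustrating the OK/ERROR check; not the attached patch. */
final class RegionServerCountCheck {
    // Expected number of regionservers, remembered from a single read of the file.
    private final int expected;

    RegionServerCountCheck(int expected) {
        this.expected = expected;
    }

    /** Reads the 'regionservers' file once; later edits to it are not seen. */
    static RegionServerCountCheck fromFile(Path regionserversFile) throws IOException {
        return new RegionServerCountCheck(countHosts(Files.readAllLines(regionserversFile)));
    }

    /** Counts host entries, skipping blank lines and '#' comments (an assumption). */
    static int countHosts(List<String> lines) {
        int count = 0;
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty() && !trimmed.startsWith("#")) {
                count++;
            }
        }
        return count;
    }

    /** Builds the status line for the given number of currently live regionservers. */
    String status(int live) {
        String prefix = (live == expected) ? "OK: " : "ERROR: ";
        return prefix + live + " of " + expected + " regionservers running";
    }
}
```

The master page would create one such object at startup and call status()
with the current number of live regionservers on every page render.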
Perhaps this is not the right way of doing things. Please let me know if there
are any existing solutions for these issues.
I will attach a patch right away.