Simple check on the master overview page if the number of currently running 
regionservers is unchanged.
-------------------------------------------------------------------------------------------------------

                 Key: HBASE-2117
                 URL: https://issues.apache.org/jira/browse/HBASE-2117
             Project: Hadoop HBase
          Issue Type: New Feature
          Components: master, regionserver
    Affects Versions: 0.20.2
            Reporter: Ferdy


Occasionally, some of our regionservers just stop working. The regionserver 
logs show some sort of termination, and the affected regionserver is simply 
removed from the master page. Apart from the termination itself, what I was 
missing was some kind of warning (from either running client code or the 
master page) that some regionservers are having trouble.

It seems the Master is fine with a regionserver suddenly stopping. The result 
is that clients depending on the data in HBase are presented with an 
incomplete data set, at least as long as the affected regions have not been 
re-assigned. To have this monitored, I created a patch that exposes an extra 
piece of information on the master page. An 'OK:' is shown if the current 
number of regionservers is unchanged since the processes started. An 'ERROR:' 
is shown whenever the current number differs. The master page reads the 
'regionservers' file once and remembers the number of slaves so that it can 
be used in the check. (So later changes to this file are not supported.)
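
To make the idea concrete, here is a minimal sketch of the check in Python. It is not the attached patch (which is against the Java master page); the function and class names, the handling of blank and '#' lines, and the exact message text are all assumptions made for illustration only.

```python
from pathlib import Path


def count_expected_regionservers(conf_file):
    """Count host entries in a 'regionservers'-style file, one host per line."""
    lines = Path(conf_file).read_text().splitlines()
    # Assumption: blank lines and lines starting with '#' are not hosts.
    hosts = [ln.strip() for ln in lines]
    return sum(1 for h in hosts if h and not h.startswith("#"))


class RegionServerCountCheck:
    """Hypothetical helper: remembers the expected count once, then compares."""

    def __init__(self, expected):
        # Read once at startup; later edits to the file are not seen.
        self.expected = expected

    def status(self, live):
        """Return an 'OK:' or 'ERROR:' line for the given live server count."""
        prefix = "OK:" if live == self.expected else "ERROR:"
        return f"{prefix} {live} of {self.expected} regionservers running"
```

The expected count would be taken once, for example with RegionServerCountCheck(count_expected_regionservers("conf/regionservers")), and status() would then be called with the live count each time the page is rendered. Note that an exact comparison also reports 'ERROR:' when servers are added after startup, which matches the "not the same" behaviour described above.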

Perhaps this is not the right way of doing things. Please let me know if there 
are any existing solutions for these issues.

I will attach a patch right away.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
