Harsh J created HBASE-10424:
-------------------------------

             Summary: HMaster could capture stacks of RSes it deems 
unresponsive during assignments
                 Key: HBASE-10424
                 URL: https://issues.apache.org/jira/browse/HBASE-10424
             Project: HBase
          Issue Type: Wish
          Components: Region Assignment
    Affects Versions: 0.96.0
            Reporter: Harsh J
            Priority: Trivial


Often a region fails to get assigned due to timeouts (while others do go 
through). In this case, the Master appears to enter a never-ending retry loop, 
in which it retries each chosen server several times before moving on to 
another.

When debugging such a scenario, the Master is the component best aware of the 
situation, and it could use that to its advantage to help capture issues 
better. It could set up a retry threshold N (on the number of servers tried) 
and, once that is reached, run an HTTP GET against the info port of the RS that 
is currently timing out, capture the output of its /stacks endpoint, and dump 
it into its own logs for later investigation.
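
A minimal sketch of the idea, not actual HMaster code: the class name, the 
method names and the threshold value below are all hypothetical, and the info 
port is assumed to be 60030 (the usual RS info port default in this version 
line; check hbase.regionserver.info.port on your cluster). It only shows the 
threshold check and the HTTP GET on /stacks, using plain java.net classes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; nothing here is an existing HBase API.
public class StackDumpCapture {

    // Assumed value for the "N servers tried" threshold from the proposal.
    static final int SERVERS_TRIED_THRESHOLD = 3;

    // True once the number of servers tried has reached the threshold.
    static boolean shouldCapture(int serversTried, int threshold) {
        return threshold > 0 && serversTried >= threshold;
    }

    // Builds the URL of the /stacks endpoint on a region server's info port.
    static String stacksUrl(String host, int infoPort) {
        return "http://" + host + ":" + infoPort + "/stacks";
    }

    // Runs an HTTP GET on /stacks and returns the body as a string.
    // Timeouts are set so that a hung RS cannot block the caller forever.
    static String fetchStacks(String host, int infoPort, int timeoutMs)
            throws IOException {
        URL url = new URL(stacksUrl(host, infoPort));
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(timeoutMs);
        conn.setReadTimeout(timeoutMs);
        try {
            int code = conn.getResponseCode();
            if (code != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + code);
            }
            StringBuilder body = new StringBuilder();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(),
                                          StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    body.append(line).append('\n');
                }
            }
            return body.toString();
        } finally {
            conn.disconnect();
        }
    }

    // Usage: java StackDumpCapture <rs-host> <info-port> <servers-tried>
    public static void main(String[] args) {
        if (args.length != 3) {
            System.err.println(
                "usage: StackDumpCapture <rs-host> <info-port> <servers-tried>");
            return;
        }
        String host = args[0];
        int infoPort = Integer.parseInt(args[1]);
        int serversTried = Integer.parseInt(args[2]);
        if (!shouldCapture(serversTried, SERVERS_TRIED_THRESHOLD)) {
            System.out.println("Below threshold, not capturing stacks");
            return;
        }
        try {
            // In the Master this would go to its log rather than stdout.
            System.out.println("Stacks of " + host + ":\n"
                + fetchStacks(host, infoPort, 5000));
        } catch (IOException e) {
            // The RS may be too unresponsive to answer; note that and move on.
            System.err.println("Could not capture stacks: " + e.getMessage());
        }
    }
}
```

For example, `java StackDumpCapture rs1.example.com 60030 3` (with a made-up 
host name) would attempt the capture, since 3 servers tried meets the assumed 
threshold. A failed fetch is only logged, so that the capture attempt never 
interferes with the assignment retries themselves.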



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
