Harsh J created HBASE-10424:
-------------------------------
Summary: HMaster could capture stacks of RSes it deems
unresponsive during assignments
Key: HBASE-10424
URL: https://issues.apache.org/jira/browse/HBASE-10424
Project: HBase
Issue Type: Wish
Components: Region Assignment
Affects Versions: 0.96.0
Reporter: Harsh J
Priority: Trivial
Often a region fails to get assigned because of timeouts (while others do go
through). In this case, the Master appears to enter a never-ending retry loop
in which it retries each chosen server several times before moving on to
another.
For debugging such a scenario, the Master is the component best aware of the
situation and could use that to its advantage to help capture issues better.
It could set up a retry threshold of N (counted in # of servers tried) and,
once that threshold is reached, run an HTTP GET against the info port of the RS
that is currently timing out, capture the output of its /stacks endpoint, and
dump that output in its own logs for later investigation.
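
A minimal sketch of the idea, in plain Java and not tied to any existing
Master code. The class StackCaptureSketch and its methods recordTimeout,
stacksUrl and fetchStacks are hypothetical names made up for illustration, and
the threshold, host and port values are assumptions supplied by the caller:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper: counts distinct servers that timed out and fetches /stacks. */
public class StackCaptureSketch {
    private final int serverThreshold;                      // the "N" from the description
    private final Set<String> serversTried = new HashSet<>();

    public StackCaptureSketch(int serverThreshold) {
        this.serverThreshold = serverThreshold;
    }

    /**
     * Records a timed-out assignment attempt against the given server.
     * Returns true once the number of distinct servers tried reaches the threshold.
     */
    public boolean recordTimeout(String hostname) {
        serversTried.add(hostname);
        return serversTried.size() >= serverThreshold;
    }

    /** Builds the URL of the /stacks endpoint on a server's info port. */
    public static String stacksUrl(String host, int infoPort) {
        return "http://" + host + ":" + infoPort + "/stacks";
    }

    /** Issues an HTTP GET against /stacks and returns the response body as text. */
    public static String fetchStacks(String host, int infoPort, int timeoutMs)
            throws IOException {
        HttpURLConnection conn = (HttpURLConnection)
                URI.create(stacksUrl(host, infoPort)).toURL().openConnection();
        conn.setRequestMethod("GET");
        // Bound both phases so an unresponsive RS cannot hang the caller as well.
        conn.setConnectTimeout(timeoutMs);
        conn.setReadTimeout(timeoutMs);
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            StringBuilder body = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
            return body.toString();
        } finally {
            conn.disconnect();
        }
    }
}
```

The caller would invoke recordTimeout(host) after each timed-out attempt and,
when it returns true, log the result of fetchStacks(host, infoPort, timeoutMs)
at a suitable level. The short timeouts matter here, since the RS being probed
is by definition slow to respond and the fetch itself may fail.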