Tianying Chang created HBASE-15155:
--------------------------------------
Summary: Show All RPC handler tasks stop working after cluster is
under heavy load for a while
Key: HBASE-15155
URL: https://issues.apache.org/jira/browse/HBASE-15155
Project: HBase
Issue Type: Bug
Components: monitoring
Affects Versions: 0.94.19, 1.0.0, 0.98.0
Reporter: Tianying Chang
Assignee: Tianying Chang
After we upgraded from 0.94.7 to 0.94.26 and 1.0, we found that the "Show All
RPC handler status" link on the RS web UI stops working once the production
cluster has run under relatively high load for several days.
It turns out to be a bug introduced by HBASE-10312
(https://issues.apache.org/jira/browse/HBASE-10312). The BoundedFIFOBuffer
causes the RPC handler status entries to be overridden/removed permanently when
a spike of non-RPC task statuses exceeds MAX_SIZE (1000). So once the RS has
experienced "high" load even once, RPC status monitoring is gone until the RS
is restarted.
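To make the failure mode concrete, here is a minimal Python simulation. It is not HBase's TaskMonitor code. It assumes that the buffer overwrites its oldest entry when full (as a circular FIFO does) and that each RPC handler registers its status entry only once, at startup. `BoundedTaskMonitor`, `simulate`, and the handler names are hypothetical.

```python
from collections import deque

MAX_SIZE = 1000  # the MAX_SIZE (1000) limit from the report


class BoundedTaskMonitor:
    """Hypothetical stand-in for a task monitor backed by one bounded FIFO."""

    def __init__(self, max_size=MAX_SIZE):
        # deque(maxlen=...) silently drops the oldest entry when full,
        # which is the assumed eviction behaviour of the buffer.
        self.tasks = deque(maxlen=max_size)

    def add(self, kind, name):
        self.tasks.append((kind, name))

    def rpc_tasks(self):
        return [name for kind, name in self.tasks if kind == "rpc"]


def simulate(num_handlers=30, spike=MAX_SIZE + 1):
    """Return (RPC entries before the spike, RPC entries after it)."""
    monitor = BoundedTaskMonitor()
    # Assumption: handler statuses are registered once, when the RS starts.
    for i in range(num_handlers):
        monitor.add("rpc", f"RpcServer.handler={i}")
    before = len(monitor.rpc_tasks())
    # A burst of short-lived non-RPC tasks; nothing re-adds the RPC entries.
    for i in range(spike):
        monitor.add("non-rpc", f"task-{i}")
    after = len(monitor.rpc_tasks())
    return before, after


if __name__ == "__main__":
    print(simulate())  # (30, 0)
```

With 30 handlers, a burst of 1,001 non-RPC tasks pushes every handler entry out of the buffer. A smaller burst such as `simulate(30, 980)` evicts only the 10 oldest handlers and returns `(30, 20)`. Either way, evicted entries never come back, which is consistent with monitoring staying broken until the RS restarts.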
We added a unit test that reproduces this, and the fix passes it.
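The report does not describe the fix itself. As a hypothetical sketch only, one way to make the RPC entries immune to the bound is to keep them in a separate list, since the number of handlers is fixed. The Python below pairs that idea with a repro check in the spirit of the unit test mentioned above. All class and function names are illustrative assumptions, not HBase APIs, and this is not the actual patch.

```python
from collections import deque

MAX_SIZE = 1000  # the MAX_SIZE (1000) limit from the report


class SeparatedTaskMonitor:
    """Hypothetical fix: RPC handler statuses live outside the bounded buffer."""

    def __init__(self, max_size=MAX_SIZE):
        # Non-RPC tasks stay bounded; the oldest is dropped when full.
        self.tasks = deque(maxlen=max_size)
        # Assumption: one long-lived entry per RPC handler, so this list
        # grows with the fixed handler count rather than with load.
        self.rpc_statuses = []

    def create_status(self, name):
        self.tasks.append(name)

    def create_rpc_status(self, name):
        self.rpc_statuses.append(name)


def rpc_statuses_survive_spike(monitor, num_handlers=30, spike=MAX_SIZE + 1):
    """Repro check: register handlers, flood non-RPC tasks, count survivors."""
    for i in range(num_handlers):
        monitor.create_rpc_status(f"RpcServer.handler={i}")
    for i in range(spike):
        monitor.create_status(f"task-{i}")
    return len(monitor.rpc_statuses) == num_handlers


if __name__ == "__main__":
    print(rpc_statuses_survive_spike(SeparatedTaskMonitor()))  # True
```

The trade-off is that RPC entries are no longer capped by MAX_SIZE. That is acceptable only under the stated assumption that handlers are created once and do not multiply with load.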
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)