[ https://issues.apache.org/jira/browse/HBASE-11902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14227322#comment-14227322 ]
Qiang Tian commented on HBASE-11902:
------------------------------------
OK. The latest failure happens because, in the test case, only the WAL write
fails. If we hide the exception (just decrement the counter) and continue, the
data flush will succeed, so the completeCacheFlush call decrements the counter
a second time.
To preserve the counter semantics, the simplest fix is best: return right away
(as in the original patch).
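A minimal Java sketch of the counter pairing discussed above, assuming a counter that is incremented when a flush starts and decremented exactly once when it ends. FlushCounterSketch, FlushBarrier, Wal and appendFlushMarker are hypothetical names, not HBase APIs; the final end() call stands in for completeCacheFlush. On a WAL failure the sketch decrements once and returns right away, as the original patch does:

```java
import java.io.IOException;

/**
 * Hypothetical sketch (not HBase code): a counter of in-flight cache flushes
 * that must be decremented exactly once per flush.
 */
public class FlushCounterSketch {

    /** Counts flushes that have started but not yet finished. */
    static final class FlushBarrier {
        private int outstanding = 0;

        synchronized void begin() { outstanding++; }

        synchronized void end() {
            // Guard against the double decrement described in the comment.
            if (outstanding <= 0) {
                throw new IllegalStateException("end() without matching begin()");
            }
            outstanding--;
            if (outstanding == 0) notifyAll(); // wake anyone waiting for flushes to drain
        }

        synchronized int outstanding() { return outstanding; }
    }

    /** Hypothetical WAL append that can be made to fail, as in the test case. */
    interface Wal { void appendFlushMarker() throws IOException; }

    /** Flush path: begin, write the WAL marker, flush data, complete. */
    static boolean flush(FlushBarrier barrier, Wal wal) {
        barrier.begin();
        try {
            wal.appendFlushMarker();
        } catch (IOException e) {
            barrier.end(); // balance begin() exactly once...
            return false;  // ...and return right away instead of continuing
        }
        // ... flush memstore data to store files (omitted in this sketch) ...
        barrier.end();     // stands in for completeCacheFlush
        return true;
    }

    public static void main(String[] args) {
        FlushBarrier barrier = new FlushBarrier();
        boolean ok = flush(barrier, () -> { throw new IOException("WAL write failed"); });
        System.out.println(ok + " " + barrier.outstanding()); // prints "false 0"
        ok = flush(barrier, () -> { });
        System.out.println(ok + " " + barrier.outstanding()); // prints "true 0"
    }
}
```

If the catch block only decremented and then fell through, the completion step would call end() a second time; in this sketch that trips the IllegalStateException guard, which is the double decrement the comment describes.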
> RegionServer was blocked while aborting
> ---------------------------------------
>
> Key: HBASE-11902
> URL: https://issues.apache.org/jira/browse/HBASE-11902
> Project: HBase
> Issue Type: Bug
> Components: regionserver, wal
> Affects Versions: 0.98.4
> Environment: hbase-0.98.4, hadoop-2.3.0-cdh5.1, jdk1.7
> Reporter: Victor Xu
> Assignee: Qiang Tian
> Attachments: hbase-hadoop-regionserver-hadoop461.cm6.log,
> hbase11902-master.patch, hbase11902-master_v2.patch, jstack_hadoop461.cm6.log
>
>
> Generally, the regionserver automatically aborts when isHealth() returns false.
> But it sometimes gets blocked while aborting. I saved the jstack and logs, and
> found that it was caused by datanode failures: the "regionserver60020"
> thread was blocked while closing the WAL.
> This issue doesn't happen frequently, but when it does, it always leads
> to a huge number of failed requests. The only way out is kill -9.
> I think it's a bug, but I haven't found a decent solution. Does anyone have
> the same problem?
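The hang in the report can be illustrated with a small hypothetical sketch, assuming (as the counter discussion in the comment suggests) that closing the WAL waits until every started cache flush has finished. BlockedCloseSketch, beginFlush, endFlush, closeAndDrain and flushLeakingCounter are illustrative names, not HBase APIs. If a WAL write throws between the increment and the decrement, the count never returns to zero and close waits forever; the sketch adds a timeout only so the demo terminates:

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch (not HBase code): a WAL close that waits for in-flight
 * flushes to drain, and a flush path that leaks the counter on failure.
 */
public class BlockedCloseSketch {

    private int inFlightFlushes = 0;

    synchronized void beginFlush() { inFlightFlushes++; }

    synchronized void endFlush() {
        inFlightFlushes--;
        if (inFlightFlushes == 0) notifyAll(); // let a waiting close proceed
    }

    /**
     * Waits for in-flight flushes to drain. The timeout exists only so this
     * demo terminates; returns true if drained, false if the deadline passed.
     */
    synchronized boolean closeAndDrain(long timeoutMillis) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (inFlightFlushes > 0) { // loop also covers spurious wakeups
            long remainingMillis = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remainingMillis <= 0) return false;
            wait(remainingMillis);
        }
        return true;
    }

    /** Bug pattern: the WAL write throws between beginFlush() and endFlush(). */
    void flushLeakingCounter(boolean walFails) throws IOException {
        beginFlush();
        if (walFails) {
            throw new IOException("simulated WAL write failure"); // endFlush() is skipped
        }
        endFlush();
    }

    public static void main(String[] args) throws InterruptedException {
        BlockedCloseSketch wal = new BlockedCloseSketch();
        try {
            wal.flushLeakingCounter(true);
        } catch (IOException e) {
            // The abort path carries on and then tries to close the WAL.
        }
        System.out.println("drained=" + wal.closeAndDrain(200)); // prints "drained=false"
    }
}
```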