[ 
https://issues.apache.org/jira/browse/HBASE-11902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Tian updated HBASE-11902:
-------------------------------
    Attachment: hbase11902-master_v2.patch

The WAL differs considerably between master and 0.98, for example in its 
thread model and error handling. In 0.98 the RS is aborted on HDFS failure, 
but the master branch ignores the failure and the write and sync thread 
continues. I also made the patch simpler.
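
For illustration only, here is a minimal sketch of one generic way to keep an 
abort path from blocking forever on a close call that is stuck on a failed 
HDFS pipeline. It is not the attached patch and does not use any HBase API: 
the class BoundedClose and the method closeWithTimeout are hypothetical names, 
and the WAL is stood in for by a plain java.io.Closeable.

```java
import java.io.Closeable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper; not part of HBase and not the attached patch. */
final class BoundedClose {
    private BoundedClose() {}

    /**
     * Runs resource.close() on a daemon thread and waits at most timeoutMs.
     * Returns true if close() returned normally in time, false otherwise.
     */
    static boolean closeWithTimeout(Closeable resource, long timeoutMs) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-close");
            t.setDaemon(true); // a stuck close must not keep the JVM alive
            return t;
        });
        // The lambda returns a value, so it is submitted as a Callable and
        // may throw the IOException declared by Closeable.close().
        Future<?> pending = executor.submit(() -> {
            resource.close();
            return null;
        });
        try {
            pending.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            pending.cancel(true); // interrupts the closer; it may ignore this
            return false;
        } catch (ExecutionException e) {
            return false; // close() itself threw
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

A caller on the abort path could invoke BoundedClose.closeWithTimeout(wal, 
30000) and carry on with shutdown when it returns false. The trade-off is 
that a close which ignores interruption leaves its daemon thread behind 
until the process exits.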




> RegionServer was blocked while aborting
> ---------------------------------------
>
>                 Key: HBASE-11902
>                 URL: https://issues.apache.org/jira/browse/HBASE-11902
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, wal
>    Affects Versions: 0.98.4
>         Environment: hbase-0.98.4, hadoop-2.3.0-cdh5.1, jdk1.7
>            Reporter: Victor Xu
>            Assignee: Qiang Tian
>         Attachments: hbase-hadoop-regionserver-hadoop461.cm6.log, 
> hbase11902-master.patch, hbase11902-master_v2.patch, jstack_hadoop461.cm6.log
>
>
> Generally, the regionserver aborts automatically when isHealth() returns 
> false, but sometimes it gets blocked while aborting. I saved the jstack and 
> logs and found that the block was caused by datanode failures. The 
> "regionserver60020" thread was blocked while closing the WAL. 
> This issue doesn't happen very often, but when it does, it always leads 
> to a large number of failed requests. The only way out is kill -9.
> I think it's a bug, but I haven't found a decent solution. Does anyone have 
> the same problem?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
