[
https://issues.apache.org/jira/browse/HBASE-13592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520233#comment-14520233
]
Andrew Purtell commented on HBASE-13592:
----------------------------------------
Nice work [~vik.karma]
> RegionServer sometimes gets stuck during shutdown in case of cache flush
> failures
> ---------------------------------------------------------------------------------
>
> Key: HBASE-13592
> URL: https://issues.apache.org/jira/browse/HBASE-13592
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.98.10
> Reporter: Vikas Vishwakarma
> Assignee: Vikas Vishwakarma
> Fix For: 0.98.13
>
> Attachments: HBASE-13592-0.98.patch
>
>
> Observed that the RegionServer sometimes gets stuck during shutdown after
> cache flush failures. After adding a few debug logs and looking through the
> stack trace, the RegionServer process appears to be stuck in closeWAL ->
> hlog.close -> closeBarrier.stopAndDrainOps() during the shutdown sequence in
> the run method.
> From the RegionServer logs we see multiple attempts to flush the cache for a
> particular region, each of which increments the beginOp count in DrainBarrier,
> but all of the flush attempts fail somewhere in WAL sync and the DrainBarrier
> endOp count decrement never happens. Later, when shutdown is initiated, the
> RegionServer process is permanently stuck here.
> In this case hbase stop also does not work, and the RegionServer process has
> to be explicitly killed using kill -9.
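
The failure mode described above can be illustrated with a minimal sketch. SimpleDrainBarrier below is a hypothetical, simplified stand-in written for this example and is not the HBase DrainBarrier class; leakyFlush, safeFlush, and the timeout parameter are likewise assumptions. The sketch only shows how a beginOp without a matching endOp keeps a drain from ever completing, and how a try/finally keeps the counts balanced; it does not claim to reproduce what the attached patch does.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical, simplified barrier: counts in-flight ops and lets a closer wait for them. */
class SimpleDrainBarrier {
    private long inFlight = 0;
    private boolean draining = false;

    /** Registers an operation; returns false once draining has started. */
    synchronized boolean beginOp() {
        if (draining) {
            return false;
        }
        inFlight++;
        return true;
    }

    /** Marks one operation as finished and wakes a waiting closer when none remain. */
    synchronized void endOp() {
        inFlight--;
        if (inFlight == 0) {
            notifyAll();
        }
    }

    /**
     * Blocks new ops and waits up to timeoutMs for in-flight ops to end.
     * Returns true if drained. The timeout is an assumption of this sketch,
     * added so that the example terminates instead of hanging.
     */
    synchronized boolean stopAndDrainOps(long timeoutMs) {
        draining = true;
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (inFlight > 0) {
            long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remainingMs <= 0) {
                return false; // an op never called endOp()
            }
            try {
                wait(remainingMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}

class FlushExample {
    /** Leaky pattern: if walSync throws, endOp() is skipped and the count stays raised. */
    static void leakyFlush(SimpleDrainBarrier barrier, Runnable walSync) {
        if (!barrier.beginOp()) {
            return;
        }
        walSync.run();
        barrier.endOp();
    }

    /** Balanced pattern: endOp() runs even when walSync throws. */
    static void safeFlush(SimpleDrainBarrier barrier, Runnable walSync) {
        if (!barrier.beginOp()) {
            return;
        }
        try {
            walSync.run();
        } finally {
            barrier.endOp();
        }
    }
}
```

With the leaky pattern, a single failed sync leaves the in-flight count above zero, so a drain without a timeout would wait forever, which matches the hang reported during shutdown.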