[ https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721199#comment-14721199 ]
Elliott Clark commented on HBASE-14317:
---------------------------------------
{code}
15/08/29 11:02:00 FATAL wal.FSHLog: Waited too long in attainSafePoint. Waiting
to get to seqId=36243253 However we are only at seqId=36243210 after waiting
60000
{code}
I just had this happen. Here's the log, in case it helps. Far more than one
seqId was stuck.
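
To make the numbers in that FATAL line easier to read, here is a minimal sketch that pulls the two seqId values out of the message and reports how far apart they are. `SafePointGap` and `gap` are hypothetical names used only for illustration; this is not HBase code, and the regular expression assumes the message keeps the wording shown above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HBase: parses the FATAL message quoted above.
final class SafePointGap {
    // Captures the two seqId values in the order they appear in the message:
    // first the target seqId, then the seqId actually reached.
    private static final Pattern SEQ_IDS =
        Pattern.compile("seqId=(\\d+).*?seqId=(\\d+)", Pattern.DOTALL);

    /** Returns target seqId minus current seqId, or -1 if the line does not match. */
    static long gap(String logLine) {
        Matcher m = SEQ_IDS.matcher(logLine);
        if (!m.find()) {
            return -1;
        }
        long target = Long.parseLong(m.group(1));
        long current = Long.parseLong(m.group(2));
        return target - current;
    }

    public static void main(String[] args) {
        String line = "15/08/29 11:02:00 FATAL wal.FSHLog: Waited too long in attainSafePoint. "
            + "Waiting to get to seqId=36243253 However we are only at seqId=36243210 "
            + "after waiting 60000";
        System.out.println(gap(line)); // 36243253 - 36243210 = 43
    }
}
```

For the line above, the target is 43 sequence IDs ahead of where the WAL had actually got to when the wait timed out.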
> Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
> -----------------------------------------------------
>
> Key: HBASE-14317
> URL: https://issues.apache.org/jira/browse/HBASE-14317
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.2.0, 1.1.1
> Reporter: stack
> Priority: Critical
> Attachments: 14317.test.txt, HBASE-14317.patch, [Java] RS stuck on
> WAL sync to a dead DN - Pastebin.com.html, raw.php, subset.of.rs.log
>
>
> hbase-1.1.1 and hadoop-2.7.1
> We try to roll the logs because we can't append (see HDFS-8960), but we get
> stuck. See the attached thread dump and associated log. What is interesting
> is that the syncers are waiting to take syncs to run while, at the same time,
> we want to flush, so we are waiting on a safe point, yet there seems to be
> nothing in our ring buffer. Did we go to roll the log and not add a safe point
> sync to clear out the ring buffer?
> Needs a bit of study. Try to reproduce.