[jira] [Updated] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

stack (JIRA) Wed, 02 Sep 2015 14:23:28 -0700

     [ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


stack updated HBASE-14317:
--------------------------
    Attachment: 14317v11.txt

Added back the timeout on getting the sequenceid. We were failing some tests 
because of improper log management (a WAL append or sync failed but there was 
no log roll to move the test forward...). I fixed a few tests but left the 
sequenceid timeout in place; it will make it easier to find the offenders.
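
For illustration only, a minimal sketch of what a timed wait on a sequence id 
can look like. This is not the code in the attached patch; the class 
SequenceIdHolder and its methods are hypothetical names, and it assumes the id 
is assigned exactly once by another thread. The point is that a waiter fails 
with an exception rather than hanging forever when the id is never assigned.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical holder (not an HBase class): a sequence id assigned once by a writer thread.
class SequenceIdHolder {
    private final CountDownLatch assigned = new CountDownLatch(1);
    private volatile long sequenceId = -1L;

    // Called by the thread that assigns the id; releases any waiters.
    void set(long id) {
        sequenceId = id;
        assigned.countDown();
    }

    // Waits up to timeoutMs for the id; throws instead of blocking indefinitely.
    long get(long timeoutMs) throws IOException {
        try {
            if (!assigned.await(timeoutMs, TimeUnit.MILLISECONDS)) {
                throw new IOException(
                    "Timed out after " + timeoutMs + "ms waiting for sequence id");
            }
        } catch (InterruptedException e) {
            // Restore the interrupt flag before converting to an IOException.
            Thread.currentThread().interrupt();
            throw new InterruptedIOException("Interrupted waiting for sequence id");
        }
        return sequenceId;
    }
}
```

With a wait like this, a test whose append or sync failed without a following 
log roll surfaces as a timeout exception naming the wait, not as a hung test.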

Added a note on the new, tighter semantics to the head of FSHLog.

Let's see how this one does (got stuck on TestFSErrorExposure... couldn't figure 
out why it hung).

> Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
> -----------------------------------------------------
>
>                 Key: HBASE-14317
>                 URL: https://issues.apache.org/jira/browse/HBASE-14317
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.2.0, 1.1.1
>            Reporter: stack
>            Priority: Blocker
>             Fix For: 2.0.0, 1.2.0, 1.0.3, 1.1.3
>
>         Attachments: 14317.test.txt, 14317v10.txt, 14317v11.txt, 
> 14317v5.branch-1.2.txt, 14317v5.txt, 14317v9.txt, HBASE-14317-v1.patch, 
> HBASE-14317-v2.patch, HBASE-14317-v3.patch, HBASE-14317-v4.patch, 
> HBASE-14317.patch, [Java] RS stuck on WAL sync to a dead DN - 
> Pastebin.com.html, append-only-test.patch, raw.php, repro.txt, san_dump.txt, 
> subset.of.rs.log
>
>
> hbase-1.1.1 and hadoop-2.7.1
> We try to roll logs because we can't append (see HDFS-8960) but we get stuck. 
> See the attached thread dump and associated log. What is interesting is that 
> the syncers are waiting to take syncs to run and at the same time we want to 
> flush, so we are waiting on a safe point, but there seems to be nothing in our 
> ring buffer; did we go to roll the log and not add a safe point sync to clear 
> out the ring buffer?
> Needs a bit of study. Try to reproduce.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
