[ 
https://issues.apache.org/jira/browse/HBASE-15984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-15984:
--------------------------------
    Attachment: HBASE-15984.1.patch

-01

  - add TRACE level messages with file offsets to aid debugging
  - check for cleanly closed files and offset when handling EOF in replication
  - retry when we can detect that handling of a WAL file has gone wrong


A downside to this approach is that a corrupt WAL file that was cleanly closed
would cause us to loop until an operator can intervene. But I can't think of a
way to avoid that without introducing automated data loss. To mitigate this,
there's a WARN when restarting happens that points to the parent issue.
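
As a rough illustration of the checks listed above (this is not the code in
the attached patch), the decision made on EOF could be sketched in Java as
follows. The class, enum, method, and parameter names are all hypothetical,
java.util.logging stands in for the real logger (FINEST plays the role of
TRACE), and the assumption that the caller knows where parseable entries
should end is mine.

```java
import java.util.logging.Logger;

/** Hypothetical sketch of the EOF check; not taken from HBASE-15984.1.patch. */
public class WalEofCheck {
    private static final Logger LOG = Logger.getLogger(WalEofCheck.class.getName());

    /** What the caller should do after the reader reports EOF on a WAL. */
    public enum Action { ADVANCE, RETRY, DEFAULT_HANDLING }

    /**
     * @param closedCleanly whether the WAL is known to have been closed cleanly
     * @param currentOffset offset at which the reader reported EOF
     * @param expectedEnd   offset where parseable entries should end (assumed known)
     */
    public static Action onEof(boolean closedCleanly, long currentOffset, long expectedEnd) {
        // Stand-in for the TRACE message carrying file offsets.
        LOG.finest("EOF at offset " + currentOffset + ", expected end " + expectedEnd
            + ", closedCleanly=" + closedCleanly);

        if (!closedCleanly) {
            // The workaround only covers cleanly closed files.
            return Action.DEFAULT_HANDLING;
        }
        if (currentOffset < expectedEnd) {
            // Our offset ought not be the end of parseable entries, so retry.
            // A corrupt but cleanly closed file will land here every time.
            LOG.warning("Restarting read of WAL: EOF at offset " + currentOffset
                + " but entries should continue to " + expectedEnd
                + "; see the parent issue of HBASE-15984");
            return Action.RETRY;
        }
        return Action.ADVANCE;
    }
}
```

Nothing in this sketch bounds the number of retries, which matches the
downside described above: bounding them would mean skipping data at some point.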

> Given failure to parse a given WAL that was closed cleanly, replay the WAL.
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-15984
>                 URL: https://issues.apache.org/jira/browse/HBASE-15984
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Replication
>            Reporter: Sean Busbey
>            Assignee: Sean Busbey
>            Priority: Critical
>             Fix For: 2.0.0, 1.3.0, 1.0.4, 1.4.0, 1.2.2, 0.98.20, 1.1.6
>
>         Attachments: HBASE-15984.1.patch
>
>
> Subtask for a general workaround for "underlying reader failed / is in a bad
> state", limited to the case where a WAL 1) was closed cleanly and 2) we can
> tell that our current offset ought not be the end of parseable entries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)