[ https://issues.apache.org/jira/browse/HBASE-15252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15142227#comment-15142227 ]

Duo Zhang commented on HBASE-15252:
-----------------------------------

Changing the exception type back to IPBE ({{InvalidProtocolBufferException}}) can solve the problem (it makes openHRegion fail with an IOException instead), but I want to revisit the {{readNext}} method because I'm a little confused about how we deal with {{EOFException}}.

{code:title=ProtobufLogReader.java}
      } catch (EOFException eof) {
        LOG.trace("Encountered a malformed edit, seeking back to last good position in file", eof);
        // If originalPosition is < 0, it is rubbish and we cannot use it (probably local fs)
        if (originalPosition < 0) throw eof;
        // Else restore our position to original location in hope that next time through we will
        // read successfully.
        seekOnFs(originalPosition);
        return false;
      }
{code}
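
For reference, the caller side is what turns that {{return false}} into an apparent end of file. A simplified sketch of {{ReaderBase.next}} (details elided, not the exact code):

{code:title=ReaderBase.java (simplified sketch)}
  public Entry next(Entry reuse) throws IOException {
    Entry e = reuse != null ? reuse : new Entry(new HLogKey(), new WALEdit());
    // readNext returns false both on a genuine end of file and after the
    // EOFException handling shown above, so the caller cannot tell the two apart.
    boolean hasEntry = readNext(e);
    if (hasEntry) {
      this.edit++;
      return e;
    }
    // null is interpreted by the upper layer (e.g. the WAL splitter) as "no more
    // entries", so the reader is closed and nothing is retried.
    return null;
  }
{code}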

Here we seek back to the last good position, but we call {{return false}} instead of {{continue}}. This causes the {{next}} method of {{ReaderBase}} to return null, which makes the upper layer think it has reached the end of the file and close the current log reader. So what is the purpose of the seek here? And in fact, if the {{EOFException}} really means end of file, I do not think we could read a valid wal entry successfully when retrying...
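
For the seek to actually help, the read would have to be driven from a retry loop, something like the hypothetical sketch below. The loop and the {{readOneEntry}} name are made up purely to illustrate what {{continue}} semantics would mean; it is not a proposed change:

{code:title=hypothetical retry loop (illustration only)}
    while (true) {
      long originalPosition = this.inputStream.getPos();
      try {
        return readOneEntry(entry); // hypothetical name for the body of readNext
      } catch (EOFException eof) {
        if (originalPosition < 0) throw eof;
        // Restore the position and retry instead of returning false. Note that if
        // the EOFException marks a real end of file, this would just loop forever,
        // which is exactly why retrying on EOF looks questionable.
        seekOnFs(originalPosition);
        continue;
      }
    }
{code}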

Thanks. 

> Data loss when replaying wal if HDFS timeout
> --------------------------------------------
>
>                 Key: HBASE-15252
>                 URL: https://issues.apache.org/jira/browse/HBASE-15252
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>         Attachments: HBASE-15252-testcase.patch
>
>
> This is a problem introduced by HBASE-13825, where we changed the exception
> type in the catch block of the {{readNext}} method of {{ProtobufLogReader}}.
> {code:title=ProtobufLogReader.java}
>       try {
>           ......
>           ProtobufUtil.mergeFrom(builder, new LimitInputStream(this.inputStream, size),
>             (int)size);
>         } catch (IOException ipbe) { // <------ used to be InvalidProtocolBufferException
>           throw (EOFException) new EOFException("Invalid PB, EOF? Ignoring; originalPosition=" +
>             originalPosition + ", currentPosition=" + this.inputStream.getPos() +
>             ", messageSize=" + size + ", currentAvailable=" + available).initCause(ipbe);
>         }
> {code}
> Here if the {{inputStream}} throws an {{IOException}} due to a timeout or
> something similar, we just convert it to an {{EOFException}}, and at the bottom of
> this method we ignore the {{EOFException}} and return false. This causes the
> upper layer to think we have reached the end of the file. So when replaying we will
> treat the HDFS timeout error as a normal end of file, which causes data loss.


