[ https://issues.apache.org/jira/browse/HBASE-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HBASE-2643:
-------------------------------

    Description: 
When splitting the WAL and encountering EOF, it's not clear what to do. Initial 
discussion of this started in http://review.hbase.org/r/74/ - summarizing here 
for brevity:

We can get an EOFException while splitting the WAL in the following cases:
- The writer died after creating the file but before even writing the header 
(or crashed halfway through writing the header)
- The writer died in the middle of flushing some data - sync() guarantees that 
we can see _at least_ the last edit, but we may see half of an edit that was 
being written out when the RS crashed (especially for large rows)
- The data was actually corrupted somehow (e.g. a length field got changed to 
be too long and thus points past EOF)

Ideally we would know when we see EOF whether it was really the last record, 
and in that case, simply drop that record (it wasn't synced, so we don't need 
to split it); a rough sketch of that policy follows the questions below. Some 
open questions:
  - Currently we ignore empty files. Is it OK to ignore an empty log file if 
it's not the last one?
  - Similarly, do we ignore an EOF mid-record if it's not the last log file?
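
To make the intended policy concrete, here is a minimal sketch of the 
tail-record handling described above. It is illustrative only - the WalReader 
interface, the readEdits() helper, and the isLastLogInDir/skipErrors flags are 
assumptions, not HBase's actual HLog splitting code:

{code}
import java.io.EOFException;
import java.io.IOException;
import java.util.List;

public class WalSplitSketch {

  /** Hypothetical stand-in for a WAL reader; next() returns null at a clean EOF. */
  interface WalReader {
    Object next() throws IOException;
  }

  /**
   * Read edits from one WAL file. An EOFException on the last file in the
   * directory is treated as a half-written tail record: it was never synced,
   * so it is safe to drop. An EOF on any earlier file looks like real
   * corruption and is rethrown unless skipErrors is set.
   */
  static void readEdits(WalReader reader, List<Object> edits,
                        boolean isLastLogInDir, boolean skipErrors)
      throws IOException {
    try {
      for (Object edit = reader.next(); edit != null; edit = reader.next()) {
        edits.add(edit);
      }
    } catch (EOFException eof) {
      if (isLastLogInDir) {
        // Writer likely crashed mid-record; the partial tail was never
        // acknowledged, so drop it with a warning and keep going.
        System.err.println("WARN: dropping partial tail record: " + eof);
      } else if (!skipErrors) {
        // EOF in a non-final log suggests genuine corruption.
        throw eof;
      }
    }
  }
}
{code}

The interesting decision is the isLastLogInDir check, which is exactly the open 
question above: an EOF is only safely ignorable if we know it hit the tail of 
the last log.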

  was:
During review of 2437, there was lots of discussion around how to deal with 
EOF. This issue is about deciding how to treat EOF when reading WALs.

Below is copied from http://review.hbase.org/r/74/

{code}
Yes, I think so. The RS could have crashed right after opening but before 
writing any data, and if the master failed to recover that, then we'd never 
recover that region. I say ignore it with a WARN.
Cosmin Lehene 6 days, 18 hours ago (May 25th, 2010, 2:27 p.m.)
More aspects here:
I think the reported size will be >0 after recovery, even if the file has no 
records. I was asking if we should add logic to check whether it's the last 
log. EOF for a non-zero-length, non-zero-records file means the file is 
corrupted.
Todd Lipcon 5 days, 19 hours ago (May 26th, 2010, 1:27 p.m.)
I agree if it has no records (I think - do we syncfs after writing the 
SequenceFile header?). But there's the case where inside SequenceFile we call 
create, but never actually write any bytes. This is still worth recovering.

In general I think a corrupt tail means we should drop that record (an 
incompletely written record) but not shut down. This is only true if it's the 
tail record, though.
Cosmin Lehene 3 days ago (May 29th, 2010, 8:56 a.m.)
 - How can we determine whether it's the tail record or the 5th out of 10 
records that's broken? We just get an EOF when calling next().
 - Currently we ignore empty files. Is it OK to ignore an empty log file if 
it's not the last one?
 - I'm not sure whether it's possible to get an EOF when acquiring the reader 
for a file after it has been recoverFileLease()-ed, so the whole try/catch for 
HLog.getReader might be redundant.

When reading log entries we currently don't catch exceptions. We read as much 
as we can and then let any exception bubble up. The splitLog logic decides 
what to do next: if we get to a broken record it will most probably throw an 
EOF there, and based on the skip.errors setting it will act accordingly. There 
will be no EOF if there are no records, though, and we continue.

There are two possible reasons for a file being corrupted/empty:
1. HRegion died => only the last log entry (edit) in the last log in the 
directory should be affected => we could continue, but are we sure it's the 
tail record?
2. Another component screwed things up (a bug) => logs other than the last one 
could be affected => we should halt in this situation.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java (Diff revision 1)
Line 1455:        throw e;
See above logic - the writer could have crashed after writing only part of the 
SequenceFile header, etc., so we should just warn and continue.
Cosmin Lehene 6 days, 18 hours ago (May 25th, 2010, 2:27 p.m.)
see above comment
src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java (Diff revision 1)
Line 1471:    } finally {
I think we need to handle EOF specially here too, though OK to leave this for 
another JIRA. IIRC one of the FB guys opened this already
Cosmin Lehene 6 days, 18 hours ago (May 25th, 2010, 2:30 p.m.)
what's the other JIRA? see my above comments.
Todd Lipcon 5 days, 19 hours ago (May 26th, 2010, 1:27 p.m.)
Can't find it now... does my above comment make sense?
{code}
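
For illustration of the open-time handling discussed in the review above, here 
is a rough sketch. The openReader() helper and the skipErrors flag are 
placeholders for whatever the splitter actually uses (e.g. HLog.getReader and 
the skip.errors setting), not the real API:

{code}
import java.io.EOFException;
import java.io.IOException;

public class WalOpenSketch {

  /**
   * Try to open a reader for one recovered WAL file. A zero-length file, or
   * an EOF while reading the header, means the writer died before any edit
   * was synced, so the file is skipped with a WARN. Other IO errors are
   * rethrown unless the skip-errors policy says to swallow them.
   */
  static AutoCloseable openOrSkip(long fileLength, boolean skipErrors)
      throws IOException {
    if (fileLength == 0) {
      System.err.println("WARN: empty WAL file, skipping");
      return null;
    }
    try {
      return openReader();  // may hit EOF while reading the header
    } catch (EOFException eof) {
      // Writer crashed mid-header; nothing recoverable in this file.
      System.err.println("WARN: truncated WAL header, skipping: " + eof);
      return null;
    } catch (IOException ioe) {
      if (skipErrors) {
        System.err.println("WARN: skip-errors set, ignoring: " + ioe);
        return null;
      }
      throw ioe;
    }
  }

  /** Placeholder for the real reader construction. */
  static AutoCloseable openReader() throws IOException {
    throw new EOFException("stub");
  }
}
{code}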


Updated the description to a summary of the issue at hand - refer back to the 
original review for the full discussion.

> Figure how to deal with eof splitting logs
> ------------------------------------------
>
>                 Key: HBASE-2643
>                 URL: https://issues.apache.org/jira/browse/HBASE-2643
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Priority: Blocker
>             Fix For: 0.21.0
>
>
> When splitting the WAL and encountering EOF, it's not clear what to do. 
> Initial discussion of this started in http://review.hbase.org/r/74/ - 
> summarizing here for brevity:
> We can get an EOFException while splitting the WAL in the following cases:
> - The writer died after creating the file but before even writing the header 
> (or crashed halfway through writing the header)
> - The writer died in the middle of flushing some data - sync() guarantees 
> that we can see _at least_ the last edit, but we may see half of an edit that 
> was being written out when the RS crashed (especially for large rows)
> - The data was actually corrupted somehow (e.g. a length field got changed to 
> be too long and thus points past EOF)
> Ideally we would know when we see EOF whether it was really the last record, 
> and in that case, simply drop that record (it wasn't synced, so we don't need 
> to split it). Some open questions:
>   - Currently we ignore empty files. Is it ok to ignore an empty log file if 
> it's not the last one?
>   - Similarly, do we ignore an EOF mid-record if it's not the last log file?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
