Figure how to deal with eof splitting logs
------------------------------------------
Key: HBASE-2643
URL: https://issues.apache.org/jira/browse/HBASE-2643
Project: HBase
Issue Type: Bug
Reporter: stack
Priority: Blocker
Fix For: 0.21.0
During review of 2437 there was a lot of discussion around how to deal with EOF.
This issue is about deciding how EOF should be treated when reading WALs.
Below is copied from http://review.hbase.org/r/74/
{code}
Yes, I think so. The RS could have crashed right after opening but before
writing any data, and if the master failed to recover that, then we'd never
recover that region. I say ignore it with a WARN.
Cosmin Lehene 6 days, 18 hours ago (May 25th, 2010, 2:27 p.m.)
More aspects here:
I think the reported size will be > 0 after recovery, even if the file has no
records. I was asking whether we should add logic to check if it's the last log.
EOF for a file with non-zero length and non-zero records means the file is
corrupted.
Todd Lipcon 5 days, 19 hours ago (May 26th, 2010, 1:27 p.m.)
I agree if it has no records (I think - do we syncfs after writing the
SequenceFile header?). But there's the case where inside SequenceFile we call
create but never actually write any bytes. This is still worth recovering.
In general I think a corrupt tail means we should drop that record (an
incompletely written record) but not shut down. This is only true if it's the
tail record, though.
Cosmin Lehene 3 days ago (May 29th, 2010, 8:56 a.m.)
- How can we determine whether it's the tail record or the 5th out of 10
records that's broken? We just get an EOF when calling next().
- Currently we ignore empty files. Is it OK to ignore an empty log file if
it's not the last one?
- I'm not sure whether it's possible to get an EOF when acquiring the reader
for a file after it has been recoverFileLease()-ed, so the whole try/catch
around HLog.getReader might be redundant.
When reading log entries we currently don't catch exceptions. We read as much
as we can and then let any exception bubble up; the splitLog logic decides what
to do next. If we get to a broken record it will most likely throw an EOF, and
based on the skip.errors setting it will act accordingly. There will be no EOF
if there are no records, though, and we continue.
There are two possible reasons for a file being corrupted/empty:
1. HRegion died => only the last log entry (edit) in the last log in the
directory should be affected => we could continue, but are we sure it's the
tail record?
2. Another component screwed things up (a bug) => logs other than the last one
could be affected => we should halt in this situation.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java (Diff revision 1)
Line 1455: throw e;
See the above logic - the writer could have crashed after writing only part of
the SequenceFile header, etc., so we should just warn and continue.
Cosmin Lehene 6 days, 18 hours ago (May 25th, 2010, 2:27 p.m.)
see above comment
src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java (Diff revision 1)
Line 1471: } finally {
I think we need to handle EOF specially here too, though it's OK to leave this
for another JIRA. IIRC one of the FB guys opened this already.
Cosmin Lehene 6 days, 18 hours ago (May 25th, 2010, 2:30 p.m.)
What's the other JIRA? See my above comments.
Todd Lipcon 5 days, 19 hours ago (May 26th, 2010, 1:27 p.m.)
Can't find it now... does my above comment make sense?
{code}
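The "drop the torn tail record" treatment discussed above can be illustrated
with a self-contained Java sketch. This is not HBase's actual HLog code:
TailEofDemo and its length-prefixed record format are stand-ins for a
SequenceFile-backed WAL, used only to show EOF mid-record being treated as a
partially written tail rather than a fatal error.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TailEofDemo {
    /**
     * Reads length-prefixed records until EOF. An EOF thrown mid-record is
     * treated as a torn tail left by a crashed writer: the incomplete record
     * is dropped and every complete record read so far is kept.
     */
    static List<String> readEntries(byte[] log) throws IOException {
        List<String> entries = new ArrayList<>();
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(log));
        try {
            while (true) {
                entries.add(in.readUTF()); // throws EOFException at end or mid-record
            }
        } catch (EOFException eof) {
            // Either a clean end of file or a partially written tail record.
            // Per the discussion: warn and continue with what we have.
            System.out.println("WARN: EOF while reading log; keeping "
                + entries.size() + " complete entries");
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeUTF("edit-1");
        out.writeUTF("edit-2");
        out.writeUTF("edit-3");
        byte[] full = buf.toByteArray();
        // Simulate a writer crash: truncate halfway through the last record.
        byte[] torn = Arrays.copyOf(full, full.length - 3);
        System.out.println(readEntries(torn)); // [edit-1, edit-2]
    }
}
```

The open question in the thread (is the EOF really on the tail record?) is
exactly what this loop cannot answer by itself: readUTF() throws the same
EOFException for a torn tail as it would for a truncated middle of the file.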
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.