[ https://issues.apache.org/jira/browse/HBASE-4107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091878#comment-13091878 ]

stack commented on HBASE-4107:
------------------------------

@Dave you need a walplayer (HBASE-3619).  One of us needs to hack it up.  I 
don't think it'd be too hard to do.
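
Roughly, such a walplayer would just read HLog entries back off HDFS and re-apply the edits to their tables.  Below is a minimal single-threaded sketch of that loop, assuming the 0.90-era APIs (HLog.getReader, HLog.Entry, WALEdit, HLogKey.getTablename); the real WALPlayer of HBASE-3619 would presumably be a MapReduce job with batching rather than one HTable per entry.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.regionserver.wal.HLog;

// Minimal sketch only: read one HLog file and re-apply its puts to the
// originating tables.  No batching, no handling of deletes or metadata edits.
public class WalReplaySketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Path walFile = new Path(args[0]);   // path of the HLog file to replay
    HLog.Reader reader = HLog.getReader(FileSystem.get(conf), walFile, conf);
    try {
      HLog.Entry entry;
      while ((entry = reader.next()) != null) {
        byte[] tableName = entry.getKey().getTablename();
        HTable table = new HTable(conf, tableName);
        try {
          for (KeyValue kv : entry.getEdit().getKeyValues()) {
            if (kv.isDelete()) {
              continue;                 // deletes skipped in this sketch
            }
            Put put = new Put(kv.getRow());
            put.add(kv.getFamily(), kv.getQualifier(), kv.getTimestamp(),
                kv.getValue());
            table.put(put);
          }
        } finally {
          table.close();
        }
      }
    } finally {
      reader.close();
    }
  }
}
{code}

Error handling, deletes, and region placement are all left out; it is only meant to show the shape of the replay loop.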

@Andy Do you think it was the checksumming that caused the OOME?  That might just 
be where we tripped over the OOME first.  I'd sooner suspect the 
HFile$Reader.decompress that was going on concurrently.  Looks like the master 
splitting the log passed over the data missing its checksum, so yeah, we lost that 
bit of data at the tail of the WAL around the time of the OOME (my analysis here is 
cursory, just from the logs; I did not confirm it by digging into the code).
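
For reference on the "IOException: Reflection" in the trace quoted below: SequenceFileLogWriter.sync() reaches SequenceFile.Writer#syncFs through reflection (syncFs is only present on append-capable Hadoop builds), so anything thrown inside the sync, including the OOME down in DFSClient, comes back as an InvocationTargetException re-wrapped in an IOException.  A sketch reconstructed from the stack trace, not copied from the 0.90 source:

{code}
import java.io.IOException;
import java.lang.reflect.Method;

import org.apache.hadoop.io.SequenceFile;

// Illustrative reconstruction of the reflection wrapper; names are taken
// from the stack trace in this issue, not verbatim from SequenceFileLogWriter.
class ReflectionSyncSketch {
  private final SequenceFile.Writer writer;
  private final Method syncFs;   // SequenceFile.Writer#syncFs, looked up once

  ReflectionSyncSketch(SequenceFile.Writer writer) throws Exception {
    this.writer = writer;
    this.syncFs = SequenceFile.Writer.class.getMethod("syncFs");
  }

  void sync() throws IOException {
    try {
      // An OutOfMemoryError thrown inside syncFs (here, during DFSClient
      // packet allocation) surfaces as an InvocationTargetException...
      syncFs.invoke(writer);
    } catch (Exception e) {
      // ...and is re-wrapped here, producing the
      // "java.io.IOException: Reflection" FATAL line in the regionserver log.
      throw new IOException("Reflection", e);
    }
  }
}
{code}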

> OOME while writing WAL checksum causes corrupt WAL
> --------------------------------------------------
>
>                 Key: HBASE-4107
>                 URL: https://issues.apache.org/jira/browse/HBASE-4107
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, wal
>    Affects Versions: 0.90.1
>         Environment: CentOS 5.5x64
>            Reporter: Andy Sautins
>         Attachments: master.splitting.log, regionserver.oom.log
>
>
> An issue was observed where, upon shutdown of a regionserver, the regionserver's 
> WAL was corrupt.  It appears from the following stack trace that a Java heap 
> memory exception occurred while writing the checksum to the WAL.  Corrupting 
> the WAL can potentially cause data loss. 
>
> 2011-07-14 14:54:53,741 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
> java.io.IOException: Reflection
>         at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:147)
>         at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:987)
>         at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:964)
> Caused by: java.lang.reflect.InvocationTargetException
>         at sun.reflect.GeneratedMethodAccessor1336.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:145)
>         ... 2 more
> Caused by: java.lang.OutOfMemoryError: Java heap space
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$Packet.<init>(DFSClient.java:2375)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.writeChunk(DFSClient.java:3271)
>         at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:150)
>         at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:132)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3354)
>         at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
>         at org.apache.hadoop.io.SequenceFile$Writer.syncFs(SequenceFile.java:944)
>         ... 6 more

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
