[
https://issues.apache.org/jira/browse/HBASE-4107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091917#comment-13091917
]
Andy Sautins commented on HBASE-4107:
-------------------------------------
The behavior that Dave is seeing matches what we were seeing as well. Objects
are allocated from within the call to HLog.sync. In our case, DFSClient was
creating a new Packet object and could not allocate the byte array for it.
For the time being we've gotten around this issue by increasing the heap from
2G to 4G on our regionservers. That seems to resolve it for us for now. I'll
look into it again, but it seems like an unfortunate situation where the data
is written but the checksum can't be written because of an OOM. Changing
DFSClient to use a pool of pre-allocated Packet objects for writing might
address this, but I'm not sure I fully grasp the problem yet.
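
To make the pooling idea a bit more concrete, here is a minimal sketch of a
fixed-size pool of pre-allocated buffers. This is not DFSClient code:
PacketBufferPool, tryAcquire, release and available are hypothetical names, and
the 64 KB buffer size used in the usage note is only an assumed packet size. The
point is that all allocation happens once at construction, so the write path
hands out existing buffers instead of calling new.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical fixed-size pool of pre-allocated buffers (illustration only, not DFSClient API). */
final class PacketBufferPool {
    private final BlockingQueue<byte[]> free;
    private final int bufferSize;

    PacketBufferPool(int count, int bufferSize) {
        this.bufferSize = bufferSize;
        this.free = new ArrayBlockingQueue<byte[]>(count);
        // Allocate every buffer up front. If the heap cannot hold them,
        // the OutOfMemoryError surfaces here rather than in the middle of a sync.
        for (int i = 0; i < count; i++) {
            free.add(new byte[bufferSize]);
        }
    }

    /** Returns a pooled buffer, or null if all buffers are in use. Never allocates. */
    byte[] tryAcquire() {
        return free.poll();
    }

    /** Hands a buffer back to the pool; rejects buffers that cannot have come from it. */
    void release(byte[] buf) {
        if (buf == null || buf.length != bufferSize) {
            throw new IllegalArgumentException("buffer does not belong to this pool");
        }
        if (!free.offer(buf)) {
            throw new IllegalStateException("pool is already full");
        }
    }

    /** Number of buffers currently available. */
    int available() {
        return free.size();
    }
}
```

A writer would create something like new PacketBufferPool(80, 64 * 1024) at
startup, call tryAcquire() before filling a packet, and call release() once the
packet has been acknowledged. When tryAcquire() returns null the caller has to
wait or push back on writers, which trades the OOM for back-pressure. Whether
that trade is acceptable inside DFSClient is exactly the part I haven't worked
out.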
> OOME while writing WAL checksum causes corrupt WAL
> --------------------------------------------------
>
> Key: HBASE-4107
> URL: https://issues.apache.org/jira/browse/HBASE-4107
> Project: HBase
> Issue Type: Bug
> Components: regionserver, wal
> Affects Versions: 0.90.1
> Environment: CentOS 5.5x64
> Reporter: Andy Sautins
> Attachments: master.splitting.log, regionserver.oom.log
>
>
> An issue was observed where, upon shutdown of a regionserver, the regionserver
> log was corrupt. The following stack trace suggests that a Java heap
> OutOfMemoryError occurred while writing the checksum to the WAL. A corrupt
> WAL can potentially cause data loss.
> 2011-07-14 14:54:53,741 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog
> java.io.IOException: Reflection
>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:147)
>     at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:987)
>     at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:964)
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.GeneratedMethodAccessor1336.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:145)
>     ... 2 more
> Caused by: java.lang.OutOfMemoryError: Java heap space
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$Packet.<init>(DFSClient.java:2375)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.writeChunk(DFSClient.java:3271)
>     at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:150)
>     at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:132)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3354)
>     at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
>     at org.apache.hadoop.io.SequenceFile$Writer.syncFs(SequenceFile.java:944)
>     ... 6 more