[jira] [Commented] (HBASE-4107) OOME while writing WAL checksum causes corrupt WAL

Andy Sautins (JIRA) Fri, 26 Aug 2011 11:09:53 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-4107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091917#comment-13091917
 ]


Andy Sautins commented on HBASE-4107:
-------------------------------------

The behavior that Dave is seeing is what we were seeing as well.  It looks like 
objects are created from within the call to HLog.sync.  Specifically, in our 
case DFSClient was creating a new Packet object and failed to allocate the byte 
array it needed.

For the time being we have gotten around this issue by increasing the heap from 
2G to 4G on our regionservers.  That seems to resolve it for us for now.  I'll 
look into it again, but it seems like an unfortunate situation where the data 
is written, but the checksum can't be written due to an OOM.  Changing 
DFSClient to use a pool of pre-allocated Packet objects for writing might 
address this, but I'm not sure I fully grasp the full problem yet.
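
To make the pooling idea concrete, here is a minimal sketch of what I have in 
mind.  It is not DFSClient code: the class name PacketBufferPool, its methods, 
and the buffer sizes are all made up for illustration, and the real Packet 
class carries more state than a byte array.  The point is only that every 
buffer is allocated up front, so the write path never has to call new on a 
large array while the heap is exhausted.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/**
 * Hypothetical fixed-size pool of pre-allocated write buffers.
 * Illustrative only; this does not mirror the actual DFSClient.Packet class.
 */
public class PacketBufferPool {
    private final BlockingQueue<byte[]> free;
    private final int bufferSize;

    /** Allocates every buffer eagerly, so an OOM surfaces here, not mid-write. */
    public PacketBufferPool(int poolSize, int bufferSize) {
        this.bufferSize = bufferSize;
        this.free = new ArrayBlockingQueue<>(poolSize);
        for (int i = 0; i < poolSize; i++) {
            free.add(new byte[bufferSize]);
        }
    }

    /** Returns a pooled buffer, or null if all buffers are currently in use. */
    public byte[] tryAcquire() {
        return free.poll();
    }

    /** Hands a buffer back to the pool once its contents have been sent. */
    public void release(byte[] buf) {
        if (buf == null || buf.length != bufferSize) {
            throw new IllegalArgumentException("buffer does not belong to this pool");
        }
        if (!free.offer(buf)) {
            throw new IllegalStateException("pool is already full");
        }
    }

    /** Number of buffers that can be acquired right now. */
    public int available() {
        return free.size();
    }

    public static void main(String[] args) {
        // Pool size and buffer size are arbitrary values chosen for the demo.
        PacketBufferPool pool = new PacketBufferPool(2, 64 * 1024);
        byte[] first = pool.tryAcquire();
        byte[] second = pool.tryAcquire();
        byte[] third = pool.tryAcquire(); // pool exhausted: no new allocation happens
        System.out.println("third is null: " + (third == null));
        pool.release(first);
        pool.release(second);
        System.out.println("available after release: " + pool.available());
    }
}
```

A caller that gets null back has to wait or fail the write cleanly.  Whether 
that is enough to avoid the half-written record we saw is exactly the part I 
don't fully understand yet.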

 

> OOME while writing WAL checksum causes corrupt WAL
> --------------------------------------------------
>
>                 Key: HBASE-4107
>                 URL: https://issues.apache.org/jira/browse/HBASE-4107
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, wal
>    Affects Versions: 0.90.1
>         Environment: CentOS 5.5x64
>            Reporter: Andy Sautins
>         Attachments: master.splitting.log, regionserver.oom.log
>
>
> An issue was observed where, upon shutdown of a regionserver, the 
> regionserver log was corrupt.  It appears from the following stacktrace that 
> a Java heap out-of-memory error occurred while writing the checksum to the 
> WAL.  Corrupting the WAL can potentially cause data loss. 
> 2011-07-14 14:54:53,741 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: 
> Could not append. Requesting close of hlog
> java.io.IOException: Reflection
>         at 
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:147)
>         at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:987)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:964)
> Caused by: java.lang.reflect.InvocationTargetException
>         at sun.reflect.GeneratedMethodAccessor1336.invoke(Unknown Source)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:145)
>         ... 2 more
> Caused by: java.lang.OutOfMemoryError: Java heap space
>         at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$Packet.<init>(DFSClient.java:2375)
>         at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.writeChunk(DFSClient.java:3271)
>         at 
> org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:150)
>         at 
> org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:132)
>         at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3354)
>         at 
> org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
>         at 
> org.apache.hadoop.io.SequenceFile$Writer.syncFs(SequenceFile.java:944)
>         ... 6 more

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
