I have discovered a few things since I wrote this:

- DirectBuffers give native methods a no-copy way of sharing data with JVM/Java code.
-- The codecs use them for exactly that purpose.
- HFile wasn't returning codecs to the codec pool (see the sketch after this list).
- Java 1.7 has GC bugs with DirectBuffers - switching to 1.6.0_13 fixed my OOME crashes.
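For the codec leak, the fix boils down to pairing every CodecPool.getDecompressor() with a returnDecompressor(). A minimal sketch against the Hadoop CodecPool API (the actual HFile patch may look different):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CodecPool;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.Decompressor;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.util.ReflectionUtils;

public class CodecPoolSketch {
  public static void decompressBlock(java.io.InputStream in) throws Exception {
    CompressionCodec codec =
        ReflectionUtils.newInstance(GzipCodec.class, new Configuration());
    // Borrow a decompressor (and its direct buffers) from the shared pool.
    Decompressor decompressor = CodecPool.getDecompressor(codec);
    try {
      // ... read the block via codec.createInputStream(in, decompressor) ...
    } finally {
      // The part HFile was missing: hand the decompressor back to the pool.
      CodecPool.returnDecompressor(decompressor);
    }
  }
}

Once the decompressors go back to the pool, the direct buffers behind each one get reused instead of piling up until the direct memory limit is hit.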
Does anyone know someone working on Java 1.7? My repro case isn't great:
"use hbase trunk and put at least 120 gb of data in compressed tables".

-ryan

On Fri, Mar 27, 2009 at 1:37 PM, Ryan Rawson <[email protected]> wrote:
> Hi all,
>
> I ran into this on my TRUNK hbase setup:
> java.io.IOException: java.lang.OutOfMemoryError: Direct buffer memory
>
> The pertinent details of the stack trace are:
> at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
> at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.<init>(ZlibDecompressor.java:110)
> at org.apache.hadoop.io.compress.GzipCodec.createDecompressor(GzipCodec.java:188)
> at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:120)
> at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getDecompressor(Compression.java:267)
> at org.apache.hadoop.hbase.io.hfile.HFile$Reader.decompress(HFile.java:871)
>
> Ok, so what is this mysterious direct buffer and why am I dying?
>
> This might be because I have 800 regions and 300+ gb of compressed hfiles.
>
> So I looked at the ZlibDecompressor in hadoop, and it looks like there is
> _no_ reason whatsoever to be using direct buffers.
>
> A little background:
> ByteBuffer offers 2 types of allocation: normal (backed by byte[]) and
> 'direct'. The direct kind lives outside the normal heap and can be passed
> via nio to the underlying OS, possibly optimizing things. But there is
> only so much direct buffer space available, and you should only use it if
> you are _sure_ you need to. Furthermore, there appear to be GC bugs that
> don't let the JVM reclaim these buffers as quickly as it should - you can
> go OOME without the heap actually being full.
>
> The hadoop compression library attempts to keep things under control by
> reusing codecs and therefore the direct buffers. But each codec uses
> 128 kbytes of buffer, and once you open too many, you go OOME.
>
> I am not sure why the lib uses direct buffers. We might be able to switch
> it to not using direct buffers...
>
> I think we should attempt to procure our own fast zlib-like compression
> library that is not in hadoop, however.
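For anyone who hasn't played with nio buffers, the two allocation flavors from the background paragraph above look like this (a toy sketch; the 64k size is just an example, and -XX:MaxDirectMemorySize is the flag that caps the direct pool):

import java.nio.ByteBuffer;

public class BufferSketch {
  public static void main(String[] args) {
    // Normal buffer: backed by a byte[] on the Java heap, reclaimed like any object.
    ByteBuffer heap = ByteBuffer.allocate(64 * 1024);

    // Direct buffer: native memory outside the heap that can be handed to the
    // OS / native code without copying. The direct pool is separate from -Xmx
    // (capped by -XX:MaxDirectMemorySize) and is only freed once the owning
    // ByteBuffer object is eventually GC'd - which is where the OOMEs come from.
    ByteBuffer direct = ByteBuffer.allocateDirect(64 * 1024);

    System.out.println("heap:   hasArray=" + heap.hasArray() + " isDirect=" + heap.isDirect());
    System.out.println("direct: hasArray=" + direct.hasArray() + " isDirect=" + direct.isDirect());
  }
}

If I'm reading ZlibDecompressor right, it keeps a pair of these direct buffers per instance, which would account for the 128 kbytes per codec mentioned above.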
