[jira] [Commented] (HBASE-8143) HBase on Hadoop 2 with local short circuit reads (ssr) causes OOM

Enis Soztutar (JIRA) Fri, 25 Oct 2013 14:33:36 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-8143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805718#comment-13805718
 ]


Enis Soztutar commented on HBASE-8143:
--------------------------------------

For MR especially, deployments usually add the whole Hadoop conf dir to the 
classpath, no? I think Bigtop also does this. In that case, we would want to 
take the value from hdfs-site.xml. 
bq.  What if it is not the default and still too large or if the default value 
changes (we can't read hdfs-side configs)?
Luckily, dfs.client.read.shortcircuit.buffer.size is not set in 
hdfs-default.xml. From the FSUtils code, could we call conf.setIfUnset()? Would 
that work? 
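
To make the setIfUnset() idea concrete, here is a minimal sketch of the intended 
semantics. It uses a plain map as a stand-in for the Hadoop configuration object 
so that it runs on its own; the class name, the applyHBaseFallback helper and the 
128 KB fallback value are all hypothetical and only there for illustration. It 
relies on the point above: the key has no entry in hdfs-default.xml, so "unset" 
really means nobody configured it.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Stand-in for a Hadoop-style configuration. This is NOT
 * org.apache.hadoop.conf.Configuration; it only mimics set/get/setIfUnset.
 */
class SetIfUnsetSketch {
    static final String SSR_BUFFER_KEY = "dfs.client.read.shortcircuit.buffer.size";

    // Hypothetical smaller fallback HBase could apply; the value is illustrative.
    static final String HBASE_FALLBACK = String.valueOf(128 * 1024);

    private final Map<String, String> props = new HashMap<>();

    void set(String name, String value) {
        props.put(name, value);
    }

    String get(String name) {
        return props.get(name);
    }

    /** Stores the value only when the key currently has no value. */
    void setIfUnset(String name, String value) {
        props.putIfAbsent(name, value);
    }

    /** Hypothetical helper: apply the fallback without overriding hdfs-site. */
    static void applyHBaseFallback(SetIfUnsetSketch conf) {
        conf.setIfUnset(SSR_BUFFER_KEY, HBASE_FALLBACK);
    }

    public static void main(String[] args) {
        // Case 1: nothing on the classpath sets the key, so the fallback wins.
        SetIfUnsetSketch bare = new SetIfUnsetSketch();
        applyHBaseFallback(bare);
        System.out.println("unset case: " + bare.get(SSR_BUFFER_KEY));

        // Case 2: hdfs-site.xml (simulated here by set()) already has a value,
        // so the operator's choice is kept.
        SetIfUnsetSketch fromHdfsSite = new SetIfUnsetSketch();
        fromHdfsSite.set(SSR_BUFFER_KEY, "65536");
        applyHBaseFallback(fromHdfsSite);
        System.out.println("preset case: " + fromHdfsSite.get(SSR_BUFFER_KEY));
    }
}
```

If a future hdfs-default.xml starts shipping a value for this key, the fallback 
would silently stop applying, which is the concern raised in the quoted question.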

> HBase on Hadoop 2 with local short circuit reads (ssr) causes OOM 
> ------------------------------------------------------------------
>
>                 Key: HBASE-8143
>                 URL: https://issues.apache.org/jira/browse/HBASE-8143
>             Project: HBase
>          Issue Type: Bug
>          Components: hadoop2
>    Affects Versions: 0.98.0, 0.94.7, 0.95.0
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>            Priority: Critical
>             Fix For: 0.98.0, 0.96.1
>
>         Attachments: 8143doc.txt, 8143.hbase-default.xml.txt, 
> OpenFileTest.java
>
>
> We've run into an issue with HBase 0.94 on Hadoop 2 with SSR turned on: the 
> memory usage of the HBase process grows to 7g, with -Xmx3g, after some 
> time. This causes OOM for the RSs. 
> Upon further investigation, I've found out that we end up with 200 regions, 
> each having 3-4 store files open. Under hadoop2 SSR, BlockReaderLocal 
> allocates DirectBuffers, which is unlike HDFS 1 where there is no direct 
> buffer allocation. 
> It seems that there are no guards against the memory used by local buffers in 
> HDFS 2, and having a large number of open files causes multiple GB of memory 
> to be consumed by the RS process. 
> This issue is to further investigate what is going on: whether we can limit 
> the memory usage in HDFS or HBase, and/or document the setup. 
> Possible mitigation scenarios are: 
>  - Turn off SSR for Hadoop 2
>  - Ensure that there is enough unallocated memory for the RS based on 
> expected # of store files
>  - Ensure that there is a lower number of regions per region server (hence 
> a lower number of open files)
> Stack trace:
> {code}
> org.apache.hadoop.hbase.DroppedSnapshotException: region: IntegrationTestLoadAndVerify,yC^P\xD7\x945\xD4,1363388517630.24655343d8d356ef708732f34cfe8946.
>         at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1560)
>         at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1439)
>         at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1380)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:449)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:215)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$500(MemStoreFlusher.java:63)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:237)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.lang.OutOfMemoryError: Direct buffer memory
>         at java.nio.Bits.reserveMemory(Bits.java:632)
>         at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:97)
>         at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288)
>         at org.apache.hadoop.hdfs.util.DirectBufferPool.getBuffer(DirectBufferPool.java:70)
>         at org.apache.hadoop.hdfs.BlockReaderLocal.<init>(BlockReaderLocal.java:315)
>         at org.apache.hadoop.hdfs.BlockReaderLocal.newBlockReader(BlockReaderLocal.java:208)
>         at org.apache.hadoop.hdfs.DFSClient.getLocalBlockReader(DFSClient.java:790)
>         at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:888)
>         at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:455)
>         at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:645)
>         at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:689)
>         at java.io.DataInputStream.readFully(DataInputStream.java:178)
>         at org.apache.hadoop.hbase.io.hfile.FixedFileTrailer.readFromStream(FixedFileTrailer.java:312)
>         at org.apache.hadoop.hbase.io.hfile.HFile.pickReaderVersion(HFile.java:543)
>         at org.apache.hadoop.hbase.io.hfile.HFile.createReaderWithEncoding(HFile.java:589)
>         at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.<init>(StoreFile.java:1261)
>         at org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:512)
>         at org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:603)
>         at org.apache.hadoop.hbase.regionserver.Store.validateStoreFile(Store.java:1568)
>         at org.apache.hadoop.hbase.regionserver.Store.commitFile(Store.java:845)
>         at org.apache.hadoop.hbase.regionserver.Store.access$500(Store.java:109)
>         at org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.commit(Store.java:2209)
>         at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1541)
> {code}
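
A rough back-of-the-envelope sketch of how the numbers in the description add up 
(200 regions, 3-4 open store files each). The per-buffer size of 1 MB and the 
figure of two direct buffers per local block reader are assumptions made for 
this estimate, not values stated in the issue; the class and method names are 
hypothetical. Treat the result as an order-of-magnitude figure only.

```java
/** Rough estimate of direct-buffer memory held by open store file readers. */
class DirectBufferEstimate {

    /**
     * Multiplies out: regions * store files per region * buffers per reader
     * * bytes per buffer. Assumes one local block reader per open store file.
     */
    static long estimateBytes(int regions, int storeFilesPerRegion,
                              long bufferSizeBytes, int buffersPerReader) {
        return (long) regions * storeFilesPerRegion * bufferSizeBytes * buffersPerReader;
    }

    public static void main(String[] args) {
        long oneMb = 1024L * 1024L;

        // 200 regions and 4 store files per region come from the description;
        // the 1 MB buffer size and 2 buffers per reader are assumptions.
        long estimate = estimateBytes(200, 4, oneMb, 2);

        System.out.printf("approx. %d MB of direct buffers%n", estimate / oneMb);
    }
}
```

Under these assumptions the estimate is already in the gigabyte range, outside 
the Java heap and therefore not bounded by -Xmx. Shrinking the buffer size or 
the number of open store files scales the figure down linearly.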



--
This message was sent by Atlassian JIRA
(v6.1#6144)
