[ https://issues.apache.org/jira/browse/HBASE-2506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002297#comment-13002297 ]
ryan rawson commented on HBASE-2506:
------------------------------------
We could catch the OOM in this case and return an error to the
client instead. If we are unable to allocate a 500MB buffer to send an RPC
response, that does not necessarily need to kill the RS: if we are
truly out of memory, other threads will hit the OOM as well. So catch that
OOM, then send an exception response instead.
Does that sound good?
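
To make the proposal concrete, here is a minimal sketch of the idea, not the actual HBase RPC code. The class name, the status bytes, the one-byte frame layout and the injectable allocator are all hypothetical and exist only so the failure path can be exercised without exhausting a real heap. It assumes the failed allocation is the large response buffer itself, so a small error frame can still be allocated afterwards.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.IntFunction;

// Hypothetical sketch: none of these names come from HBase.
final class OomGuardSketch {
    static final byte STATUS_OK = 0;
    static final byte STATUS_ERROR = 1;

    /**
     * Builds a response frame of 1 status byte plus payloadSize payload bytes.
     * If allocating the frame fails with OutOfMemoryError, returns a small
     * error frame instead of letting the error abort the whole server.
     */
    static byte[] buildResponse(int payloadSize, IntFunction<byte[]> allocator) {
        try {
            byte[] frame = allocator.apply(1 + payloadSize);
            frame[0] = STATUS_OK;
            // A real server would serialize the result rows into frame[1..] here.
            return frame;
        } catch (OutOfMemoryError e) {
            // Only the large buffer failed to allocate. If the heap is truly
            // exhausted, this small allocation throws too and the error
            // propagates to the caller as before.
            byte[] msg = ("could not allocate " + payloadSize + " byte response")
                    .getBytes(StandardCharsets.UTF_8);
            byte[] frame = new byte[1 + msg.length];
            frame[0] = STATUS_ERROR;
            System.arraycopy(msg, 0, frame, 1, msg.length);
            return frame;
        }
    }
}
```

Calling buildResponse(n, byte[]::new) uses a normal array allocation. Passing an allocator that throws OutOfMemoryError simulates the failure and yields a frame whose first byte is STATUS_ERROR.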
> Too easy to OOME a RS
> ---------------------
>
> Key: HBASE-2506
> URL: https://issues.apache.org/jira/browse/HBASE-2506
> Project: HBase
> Issue Type: Bug
> Reporter: Jean-Daniel Cryans
> Priority: Blocker
> Labels: moved_from_0_20_5
> Fix For: 0.92.0
>
>
> Testing a cluster with 1GB heap, I found that we are letting the region
> servers kill themselves too easily when scanning using pre-fetching. To
> reproduce, get 10-20M rows using PE and run a count in the shell using CACHE
> => 30000 or any other very high number. For good measure, here's the stack
> trace:
> {code}
> 2010-04-30 13:20:23,241 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: OutOfMemoryError, aborting.
> java.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOf(Arrays.java:2786)
> at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
> at java.io.DataOutputStream.write(DataOutputStream.java:90)
> at org.apache.hadoop.hbase.client.Result.writeArray(Result.java:478)
> at org.apache.hadoop.hbase.io.HbaseObjectWritable.writeObject(HbaseObjectWritable.java:312)
> at org.apache.hadoop.hbase.io.HbaseObjectWritable.write(HbaseObjectWritable.java:229)
> at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:941)
> 2010-04-30 13:20:23,241 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: request=0.0, regions=29, stores=29, storefiles=44, storefileIndexSize=6, memstoreSize=255, compactionQueueSize=0, usedHeap=926, maxHeap=987, blockCacheSize=1700064, blockCacheFree=205393696, blockCacheCount=0, blockCacheHitRatio=0
> {code}
> I guess the same could happen with largish write buffers. We need something
> better than OOME.