OOME does not cause graceful shutdown under some failure scenarios
------------------------------------------------------------------

                 Key: HBASE-1040
                 URL: https://issues.apache.org/jira/browse/HBASE-1040
             Project: Hadoop HBase
          Issue Type: Bug
          Components: regionserver
    Affects Versions: 0.18.1
            Reporter: Andrew Purtell


The OOME-related updates on trunk should probably be backported to the 0.18 
branch. I am seeing these exceptions on our cluster:

> java.io.IOException: java.lang.OutOfMemoryError: Java heap space
> at java.io.DataInputStream.readFully(DataInputStream.java:175)
> at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:64)
> at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:102)
> at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1933)
> at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1833)
> at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1879)
> at org.apache.hadoop.io.MapFile$Reader.next(MapFile.java:516)
> at org.apache.hadoop.hbase.regionserver.StoreFileScanner.getNext(StoreFileScanner.java:312)

When OOMEs like the one above occur, the cluster does not recover without 
manual intervention. Sometimes the regionservers go down afterward; sometimes 
they stay up in a degraded state for a while. Regions go offline and remain 
unavailable.
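
To illustrate the failure mode: in the trace above the OutOfMemoryError 
surfaces as the message of an IOException, so a handler that only catches 
OutOfMemoryError directly would miss it. The sketch below is a hypothetical 
helper, not code from trunk or from the 0.18 branch. The class name OomeCheck 
and the method isOome are invented for illustration. It walks the cause chain 
and also inspects the message text, on the assumption that the error may 
arrive only as a string.

```java
import java.io.IOException;

// Hypothetical helper, not part of HBase: detects an OutOfMemoryError that
// has been wrapped in (or stringified into) another exception.
class OomeCheck {
    // Upper bound on the cause chain walk, to guard against cyclic causes.
    private static final int MAX_DEPTH = 32;

    /** Returns true if t or any of its causes is, or names, an OutOfMemoryError. */
    static boolean isOome(Throwable t) {
        Throwable cur = t;
        for (int depth = 0; cur != null && depth < MAX_DEPTH; depth++) {
            if (cur instanceof OutOfMemoryError) {
                return true;
            }
            // Assumption: the error may survive only as message text, as in
            // "java.io.IOException: java.lang.OutOfMemoryError: Java heap space".
            String msg = cur.getMessage();
            if (msg != null && msg.contains("java.lang.OutOfMemoryError")) {
                return true;
            }
            cur = cur.getCause();
        }
        return false;
    }

    public static void main(String[] args) {
        IOException asMessage =
            new IOException("java.lang.OutOfMemoryError: Java heap space");
        IOException asCause =
            new IOException("read failed", new OutOfMemoryError("Java heap space"));
        IOException unrelated = new IOException("connection reset");

        System.out.println(isOome(asMessage));
        System.out.println(isOome(asCause));
        System.out.println(isOome(unrelated));
    }
}
```

A regionserver could use a check of this kind to choose a clean abort over 
continuing in a degraded state, although where such a check would belong in 
the regionserver code is not something this report specifies.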


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
