[
https://issues.apache.org/jira/browse/HBASE-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652890#action_12652890
]
Andrew Purtell commented on HBASE-1040:
---------------------------------------
An OOME last night did not take down the region server, and it did not
relinquish its regions:
2008-12-03 10:03:51,625 INFO org.apache.hadoop.ipc.Server: IPC Server handler
21 on 60020, call next(325852455557500270, 30) from 10.30.94.53:51099: error:
java.io.IOException: java.lang.OutOfMemoryError: Java heap space
java.io.IOException: java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.hbase.io.ImmutableBytesWritable.readFields(ImmutableBytesWritable.java:110)
at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1754)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1882)
at org.apache.hadoop.io.MapFile$Reader.next(MapFile.java:516)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.getNext(StoreFileScanner.java:312)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:183)
at org.apache.hadoop.hbase.regionserver.HStoreScanner.next(HStoreScanner.java:226)
at org.apache.hadoop.hbase.regionserver.HRegion$HScanner.next(HRegion.java:1920)
at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1356)
at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:634)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
2008-12-03 10:03:53,173 INFO org.apache.hadoop.ipc.Server: IPC Server handler
16 on 60020, call next(3850133095248684283, 30) from 10.30.94.53:51111: error:
java.io.IOException: java.lang.OutOfMemoryError: Java heap space
java.io.IOException: java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.hbase.io.ImmutableBytesWritable.readFields(ImmutableBytesWritable.java:110)
at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1754)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1882)
at org.apache.hadoop.io.MapFile$Reader.next(MapFile.java:516)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.getNext(StoreFileScanner.java:312)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:183)
at org.apache.hadoop.hbase.regionserver.HStoreScanner.next(HStoreScanner.java:226)
at org.apache.hadoop.hbase.regionserver.HRegion$HScanner.next(HRegion.java:1920)
at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1356)
at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:634)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
2008-12-03 10:03:55,024 INFO org.apache.hadoop.ipc.Server: IPC Server handler
13 on 60020, call next(-8816589261293751564, 30) from 10.30.94.35:59972: error:
java.io.IOException: java.lang.OutOfMemoryError: GC overhead limit exceeded
java.io.IOException: java.lang.OutOfMemoryError: GC overhead limit exceeded
at sun.nio.ch.Util.releaseTemporaryDirectBuffer(Util.java:67)
at sun.nio.ch.IOUtil.read(IOUtil.java:212)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:140)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:150)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:123)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
at java.io.DataInputStream.read(DataInputStream.java:132)
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:100)
at org.apache.hadoop.dfs.DFSClient$BlockReader.readChunk(DFSClient.java:1006)
at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:238)
at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:190)
at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:159)
at org.apache.hadoop.dfs.DFSClient$BlockReader.read(DFSClient.java:859)
at org.apache.hadoop.dfs.DFSClient$DFSInputStream.readBuffer(DFSClient.java:1394)
at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1430)
at java.io.DataInputStream.readFully(DataInputStream.java:178)
at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:64)
at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:102)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1933)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1833)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1879)
at org.apache.hadoop.io.MapFile$Reader.next(MapFile.java:516)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.getNext(StoreFileScanner.java:312)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:183)
at org.apache.hadoop.hbase.regionserver.HStoreScanner.next(HStoreScanner.java:226)
at org.apache.hadoop.hbase.regionserver.HRegion$HScanner.next(HRegion.java:1920)
at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1356)
at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source)
2008-12-03 10:03:57,082 INFO org.apache.hadoop.fs.FSInputChecker: Found
checksum error: org.apache.hadoop.fs.ChecksumException: Checksum error:
/blk_1503942726006789756:of:/data/hbase/content/38150535/content/mapfiles/1992009933541116621/data at 3610624
at org.apache.hadoop.fs.FSInputChecker.verifySum(FSInputChecker.java:277)
at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:242)
at org.apache.hadoop.fs.FSInputChecker.fill(FSInputChecker.java:177)
at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:194)
at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:159)
at org.apache.hadoop.dfs.DFSClient$BlockReader.read(DFSClient.java:859)
at org.apache.hadoop.dfs.DFSClient$DFSInputStream.readBuffer(DFSClient.java:1394)
at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1430)
at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1379)
at java.io.DataInputStream.readInt(DataInputStream.java:370)
at org.apache.hadoop.io.SequenceFile$Reader.readRecordLength(SequenceFile.java:1898)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1928)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1833)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1879)
at org.apache.hadoop.io.MapFile$Reader.next(MapFile.java:516)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.getNext(StoreFileScanner.java:312)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:183)
at org.apache.hadoop.hbase.regionserver.HStoreScanner.next(HStoreScanner.java:226)
at org.apache.hadoop.hbase.regionserver.HRegion$HScanner.next(HRegion.java:1920)
at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1356)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:634)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
2008-12-03 10:03:57,083 WARN org.apache.hadoop.dfs.DFSClient: Found Checksum
error for blk_1503942726006789756_2211303 from 10.30.94.32:50010 at 3610624
2008-12-03 10:03:59,711 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6
on 60020, call next(-8816589261293751564, 30) from 10.30.94.35:59972: error:
java.io.IOException: java.lang.OutOfMemoryError: Java heap space
java.io.IOException: java.lang.OutOfMemoryError: Java heap space
2008-12-03 10:04:02,486 INFO org.apache.hadoop.ipc.Server: IPC Server handler
19 on 60020, call next(-8816589261293751564, 30) from 10.30.94.35:59972: error:
java.io.IOException: java.lang.OutOfMemoryError: Java heap space
java.io.IOException: java.lang.OutOfMemoryError: Java heap space
2008-12-03 10:04:06,829 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4
on 60020, call next(-8816589261293751564, 30) from 10.30.94.35:59972: error:
java.io.IOException: java.lang.OutOfMemoryError: Java heap space
java.io.IOException: java.lang.OutOfMemoryError: Java heap space
2008-12-03 10:04:12,044 INFO org.apache.hadoop.ipc.Server: IPC Server handler
18 on 60020, call next(-8816589261293751564, 30) from 10.30.94.35:59972: error:
java.io.IOException: java.lang.OutOfMemoryError: Java heap space
java.io.IOException: java.lang.OutOfMemoryError: Java heap space
2008-12-03 10:04:13,607 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 2803606790257188079
lease expired
2008-12-03 10:04:16,873 INFO org.apache.hadoop.ipc.Server: IPC Server handler
29 on 60020, call next(-8816589261293751564, 30) from 10.30.94.35:59972: error:
java.io.IOException: java.lang.OutOfMemoryError: Java heap space
java.io.IOException: java.lang.OutOfMemoryError: Java heap space
2008-12-03 10:04:25,970 INFO org.apache.hadoop.ipc.Server: IPC Server handler
14 on 60020, call next(-8816589261293751564, 30) from 10.30.94.35:59972: error:
java.io.IOException: java.lang.OutOfMemoryError: Java heap space
java.io.IOException: java.lang.OutOfMemoryError: Java heap space
2008-12-03 10:04:34,702 INFO org.apache.hadoop.ipc.Server: IPC Server handler
21 on 60020, call next(-8816589261293751564, 30) from 10.30.94.35:59972: error:
java.io.IOException: java.lang.OutOfMemoryError: Java heap space
java.io.IOException: java.lang.OutOfMemoryError: Java heap space
2008-12-03 10:04:53,653 INFO org.apache.hadoop.ipc.Server: IPC Server handler
25 on 60020, call next(-8816589261293751564, 30) from 10.30.94.35:60076: error:
java.io.IOException: java.lang.OutOfMemoryError: Java heap space
java.io.IOException: java.lang.OutOfMemoryError: Java heap space
2008-12-03 10:05:05,487 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9
on 60020, call next(3371752675632192545, 30) from 10.30.94.34:37462: error:
java.io.IOException: java.lang.OutOfMemoryError: GC overhead limit exceeded
java.io.IOException: java.lang.OutOfMemoryError: GC overhead limit exceeded
2008-12-03 10:05:25,685 INFO org.apache.hadoop.ipc.Server: IPC Server handler
14 on 60020, call next(-8816589261293751564, 30) from 10.30.94.35:60092: error:
java.io.IOException: read 218 bytes, should read 1666930
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1842)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1879)
at org.apache.hadoop.io.MapFile$Reader.next(MapFile.java:516)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.getNext(StoreFileScanner.java:312)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:183)
at org.apache.hadoop.hbase.regionserver.HStoreScanner.next(HStoreScanner.java:226)
at org.apache.hadoop.hbase.regionserver.HRegion$HScanner.next(HRegion.java:1920)
at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1356)
at sun.reflect.GeneratedMethodAccessor43.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:634)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
2008-12-03 10:05:48,313 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
-6882942580400942785 lease expired
2008-12-03 10:06:08,014 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 325852455557500270
lease expired
2008-12-03 10:09:37,822 INFO org.apache.hadoop.hbase.regionserver.HLog: Closed
hdfs://sjdc-atr-dc-1.atr.trendmicro.com:50000/data/hbase/log_10.30.94.32_1228300601380_60020/hlog.dat.1228313220043,
entries=100001. New log writer:
/data/hbase/log_10.30.94.32_1228300601380_60020/hlog.dat.1228316977821
2008-12-03 10:20:11,460 INFO
org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Forced flushing of
content,d6551908ed66a9122f8ec39594c8d36e,1228218699826 because global memcache
limit of 536870912 exceeded; currenly 536909348 and flushing till 268435456
2008-12-03 10:20:16,973 WARN org.apache.hadoop.dfs.DFSClient: DataStreamer
Exception: java.lang.OutOfMemoryError: GC overhead limit exceeded
2008-12-03 10:20:16,974 WARN org.apache.hadoop.dfs.DFSClient: Error Recovery
for block blk_-4000942059672147735_2225305 bad datanode[0] 10.30.94.32:50010
2008-12-03 10:20:16,974 WARN org.apache.hadoop.dfs.DFSClient: Error Recovery
for block blk_-4000942059672147735_2225305 in pipeline 10.30.94.32:50010,
10.30.94.50:50010, 10.30.94.54:50010: bad datanode 10.30.94.32:50010
2008-12-03 10:20:39,106 INFO org.apache.hadoop.ipc.Server: IPC Server handler
12 on 60020, call batchUpdates([EMAIL PROTECTED],
[Lorg.apache.hadoop.hbase.io.BatchUpdate;@7f81f38f) from 10.30.94.4:52172:
error: java.io.IOException: java.lang.OutOfMemoryError: GC overhead limit
exceeded
java.io.IOException: java.lang.OutOfMemoryError: GC overhead limit exceeded
2008-12-03 10:20:48,269 WARN org.apache.hadoop.ipc.Server: Out of Memory in
server select
java.lang.OutOfMemoryError: GC overhead limit exceeded
2008-12-03 10:22:39,111 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
-3591169216918124621 lease expired
2008-12-03 10:27:06,876 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 7353895356043684043
lease expired
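The failure mode in the log above is that the OutOfMemoryError surfaces to the client wrapped in an IOException, while the region server itself keeps limping along holding its regions. One direction a fix could take is sketched below; this is a hypothetical illustration (the OomeGuard class and its names are made up for this sketch, not the actual HBase patch): any OOME raised inside a handler marks the server aborted and is rethrown, rather than being swallowed, so shutdown can proceed and the master can reassign regions.

```java
import java.util.concurrent.Callable;

// Hypothetical sketch: guard IPC handler invocations so an OutOfMemoryError
// forces an abort instead of leaving the server up in a sick state.
public class OomeGuard {
    private volatile boolean aborted = false;

    public boolean isAborted() {
        return aborted;
    }

    // Run a handler; on OOME, mark the server aborted and rethrow the Error
    // so it propagates instead of being converted into an IOException.
    public <T> T run(Callable<T> handler) throws Exception {
        try {
            return handler.call();
        } catch (OutOfMemoryError e) {
            aborted = true; // a real server would now close regions and exit
            throw e;
        }
    }
}
```

With a wrapper like this, the first OOME in any handler would flip the abort flag and start a shutdown, instead of handler threads failing one by one for minutes as in the log above.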
> OOME does not cause graceful shutdown under some failure scenarios
> ------------------------------------------------------------------
>
> Key: HBASE-1040
> URL: https://issues.apache.org/jira/browse/HBASE-1040
> Project: Hadoop HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 0.18.1
> Reporter: Andrew Purtell
>
> The OOME-related updates on trunk should probably be backported to the 0.18
> branch. I am seeing these exceptions on our cluster in the output of
> tablemap/tablereduce jobs:
> > java.io.IOException: java.lang.OutOfMemoryError: Java heap space
> > at java.io.DataInputStream.readFully(DataInputStream.java:175)
> > at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:64)
> > at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:102)
> > at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1933)
> > at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1833)
> > at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1879)
> > at org.apache.hadoop.io.MapFile$Reader.next(MapFile.java:516)
> > at org.apache.hadoop.hbase.regionserver.StoreFileScanner.getNext(StoreFileScanner.java:312)
> When OOMEs like the above happen, the cluster does not recover without
> manual intervention. The regionservers sometimes go down afterward, and
> sometimes stay up in a sick condition for a while. Regions go offline and
> remain unavailable.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.