Check your datanode logs. You might get a clue. Any xceiver issues therein? Have you upped your file descriptor limit? Tell us more about how many instances and how many regions you have loaded. Make mention of your schema too.
Thanks, Larry.
St.Ack
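For reference, the two limits mentioned above are usually raised in two places. The values below are illustrative, not prescriptive, and note that "xcievers" really is misspelled in the Hadoop property name of this vintage.

In hdfs-site.xml on each datanode (restart the datanodes afterwards):

    <!-- Ceiling on concurrent data-transfer threads per datanode.
         The small default (256) is easily exhausted by HBase. -->
    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>4096</value>
    </property>

In /etc/security/limits.conf for the user running the daemons (log in again afterwards and confirm with `ulimit -n`):

    # illustrative limit; anything well above the usual 1024 default helps
    hadoop  -  nofile  32768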
On Thu, Jan 29, 2009 at 10:18 AM, Larry Compton <[email protected]> wrote:

> After a lengthy, but successful, data ingestion run, I was running some
> queries against my HBase table when one of my region servers ran out of
> memory and became unresponsive. I shut down the HBase servers via
> "stop-hbase.sh" and the one region server didn't terminate, so I killed it
> via "kill" and then restarted the servers. Ever since I did that, when I
> try to access my table, the request stalls, eventually fails, and a number
> of exceptions like the following appear in the log of one of the region
> servers (oddly enough, not the same one every time)...
>
> 2009-01-29 13:07:50,439 WARN org.apache.hadoop.hdfs.DFSClient: DFS Read:
> java.io.IOException: Could not obtain block: blk_2439003473799601954_58348
> file=/hbase/-ROOT-/70236052/info/mapfiles/2587717070724571438/data
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1708)
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1536)
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1663)
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1593)
>         at java.io.DataInputStream.readInt(DataInputStream.java:370)
>         at org.apache.hadoop.hbase.io.SequenceFile$Reader.readRecordLength(SequenceFile.java:1909)
>         at org.apache.hadoop.hbase.io.SequenceFile$Reader.next(SequenceFile.java:1939)
>         at org.apache.hadoop.hbase.io.SequenceFile$Reader.next(SequenceFile.java:1844)
>         at org.apache.hadoop.hbase.io.SequenceFile$Reader.next(SequenceFile.java:1890)
>         at org.apache.hadoop.hbase.io.MapFile$Reader.next(MapFile.java:525)
>         at org.apache.hadoop.hbase.regionserver.HStore.rowAtOrBeforeFromMapFile(HStore.java:1714)
>         at org.apache.hadoop.hbase.regionserver.HStore.getRowKeyAtOrBefore(HStore.java:1686)
>         at org.apache.hadoop.hbase.regionserver.HRegion.getClosestRowBefore(HRegion.java:1088)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:1548)
>         at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
>
> I ran fsck on HDFS and it's healthy. I'm guessing that something needed to
> be flushed from the region server that I killed and now my table is in a
> corrupt state. I have a couple of questions:
>
> - Is there a way to recover from this problem, or do I need to rerun my
>   ingestion job?
>
> - When a region server runs out of memory, is there a better way to kill
>   it other than the "kill" command? I've been reading the postings related
>   to out-of-memory errors and plan to try some of the suggestions. However,
>   if it does happen, should I use one of the other scripts in the "bin"
>   directory to do a graceful shutdown?
>
> Hadoop 0.19.0
> HBase 0.19.0
>
> Thanks,
> Larry Compton
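A quick way to tell whether the block in that stack trace is genuinely missing, rather than the datanodes refusing connections, is to point fsck at the exact file instead of the whole filesystem. The path below is the one from the log above:

    hadoop fsck /hbase/-ROOT-/70236052/info/mapfiles/2587717070724571438/data -files -blocks -locations

If fsck reports the block with live replicas, the data is intact, and the "Could not obtain block" errors more likely point at exhausted xceivers or file descriptors on the datanodes than at corruption from the kill.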
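On the graceful-shutdown question: the bin directory does include a per-daemon script, so a single regionserver can be asked to stop on its own (HBASE_HOME below is just shorthand for the install directory):

    # run on the affected regionserver host
    $HBASE_HOME/bin/hbase-daemon.sh stop regionserver

This lets the server flush and close its regions before exiting. A plain "kill" sends SIGTERM, which the JVM's shutdown hooks can usually handle, whereas "kill -9" gives the process no chance to flush. That said, a regionserver already wedged by an OutOfMemoryError may never act on the stop request, in which case "kill" remains the fallback.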
