How do you query the number of regions loaded? I haven't looked into doing
that before. Following suggestions in another post, I tried scanning
".META." from the shell and received the following response...
NativeException: org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
One of the region servers reports the following...
2009-01-29 15:20:11,789 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Failed openScanner
org.apache.hadoop.hbase.NotServingRegionException: .META.,,1
        at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2065)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1699)
        at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
2009-01-29 15:20:11,790 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 8 on 60020, call openScanner([...@9ad56f, [...@1326bbf, [...@13fca3c, 9223372036854775807, null) from 192.168.6.38:48329: error: org.apache.hadoop.hbase.NotServingRegionException: .META.,,1
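In case it's useful, here is roughly the client-side equivalent of what I was
trying: scan .META. and count the rows, one per region. This is only a sketch
against the 0.19-era Java API as I remember it, so the class and method names
may need adjusting, and of course it only works once .META. is actually being
served:

import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Scanner;
import org.apache.hadoop.hbase.io.RowResult;

public class CountRegions {
  public static void main(String[] args) throws IOException {
    HBaseConfiguration conf = new HBaseConfiguration();
    // Every row in the .META. catalog table describes one user-table region.
    HTable meta = new HTable(conf, HConstants.META_TABLE_NAME);
    // COLUMN_FAMILY is the catalog's "info:" family (constant name from memory).
    Scanner scanner = meta.getScanner(new byte[][] { HConstants.COLUMN_FAMILY });
    int regions = 0;
    try {
      for (RowResult row : scanner) {
        regions++;
      }
    } finally {
      scanner.close();
    }
    System.out.println("regions listed in .META.: " + regions);
  }
}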
Here are some stats on the cluster:
data nodes - 4
region servers - 4
I'm not sure what you mean by schema in this context. If you mean the table
definition, it has three column families. The columns are data-driven.
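For concreteness, the definition amounts to something like the following
sketch (the table and family names below are made up, and the 0.19 API
details are from memory):

import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreateTableSketch {
  public static void main(String[] args) throws IOException {
    HBaseConfiguration conf = new HBaseConfiguration();
    HTableDescriptor desc = new HTableDescriptor("mytable"); // hypothetical name
    // Three families; the qualifiers under each are generated from the data.
    // Older releases expect the trailing ':' on family names.
    desc.addFamily(new HColumnDescriptor("f1:"));
    desc.addFamily(new HColumnDescriptor("f2:"));
    desc.addFamily(new HColumnDescriptor("f3:"));
    new HBaseAdmin(conf).createTable(desc);
  }
}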
On Thu, Jan 29, 2009 at 2:08 PM, stack <[email protected]> wrote:
> Check your datanode logs. You might get a clue. Any xceiver issues
> therein? You've upped your file descriptors limit? Tell us more about how
> many instances and how many regions you have loaded. Make mention of your
> schema too. Thanks Larry.
> St.Ack
>
> On Thu, Jan 29, 2009 at 10:18 AM, Larry Compton
> <[email protected]>wrote:
>
> > After a lengthy but successful data ingestion run, I was running some
> > queries against my HBase table when one of my region servers ran out of
> > memory and became unresponsive. I shut down the HBase servers via
> > "stop-hbase.sh" and the one region server didn't terminate, so I killed it
> > via "kill" and then restarted the servers. Ever since I did that, when I
> > try to access my table, the request stalls, eventually fails, and a number
> > of exceptions like the following appear in the log of one of the region
> > servers (oddly enough, not the same one every time)...
> >
> > 2009-01-29 13:07:50,439 WARN org.apache.hadoop.hdfs.DFSClient: DFS Read:
> > java.io.IOException: Could not obtain block: blk_2439003473799601954_58348
> > file=/hbase/-ROOT-/70236052/info/mapfiles/2587717070724571438/data
> >         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1708)
> >         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1536)
> >         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1663)
> >         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1593)
> >         at java.io.DataInputStream.readInt(DataInputStream.java:370)
> >         at org.apache.hadoop.hbase.io.SequenceFile$Reader.readRecordLength(SequenceFile.java:1909)
> >         at org.apache.hadoop.hbase.io.SequenceFile$Reader.next(SequenceFile.java:1939)
> >         at org.apache.hadoop.hbase.io.SequenceFile$Reader.next(SequenceFile.java:1844)
> >         at org.apache.hadoop.hbase.io.SequenceFile$Reader.next(SequenceFile.java:1890)
> >         at org.apache.hadoop.hbase.io.MapFile$Reader.next(MapFile.java:525)
> >         at org.apache.hadoop.hbase.regionserver.HStore.rowAtOrBeforeFromMapFile(HStore.java:1714)
> >         at org.apache.hadoop.hbase.regionserver.HStore.getRowKeyAtOrBefore(HStore.java:1686)
> >         at org.apache.hadoop.hbase.regionserver.HRegion.getClosestRowBefore(HRegion.java:1088)
> >         at org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:1548)
> >         at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >         at java.lang.reflect.Method.invoke(Method.java:597)
> >         at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
> >
> > I ran fsck on HDFS and it's healthy. I'm guessing that something needed
> > to be flushed from the region server that I killed and now my table is in
> > a corrupt state. I have a couple of questions:
> >
> > - Is there a way to recover from this problem or do I need to rerun my
> > ingestion job?
> >
> > - When a region server runs out of memory, is there a better way to kill
> > it than the "kill" command? I've been reading the postings related to
> > out-of-memory errors and plan to try some of the suggestions. However, if
> > it does happen again, should I use one of the other scripts in the "bin"
> > directory to do a graceful shutdown?
> >
> > Hadoop 0.19.0
> > HBase 0.19.0
> >
> > Thanks
> > Larry Compton
> >
>
--
Larry Compton
SRA International
240.373.5312 (APL)
443.742.2762 (cell)