I'm seeing a nice variety of exceptions from HBase and could use some
pointers on what to do next.
This is a new map/reduce program that updates about 550k rows, each with
around a dozen columns, on a very small cluster (only 4 nodes, since we're
still testing and it doesn't have to support production yet). HBase version
0.19.1.
I ran the job. It seemed to make some progress, but then it died after
several hours, reporting "NoServerForRegionException: No server address
listed in .META. for region TABLEX,,1250526695078". I retried it a few
times with the same result. I also noticed that the load was not well
balanced: all requests seemed to be going to one node. So I adjusted
hadoop-site.xml by adding these two entries (a 32 MB maximum file size
before a region splits, and 5 client retries):
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>33554432</value>
</property>
<property>
  <name>hbase.client.retries.number</name>
  <value>5</value>
</property>
Then I restarted HBase (and Hadoop, to be safe) and re-ran the job, but the
M/R job failed with the same error.
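My working theory on the imbalance: a brand-new table starts out as a single region, and each region is served by exactly one region server, so until the table splits, every write lands on the same node. Lowering hbase.hregion.max.filesize to 33554432 bytes (32 MB) was meant to make regions split sooner. Below is a simplified plain-Java model of that routing, not HBase's code: the node names and split points are hypothetical, and Java String ordering stands in for HBase's byte ordering (they agree for these ASCII keys).

```java
import java.util.Map;
import java.util.TreeMap;

// Simplified model (not HBase code): a table is divided into regions, each
// covering a contiguous range of row keys beginning at its start key.
// A row belongs to the region with the greatest start key <= the row.
class RegionLookup {
    // region start key -> server hosting that region (hypothetical names)
    private final TreeMap<String, String> regions = new TreeMap<>();

    void addRegion(String startKey, String server) {
        regions.put(startKey, server);
    }

    String serverFor(String row) {
        // floorEntry returns the entry with the greatest key <= row.
        Map.Entry<String, String> e = regions.floorEntry(row);
        if (e == null) {
            throw new IllegalStateException("no region covers row " + row);
        }
        return e.getValue();
    }

    public static void main(String[] args) {
        String[] rows = {"100.1", "450.7", "840.56098.0544"};

        // A new table has one region starting at "", so every row maps to one server.
        RegionLookup fresh = new RegionLookup();
        fresh.addRegion("", "node1");
        for (String r : rows) System.out.println(r + " -> " + fresh.serverFor(r));

        // After (hypothetical) splits, key ranges can live on different servers.
        RegionLookup split = new RegionLookup();
        split.addRegion("", "node1");
        split.addRegion("300", "node2");
        split.addRegion("600", "node3");
        for (String r : rows) System.out.println(r + " -> " + split.serverFor(r));
    }
}
```

With one region, all three rows map to node1; after the hypothetical splits they spread across node1, node2 and node3.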
I thought I'd try dropping the table, since it's new and I can recreate
it. But disabling it gives another exception:
hbase(main):002:0> disable 'TABLEX'
NativeException: org.apache.hadoop.hbase.TableNotFoundException:
org.apache.hadoop.hbase.TableNotFoundException: TABLEX
    at org.apache.hadoop.hbase.master.TableOperation$ProcessTableOperation.call(TableOperation.java:129)
    at org.apache.hadoop.hbase.master.TableOperation$ProcessTableOperation.call(TableOperation.java:70)
    at org.apache.hadoop.hbase.master.RetryableMetaOperation.doWithRetries(RetryableMetaOperation.java:64)
    at org.apache.hadoop.hbase.master.TableOperation.process(TableOperation.java:143)
    at org.apache.hadoop.hbase.master.HMaster.disableTable(HMaster.java:691)
...
And now I see this exception in the HBase logs:
org.apache.hadoop.hbase.regionserver.WrongRegionException:
org.apache.hadoop.hbase.regionserver.WrongRegionException: Requested row out of range for HRegion .META.,,1250280235390, startKey='', getEndKey()='TABLEX,,1250219949252', row='TABLEX,840.56098.0544,1250526661861'
    at org.apache.hadoop.hbase.regionserver.HRegion.checkRow(HRegion.java:1788)
    at org.apache.hadoop.hbase.regionserver.HRegion.obtainRowLock(HRegion.java:1844)
    at org.apache.hadoop.hbase.regionserver.HRegion.getLock(HRegion.java:1912)
    at org.apache.hadoop.hbase.regionserver.HRegion.batchUpdate(HRegion.java:1244)
    at org.apache.hadoop.hbase.regionserver.HRegion.batchUpdate(HRegion.java:1216)
...
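If I'm reading this exception correctly, a region only accepts rows in the range startKey <= row < endKey, where an empty key means "unbounded". This .META. region ends at 'TABLEX,,1250219949252', and the .META. row being updated ('TABLEX,840.56098.0544,1250526661861') sorts after that, so the region rejects it. The sketch below reproduces that range check with plain unsigned byte comparison. It is a simplification, not the real HRegion.checkRow: .META. rows are ordered by a special comparator, although I believe both orderings agree for these two keys.

```java
import java.nio.charset.StandardCharsets;

// Simplified sketch of a region's row-range check (not HBase's actual code).
// A region holds rows with startKey <= row < endKey; an empty key is unbounded.
class RowRangeCheck {
    // Unsigned lexicographic comparison of byte arrays.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) return diff;
        }
        return a.length - b.length; // a shorter prefix sorts first
    }

    static boolean inRange(String startKey, String endKey, String row) {
        byte[] s = startKey.getBytes(StandardCharsets.UTF_8);
        byte[] e = endKey.getBytes(StandardCharsets.UTF_8);
        byte[] r = row.getBytes(StandardCharsets.UTF_8);
        boolean afterStart = s.length == 0 || compare(r, s) >= 0;
        boolean beforeEnd = e.length == 0 || compare(r, e) < 0;
        return afterStart && beforeEnd;
    }

    public static void main(String[] args) {
        // Keys copied from the WrongRegionException above.
        boolean ok = inRange("", "TABLEX,,1250219949252",
                "TABLEX,840.56098.0544,1250526661861");
        System.out.println(ok ? "in range" : "out of range");
    }
}
```

The keys match up to "TABLEX,", and then '8' (0x38) sorts after ',' (0x2C), so the row falls past the region's end key and the program prints "out of range".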
As a test, I tried a count:
hbase(main):007:0* count 'TABLEX'
NativeException: org.apache.hadoop.hbase.client.NoServerForRegionException:
No server address listed in .META. for region TABLEX,,1250526695078
    from org/apache/hadoop/hbase/client/HConnectionManager.java:548:in `locateRegionInMeta'
    from org/apache/hadoop/hbase/client/HConnectionManager.java:478:in `locateRegion'
    from org/apache/hadoop/hbase/client/HConnectionManager.java:440:in `locateRegion'
    from org/apache/hadoop/hbase/client/HTable.java:114:in `<init>'
    from org/apache/hadoop/hbase/client/HTable.java:97:in `<init>'
    from sun/reflect/NativeConstructorAccessorImpl.java:-2:in `newInstance0'
...
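My understanding of the count failure, which I haven't verified against the 0.19.1 source: before talking to a table, the client looks up the region's row in .META. and reads the server address stored there. If no address is listed, it pauses and retries, and after hbase.client.retries.number attempts it gives up with NoServerForRegionException. The sketch below models only that loop. RegionLocator, NoServerException, the in-memory catalog and the node2:60020 address are all hypothetical, not HConnectionManager's real code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a client resolving a region's server address from a
// catalog (standing in for .META.), retrying a fixed number of times
// (cf. hbase.client.retries.number) before giving up. Not HBase's real code.
class RegionLocator {
    // Hypothetical stand-in for NoServerForRegionException.
    static class NoServerException extends RuntimeException {
        NoServerException(String msg) { super(msg); }
    }

    private final Map<String, String> catalog; // region name -> "host:port"
    private final int retries;
    private final long pauseMillis;
    int lastAttempts; // lookups made by the most recent locate() call

    RegionLocator(Map<String, String> catalog, int retries, long pauseMillis) {
        this.catalog = catalog;
        this.retries = retries;
        this.pauseMillis = pauseMillis;
    }

    String locate(String regionName) {
        lastAttempts = 0;
        for (int attempt = 1; attempt <= retries; attempt++) {
            lastAttempts = attempt;
            String server = catalog.get(regionName);
            // A missing or empty address means no server is hosting the region.
            if (server != null && !server.isEmpty()) {
                return server;
            }
            if (attempt < retries) {
                try {
                    Thread.sleep(pauseMillis); // wait before asking again
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        throw new NoServerException(
                "No server address listed in catalog for region " + regionName);
    }

    public static void main(String[] args) {
        Map<String, String> catalog = new HashMap<>();
        catalog.put("TABLEX,,1250526695078", ""); // region with no assigned server
        RegionLocator locator = new RegionLocator(catalog, 5, 100);
        try {
            locator.locate("TABLEX,,1250526695078");
        } catch (NoServerException e) {
            System.out.println(e.getMessage() + " (after " + locator.lastAttempts + " attempts)");
        }
    }
}
```

If this model is right, raising the retry count only delays the failure. The underlying problem is that the region has no server address in .META.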
I also saw a thread somewhere that suggested doing a major compaction, so I
did that. It returned almost immediately. I'm not sure whether that's normal,
but it had no perceivable impact:
hbase(main):013:0> major_compact '.META.'
0 row(s) in 0.0220 seconds
hbase(main):014:0>
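One guess about the fast return: in later HBase releases HBaseAdmin.majorCompact is documented as an asynchronous operation, meaning the request is queued and the region server compacts in the background. I'm assuming 0.19 behaves the same, so a quick return would not by itself mean nothing happened. The plain-Java sketch below illustrates only that request/background-work pattern using an ExecutorService. AsyncCompactionDemo and requestMajorCompaction are hypothetical names, and the sleep stands in for the real work.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of an asynchronous admin request: the call
// enqueues the work and returns at once; the work completes later.
class AsyncCompactionDemo {
    private final ExecutorService compactor = Executors.newSingleThreadExecutor();

    // Returns immediately with a handle; the "compaction" runs in the background.
    Future<String> requestMajorCompaction(String tableName) {
        return compactor.submit(() -> {
            Thread.sleep(200); // stand-in for rewriting store files
            return "compacted " + tableName;
        });
    }

    void shutdown() {
        compactor.shutdown();
    }

    public static void main(String[] args) throws Exception {
        AsyncCompactionDemo demo = new AsyncCompactionDemo();
        long t0 = System.nanoTime();
        Future<String> f = demo.requestMajorCompaction(".META.");
        long returnedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - t0);
        System.out.println("request returned after ~" + returnedMs + " ms, done=" + f.isDone());
        System.out.println(f.get()); // blocks until the background work finishes
        demo.shutdown();
    }
}
```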
I'm not sure what else to try. Is there a way to force removal of the table
in question? Is there something else I should be looking at?
Marc