Would the 0.19.3 release downloaded from one of the mirrors be ok? Or do I really need a dev branch as suggested below?
Also, once I do this, I suspect the upgrade process will fail because HBase is in a bad state now (as described below). How can I repair that? Or maybe it would be easier to manually force removal of the bad HBase table? Maybe delete "/hbase/tableX" from HDFS, but I doubt that would work, as .META. and -ROOT- would still be out of sync. What's the best way to do this?

Marc

---------- Forwarded message ----------
From: Jonathan Gray <[email protected]>
To: [email protected]
Date: Mon, 17 Aug 2009 17:59:34 -0700
Subject: Re: NoServerForRegionException, TableNotFoundException and WrongRegionException

To reiterate what stack said, you need to upgrade. There are serious, known bugs in 0.19.1.

Upgrade to the 0.19 branch or the 0.20 branch; instructions can be found on this page:
http://hadoop.apache.org/hbase/version_control.html

For example,

svn co http://svn.apache.org/repos/asf/hadoop/hbase/branches/0.19/hbase-0.19-branch
cd hbase-0.19-branch
ant jar

JG

Marc Limotte wrote:

Regions seem to be reasonably dispersed... as of now... not sure if that was true before I reset hbase.hregion.max.filesize.

Region Servers
Address       Start Code      Load
host1:60020   1250533702083   requests=0, regions=10, usedHeap=32, maxHeap=888
host2:60020   1250533702094   requests=0, regions=12, usedHeap=32, maxHeap=888
host3:60020   1250533702052   requests=0, regions=7, usedHeap=31, maxHeap=888
host4:60020   1250533702078   requests=0, regions=11, usedHeap=32, maxHeap=888
Total:        servers: 4      requests=0, regions=40

Marc

---------- Forwarded message ----------
From: stack <[email protected]>
To: [email protected]
Date: Mon, 17 Aug 2009 14:21:47 -0700
Subject: Re: NoServerForRegionException, TableNotFoundException and WrongRegionException

Please update to the head of 0.19 trunk, or better update to 0.20 trunk -- especially if you are testing. Issues described below have been addressed.

How many regions do you have in your table? Are all going to one regionserver because you only have one region?
Yours,
St.Ack

On Mon, Aug 17, 2009 at 12:19 PM, Marc Limotte <[email protected]> wrote:

> I'm seeing a nice variety of Exceptions from HBase and could use some
> pointers about what to do next.
>
> This is a new map/reduce program, updating about 550k rows with around a
> dozen columns on a very small cluster (only 4 nodes... as we're still
> testing and it doesn't have to support production yet). HBase version
> 0.19.1.
>
> I ran the job and it seems to make some progress, and then dies after
> several hours, reporting "NoServerForRegionException: No server address
> listed in .META. for region TABLEX,,1250526695078". I retried it a few
> times with the same result. I also noticed that the load is not well
> balanced; all requests seemed to be going to one node. I adjusted
> hadoop-site.xml with the addition of these two entries:
>
> <name>hbase.hregion.max.filesize</name>
> <value>33554432</value>
>
> <name>hbase.client.retries.number</name>
> <value>5</value>
>
> And restarted hbase (and hadoop to be safe). Re-ran and got the same error
> in the M/R job.
>
> *I thought I'd try dropping the table, since it's a new table and I can
> recreate it. But that gives another exception:*
>
> hbase(main):002:0> disable 'TABLEX'
> NativeException: org.apache.hadoop.hbase.TableNotFoundException:
> org.apache.hadoop.hbase.TableNotFoundException: TABLEX
>     at org.apache.hadoop.hbase.master.TableOperation$ProcessTableOperation.call(TableOperation.java:129)
>     at org.apache.hadoop.hbase.master.TableOperation$ProcessTableOperation.call(TableOperation.java:70)
>     at org.apache.hadoop.hbase.master.RetryableMetaOperation.doWithRetries(RetryableMetaOperation.java:64)
>     at org.apache.hadoop.hbase.master.TableOperation.process(TableOperation.java:143)
>     at org.apache.hadoop.hbase.master.HMaster.disableTable(HMaster.java:691)
>     ...
>
> *And now I see this exception in the HBase logs:*
>
> org.apache.hadoop.hbase.regionserver.WrongRegionException:
> org.apache.hadoop.hbase.regionserver.WrongRegionException: Requested row out
> of range for HRegion .META.,,1250280235390, startKey='',
> getEndKey()='TABLEX,,1250219949252',
> row='TABLEX,840.56098.0544,1250526661861'
>     at org.apache.hadoop.hbase.regionserver.HRegion.checkRow(HRegion.java:1788)
>     at org.apache.hadoop.hbase.regionserver.HRegion.obtainRowLock(HRegion.java:1844)
>     at org.apache.hadoop.hbase.regionserver.HRegion.getLock(HRegion.java:1912)
>     at org.apache.hadoop.hbase.regionserver.HRegion.batchUpdate(HRegion.java:1244)
>     at org.apache.hadoop.hbase.regionserver.HRegion.batchUpdate(HRegion.java:1216)
>     ...
>
> *As a test, tried a "count"...*
>
> hbase(main):007:0* count 'TABLEX'
> NativeException: org.apache.hadoop.hbase.client.NoServerForRegionException:
> No server address listed in .META. for region TABLEX,,1250526695078
>     from org/apache/hadoop/hbase/client/HConnectionManager.java:548:in `locateRegionInMeta'
>     from org/apache/hadoop/hbase/client/HConnectionManager.java:478:in `locateRegion'
>     from org/apache/hadoop/hbase/client/HConnectionManager.java:440:in `locateRegion'
>     from org/apache/hadoop/hbase/client/HTable.java:114:in `<init>'
>     from org/apache/hadoop/hbase/client/HTable.java:97:in `<init>'
>     from sun/reflect/NativeConstructorAccessorImpl.java:-2:in `newInstance0'
>     ...
>
> *Also saw a thread somewhere that suggested doing a major compaction. Did
> that. It returns almost immediately. Not sure if that's normal or not...
> no perceivable impact from doing this, though.*
>
> hbase(main):013:0> major_compact '.META.'
> 0 row(s) in 0.0220 seconds
> hbase(main):014:0>
>
> Not sure what else to try? Is there a way to force removal of the table in
> question?
> Is there something else I should be looking at?
>
> Marc
