Andrew is right that an RS is supposed to restart, but that doesn't always work. The real problem, as it is most of the time, is GC pauses; otherwise there would be no ZK timeouts.
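The GC-pause argument can be made concrete with a toy model. A ZooKeeper session expires when the server hears nothing from the client for longer than the session timeout, and a stop-the-world GC pause stops the region server from sending anything at all. The sketch below is not HBase or ZooKeeper code: the function names, the fixed heartbeat interval and the timeout values are hypothetical, chosen only to show why a long enough pause turns into a ZK timeout.

```python
from typing import List, Tuple


def heartbeat_times(run_s: float, interval_s: float,
                    pauses: List[Tuple[float, float]]) -> List[float]:
    """Times at which a heartbeat actually leaves the process.

    Hypothetical toy model: the client wants to heartbeat every
    `interval_s` seconds, but nothing runs during a stop-the-world pause,
    so a heartbeat that falls due inside a pause goes out when the pause
    ends. `pauses` holds non-overlapping (start_s, duration_s) pairs,
    sorted by start.
    """
    times = []
    t = 0.0
    while t <= run_s:
        sent = t
        for start, duration in pauses:
            if start <= sent < start + duration:
                sent = start + duration  # delayed until the JVM resumes
        times.append(sent)
        t = sent + interval_s
    return times


def session_expired(times: List[float], timeout_s: float) -> bool:
    """True if the server would go longer than timeout_s without hearing from the client."""
    return any(later - earlier > timeout_s
               for earlier, later in zip(times, times[1:]))


# Made-up numbers: heartbeat every 3 s, 10 s session timeout.
short_pause = heartbeat_times(30, 3, [(1, 2)])    # 2 s pause
long_pause = heartbeat_times(30, 3, [(5, 15)])    # 15 s pause
print(session_expired(short_pause, 10))
print(session_expired(long_pause, 10))
```

With these made-up numbers the 2 s pause is harmless, while the 15 s pause leaves a 17 s gap between heartbeats and the session expires, even though the process itself never crashed.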
J-D

On Fri, Jul 31, 2009 at 6:05 PM, Andrew Purtell <apurt...@apache.org> wrote:
> Ok, then I vote +1 on the RC, but with the caveat that "restarting" should
> be stricken from HRS logging.
>
> - Andy
>
> ________________________________
> From: Ryan Rawson <ryano...@gmail.com>
> To: hbase-dev@hadoop.apache.org
> Sent: Friday, July 31, 2009 3:03:32 PM
> Subject: Re: ANN: hbase 0.20.0 Release Candidate 1 available for download
>
> The JVM must terminate because it is difficult to reset the state of a
> HRS. Thus supervision is necessary.
>
> On Fri, Jul 31, 2009 at 3:02 PM, Andrew Purtell <apurt...@apache.org> wrote:
>> My understanding is the region server is supposed to restart and check in
>> with the master as if newly launched. I could be wrong. I was away for a
>> while. At least following the log messages this appears to be the intent.
>>
>> - Andy
>>
>> ________________________________
>> From: Ryan Rawson <ryano...@gmail.com>
>> To: hbase-dev@hadoop.apache.org
>> Sent: Friday, July 31, 2009 2:11:36 PM
>> Subject: Re: ANN: hbase 0.20.0 Release Candidate 1 available for download
>>
>> It is not supposed to restart, you will need to use supervisor to
>> achieve that.
>>
>> If the ZK session is timed out, then the RS has no idea if the master
>> has reassigned regions or not. The RS then FATALS, the master
>> recovers the log, and all will be well(ish).
>>
>> The zookeeper daemons also need supervising, since they might FATAL
>> but can be restarted to continue on later.
>>
>> On Fri, Jul 31, 2009 at 2:07 PM, Andrew Purtell <apurt...@apache.org> wrote:
>>> -1
>>>
>>> Region server did not restart after ZK timeout. Entered a high stress period
>>> while compacting under heavy write load, high RAM commitment, and
>>> concurrency.
>>>
>>> This is a stress test and I need to tune down vm.swappiness some more, but
>>> the region server shut down and did not restart.
>>>
>>> See attached.
>>>
>>> - Andy
>>>
>>> ________________________________
>>> From: stack <st...@duboce.net>
>>> To: hbase-dev@hadoop.apache.org; hbase-u...@hadoop.apache.org
>>> Sent: Wednesday, July 29, 2009 5:31:31 PM
>>> Subject: ANN: hbase 0.20.0 Release Candidate 1 available for download
>>>
>>> The first hbase 0.20.0 release candidate is available for download:
>>>
>>> http://people.apache.org/~stack/hbase-0.20.0-candidate-1/
>>>
>>> More than 400 issues have been addressed. The release notes are available
>>> here: http://su.pr/18zcEO.
>>>
>>> HBase 0.20.0 runs on Hadoop 0.20.0. A lot has changed since 0.19.x including
>>> configuration fundamentals. Be sure to read the 'Getting Started'
>>> documentation available here:
>>> http://su.pr/211OYP.
>>>
>>> If you wish to bring your 0.19.x hbase data forward to 0.20.0, you will need
>>> to run a migration. See http://wiki.apache.org/hadoop/Hbase/HowToMigrate.
>>> First read the overview and then go to the section, 'From 0.19.x to 0.20.x'.
>>>
>>> Should we release this candidate as hbase 0.20.0? Please vote +1/-1 by
>>> Monday August 3rd.
>>>
>>> Yours,
>>> The HBasistas
>>>
>>> P.S. 0.20.0 Highlights include:
>>>
>>> + Much improved performance
>>> + Master is no longer SPOF
>>> + Rolling restarts -- no need to take down whole cluster updating config. or
>>> making minor upgrades
>>> + A new, more comprehensive API (The old API is still present but
>>> deprecated)
>>> + Improved mapreduce connectors
>>> + New contrib package with updated Transactional HBase (THBase) and Indexed
>>> HBase (ITHBase) as well as a new REST gateway called stargate
>>> + And, as they say on the radio, "much, much more".
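Ryan's point in the thread is that the region server JVM exits after a ZK session timeout, so something outside the process has to start it again. A minimal restart loop illustrates what such a supervisor does. This is a hypothetical sketch, not part of HBase and not the API of any particular supervisor tool: `supervise`, `run_once`, `max_restarts` and `backoff_s` are illustrative names, and treating exit code 0 as a deliberate shutdown is an assumption of the sketch.

```python
import time
from typing import Callable


def supervise(run_once: Callable[[], int], max_restarts: int,
              backoff_s: float = 1.0,
              sleep: Callable[[float], None] = time.sleep) -> int:
    """Run `run_once` and restart it after abnormal exits (hypothetical sketch).

    `run_once` starts the managed process, waits for it to exit, and
    returns its exit code. Exit code 0 is assumed to mean a deliberate
    shutdown and is not restarted. Returns the number of restarts made.
    """
    restarts = 0
    while True:
        code = run_once()
        if code == 0:
            return restarts          # clean shutdown: leave it stopped
        if restarts >= max_restarts:
            return restarts          # give up and leave it to an operator
        restarts += 1
        # Linear backoff so a process that dies immediately does not spin.
        sleep(backoff_s * restarts)


# Demonstration with a fake process: two abnormal exits, then a clean stop.
exit_codes = iter([1, 1, 0])
print(supervise(lambda: next(exit_codes), max_restarts=5, sleep=lambda s: None))
```

In practice `run_once` could wrap `subprocess.call([...])` on whatever command runs the daemon in the foreground; injecting `sleep` only keeps the sketch testable. Whether a given daemon exits non-zero on a FATAL is an assumption worth checking before relying on the exit-code rule, and an established process supervisor is the more robust choice than a hand-rolled loop.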