Argument below for a new RC seems good to me. Let me put up a new RC, one that disables RS restart and that fixes the documentation issue found by Lei Wang.

St.Ack

On Sat, Aug 1, 2009 at 12:02 PM, Andrew Purtell <apurt...@apache.org> wrote:

> I don't have the log any more. I would have kept it if it was revealing.
> There were messages about ZK session expiration, followed by a message
> indicating the RS will restart, followed by warnings out of IPC as
> clients were querying but the server was shutting down, followed by
> thread stopping/terminated messages, and then nothing. I didn't see any
> ERRORs related to potential problems with restarting... it just didn't
> happen.
>
> I did switch my vote to +1 for the RC because this is not a situation that
> leads to data loss -- the master splits the log and reassigns regions
> as expected -- but the messages about restarting in the RS log are
> misleading if restart doesn't happen. Sounds like I'm not the only one
> having this trouble. Could come up over and over on hbase-u...@. On
> 1732 I suggest putting in the associated patch and making abort the
> default behavior instead of restart until this can be sorted out. That
> would require rolling a new RC. I haven't changed my vote but I do
> recommend that to avoid confusion. Should put something up on the
> troubleshooting page of the wiki in addition, or at the least.
>
>   - Andy
>
> ________________________________
> From: stack <st...@duboce.net>
> To: hbase-dev@hadoop.apache.org
> Sent: Saturday, August 1, 2009 9:26:56 AM
> Subject: Re: ANN: hbase 0.20.0 Release Candidate 1 available for download
>
> Yes, it's supposed to restart itself and check in with the master as though
> it were a new server (as J-D notes).
>
> Do you have the log from the incident? Open an issue if it's broke?
>
> I see you flipped your vote from -1 to +1. I think we should fix this
> failed restart but in the scheme of things, IMO, I don't think it is a
> showstopper sufficient to sink the RC. We can make a 0.20.1 to follow
> close on 0.20.0 with fixes for the likes of this and for the documentation
> issue noted by Lei Wang up on the list.
>
> St.Ack
>
> On Fri, Jul 31, 2009 at 3:02 PM, Andrew Purtell <apurt...@apache.org> wrote:
>
> > My understanding is the region server is supposed to restart and check in
> > with the master as if newly launched. I could be wrong. I was away for a
> > while. At least following the log messages this appears to be the intent.
> >
> >   - Andy
> >
> > ________________________________
> > From: Ryan Rawson <ryano...@gmail.com>
> > To: hbase-dev@hadoop.apache.org
> > Sent: Friday, July 31, 2009 2:11:36 PM
> > Subject: Re: ANN: hbase 0.20.0 Release Candidate 1 available for download
> >
> > It is not supposed to restart, you will need to use supervisor to
> > achieve that.
> >
> > If the ZK session is timed out, then the RS has no idea if the master
> > has reassigned regions or not. The RS then FATALS, the master
> > recovers the log, and all will be well(ish).
> >
> > The zookeeper daemons also need supervising, since they might FATAL
> > but can be restarted to continue on later.
> >
> > On Fri, Jul 31, 2009 at 2:07 PM, Andrew Purtell <apurt...@apache.org> wrote:
> > > -1
> > >
> > > Region server did not restart after ZK timeout. Entered a high stress
> > > period while compacting under heavy write load, high RAM commitment, and
> > > concurrency.
> > >
> > > This is a stress test and I need to tune down vm.swappiness some more,
> > > but the region server shut down and did not restart.
> > >
> > > See attached.
> > >
> > >   - Andy
> > >
> > > ________________________________
> > > From: stack <st...@duboce.net>
> > > To: hbase-dev@hadoop.apache.org; hbase-u...@hadoop.apache.org
> > > Sent: Wednesday, July 29, 2009 5:31:31 PM
> > > Subject: ANN: hbase 0.20.0 Release Candidate 1 available for download
> > >
> > > The first hbase 0.20.0 release candidate is available for download:
> > >
> > > http://people.apache.org/~stack/hbase-0.20.0-candidate-1/
> > >
> > > More than 400 issues have been addressed. The release notes are
> > > available here: http://su.pr/18zcEO <http://tinyurl.com/8xmyx9>.
> > >
> > > HBase 0.20.0 runs on Hadoop 0.20.0. A lot has changed since 0.19.x,
> > > including configuration fundamentals. Be sure to read the 'Getting
> > > Started' documentation available here: http://su.pr/211OYP.
> > >
> > > If you wish to bring your 0.19.x hbase data forward to 0.20.0, you will
> > > need to run a migration. See
> > > http://wiki.apache.org/hadoop/Hbase/HowToMigrate.
> > > First read the overview and then go to the section, 'From 0.19.x to
> > > 0.20.x'.
> > >
> > > Should we release this candidate as hbase 0.20.0? Please vote +1/-1 by
> > > Monday August 3rd.
> > >
> > > Yours,
> > > The HBasistas
> > >
> > > P.S. 0.20.0 Highlights include:
> > >
> > > + Much improved performance
> > > + Master is no longer SPOF
> > > + Rolling restarts -- no need to take down whole cluster updating
> > >   config or making minor upgrades
> > > + A new, more comprehensive API (The old API is still present but
> > >   deprecated)
> > > + Improved mapreduce connectors
> > > + New contrib package with updated Transactional HBase (THBase) and
> > >   Indexed HBase (ITHBase) as well as a new REST gateway called stargate
> > > + And, as they say on the radio, "much, much more".