No problem :)
On Tue, Feb 24, 2009 at 6:30 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
> Ok so that region server must have been holding .META., you will have to
> restart HBase.
>
> Sorry
>
> J-D
>
> On Tue, Feb 24, 2009 at 11:27 AM, Michael Dagaev
> <michael.dag...@gmail.com> wrote:
>
>> Sorry, I mean that some requests fail when a region server is down in
>> Hbase 0.18.1, which we are using now.
>>
>> Besides, when I started the stopped region server and stopped another one,
>> not only "old" requests were stuck because of retries but new requests
>> (e.g. issued by hbase shell) fail too.
>>
>> The master.jsp also fails with
>>
>> Trying to contact region server <...>:60020 for region .META.,,1, row
>> '', but failed after 10 attempts.
>> Exceptions: java.io.IOException: Call failed on local exception
>>
>> Thank you for your cooperation,
>> M.
>>
>> On Tue, Feb 24, 2009 at 6:06 PM, Jean-Daniel Cryans <jdcry...@apache.org>
>> wrote:
>> > As I wrote, you should upgrade to 0.18 branch in SVN.
>> >
>> > J-D
>> >
>> > On Tue, Feb 24, 2009 at 11:04 AM, Michael Dagaev
>> > <michael.dag...@gmail.com> wrote:
>> >
>> >> I do not know if it was holding ROOT or META region.
>> >> It looks like requests may fail in Hbase 0.18 if a region server stops.
>> >>
>> >> Thanks,
>> >> M.
>> >>
>> >> On Tue, Feb 24, 2009 at 5:40 PM, Jean-Daniel Cryans
>> >> <jdcry...@apache.org> wrote:
>> >> > Well this should not happen like that. Was the region server holding
>> >> > the ROOT or META region? If so, well that's a bug corrected in 0.19.0
>> >> > and branch-0.18. I suggest you upgrade to that version if you don't
>> >> > want to break your MR jobs.
>> >> >
>> >> > J-D
>> >> >
>> >> > On Tue, Feb 24, 2009 at 10:33 AM, Michael Dagaev
>> >> > <michael.dag...@gmail.com> wrote:
>> >> >
>> >> >> What I see now is that the client gets an exception (see below) once
>> >> >> a region server stops:
>> >> >>
>> >> >> org.apache.hadoop.hbase.client.NoServerForRegionException: No server
>> >> >> address listed in .META.
>> >> >> ...
>> >> >> Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException:
>> >> >> Trying to contact region server <region server>:60020 for region
>> >> >>
>> >> >> I guess the exception occurred since the region server is down. Is it
>> >> >> correct?
>> >> >>
>> >> >> Thank you for your cooperation,
>> >> >> M.
>> >> >>
>> >> >> P. S. We are running version 0.18.1
>> >> >>
>> >> >> On Tue, Feb 24, 2009 at 5:07 PM, Jean-Daniel Cryans
>> >> >> <jdcry...@apache.org> wrote:
>> >> >> > Correcting myself, no waiting time regards the time to figure the
>> >> >> > node is dead. It will still have to fetch the region location in
>> >> >> > META.
>> >> >> >
>> >> >> > J-D
>> >> >> >
>> >> >> >
>> >> >> > On Tue, Feb 24, 2009 at 10:02 AM, Jean-Daniel Cryans
>> >> >> > <jdcry...@apache.org> wrote:
>> >> >> >
>> >> >> >> Well if a region server dies instead of being cleanly shut down, it
>> >> >> >> takes in the worst case 180 seconds (a region server lease length)
>> >> >> >> before the Master reassigns the regions. Clients trying to connect
>> >> >> >> to that server will take IIRC 10 seconds to figure the node is down
>> >> >> >> then the time to communicate with ROOT and META is under 1 sec. If
>> >> >> >> META wasn't updated yet, it will retry all of that.
>> >> >> >>
>> >> >> >> In the next release (0.20.0), the master is notified by Zookeeper in
>> >> >> >> the following seconds of a region server death and will proceed to
>> >> >> >> reassign the regions immediately.
>> >> >> >>
>> >> >> >> If the client doesn't have the region in cache and META is updated
>> >> >> >> with the region server death, there will be no waiting time.
>> >> >> >>
>> >> >> >> J-D
>> >> >> >>
>> >> >> >>
>> >> >> >> On Tue, Feb 24, 2009 at 9:49 AM, Michael Dagaev
>> >> >> >> <michael.dag...@gmail.com> wrote:
>> >> >> >>
>> >> >> >>> Thanks, now it is clear.
>> >> >> >>>
>> >> >> >>> However, if a region server is down, it takes a lot of time to
>> >> >> >>> retry first, to rescan the META region when the retries fail,
>> >> >> >>> rescan ROOT, etc. to get eventually to another region server,
>> >> >> >>> which will handle the request.
>> >> >> >>> Is it correct?
>> >> >> >>>
>> >> >> >>> On Tue, Feb 24, 2009 at 4:36 PM, Jean-Daniel Cryans
>> >> >> >>> <jdcry...@apache.org> wrote:
>> >> >> >>> > This is why we have a META table, it holds the location info. See
>> >> >> >>> > http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture#client
>> >> >> >>> >
>> >> >> >>> > J-D
>> >> >> >>> >
>> >> >> >>> > On Tue, Feb 24, 2009 at 9:28 AM, Michael Dagaev
>> >> >> >>> > <michael.dag...@gmail.com> wrote:
>> >> >> >>> >
>> >> >> >>> >> Thanks, Jean-Daniel.
>> >> >> >>> >>
>> >> >> >>> >> I did run hbase-daemon stop regionserver and start regionserver
>> >> >> >>> >> and saw the client retrying to connect to the restarted region
>> >> >> >>> >> server.
>> >> >> >>> >>
>> >> >> >>> >> How does it know to connect to another region server? Maybe it
>> >> >> >>> >> stops retrying, asks master, and gets another region server to
>> >> >> >>> >> connect to.
>> >> >> >>> >> Is it correct?
>> >> >> >>> >>
>> >> >> >>> >> Thank you for your cooperation,
>> >> >> >>> >> M.
>> >> >> >>> >>
>> >> >> >>> >> On Tue, Feb 24, 2009 at 3:56 PM, Jean-Daniel Cryans
>> >> >> >>> >> <jdcry...@apache.org> wrote:
>> >> >> >>> >> > Michael,
>> >> >> >>> >> >
>> >> >> >>> >> > Regards stopping those nodes, do it using
>> >> >> >>> >> > hadoop-daemon/hbase-daemon to stop them cleanly. Requests from
>> >> >> >>> >> > the clients will not "fail", they will simply be told to look
>> >> >> >>> >> > elsewhere for the regions they have in cache. Unless you only
>> >> >> >>> >> > have 1 region server...
>> >> >> >>> >> >
>> >> >> >>> >> > Regards starting the nodes, apart from the usual
>> >> >> >>> >> > hadoop-daemon/hbase-daemon, no.
>> >> >> >>> >> >
>> >> >> >>> >> > J-D
>> >> >> >>> >> >
>> >> >> >>> >> > On Tue, Feb 24, 2009 at 8:50 AM, Michael Dagaev
>> >> >> >>> >> > <michael.dag...@gmail.com> wrote:
>> >> >> >>> >> >
>> >> >> >>> >> >> Hi, all
>> >> >> >>> >> >>
>> >> >> >>> >> >> As I understand, I can stop a region server and a data node
>> >> >> >>> >> >> in a cluster "semi-transparently" for clients, i.e. the
>> >> >> >>> >> >> requests handled by the region server at that time will fail,
>> >> >> >>> >> >> but cluster will be working.
>> >> >> >>> >> >>
>> >> >> >>> >> >> If I start the data node and region server I do not have to
>> >> >> >>> >> >> do anything to make them work.
>> >> >> >>> >> >>
>> >> >> >>> >> >> Is it correct?
>> >> >> >>> >> >>
>> >> >> >>> >> >> Thank you for your cooperation,
>> >> >> >>> >> >> M.