I do not know if it was holding the ROOT or META region. It looks like requests may fail in HBase 0.18 if a region server stops.

Thanks,
M.

On Tue, Feb 24, 2009 at 5:40 PM, Jean-Daniel Cryans <[email protected]> wrote:
> Well this should not happen like that. Was the region server holding the
> ROOT or META region? If so, well that's a bug corrected in 0.19.0 and
> branch-0.18. I suggest you upgrade to that version if you don't want to
> break your MR jobs.
>
> J-D
>
> On Tue, Feb 24, 2009 at 10:33 AM, Michael Dagaev <[email protected]> wrote:
>
>> What I see now is that the client gets an exception (see below) once a
>> region servers stops:
>>
>> org.apache.hadoop.hbase.client.NoServerForRegionException: No server
>> address listed in .META.
>> ...
>> Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException:
>> Trying to contact region server <region server>:60020 for region
>>
>> I guess the exception occurred since the region server is down. Is it
>> correct?
>>
>> Thank you for your cooperation,
>> M.
>>
>> P. S. We are running version 0.18.1
>>
>> On Tue, Feb 24, 2009 at 5:07 PM, Jean-Daniel Cryans <[email protected]>
>> wrote:
>> > Correcting myself, no waiting time regards the time to figure the node is
>> > dead. It will still have to fetch the region location in META.
>> >
>> > J-D
>> >
>> >
>> > On Tue, Feb 24, 2009 at 10:02 AM, Jean-Daniel Cryans <[email protected]> wrote:
>> >
>> >> Well if a region server dies instead of being cleanly shut down, it takes
>> >> in the worst case 180 seconds (a region server lease length) before the
>> >> Master reassigns the regions. Clients trying to connect to that server will
>> >> take IIRC 10 seconds to figure the node is down then the time to communicate
>> >> with ROOT and META is under 1 sec. If META wasn't updated yet, it will retry
>> >> all of that.
>> >>
>> >> In the next release (0.20.0), the master is notified by Zookeeper in the
>> >> following seconds of a region server death and will proceed to reassign the
>> >> regions immediately.
>> >>
>> >> If the client don't have the region in cache and META is updated with the
>> >> region server death, there will be no waiting time.
>> >>
>> >> J-D
>> >>
>> >>
>> >> On Tue, Feb 24, 2009 at 9:49 AM, Michael Dagaev <[email protected]> wrote:
>> >>
>> >>> Thanks, now it is clear.
>> >>>
>> >>> However, if a region server is down, it takes a lot of time to retry first,
>> >>> to rescan the META region when the retries fail, rescan ROOT, etc. to
>> >>> get eventually to another region server, which will handle the request.
>> >>> Is it correct ?
>> >>>
>> >>> On Tue, Feb 24, 2009 at 4:36 PM, Jean-Daniel Cryans <[email protected]>
>> >>> wrote:
>> >>> > This is why we have a META table, it holds the location info. See
>> >>> > http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture#client
>> >>> >
>> >>> > J-D
>> >>> >
>> >>> > On Tue, Feb 24, 2009 at 9:28 AM, Michael Dagaev <[email protected]> wrote:
>> >>> >
>> >>> >> Thanks, Jean-Daniel.
>> >>> >>
>> >>> >> I did run hbase-daemon stop regionserver and start regionserver
>> >>> >> and saw the client retrying to connect to the restarted region server.
>> >>> >>
>> >>> >> How does it know to connect to another region server ? Maybe it stops
>> >>> >> retrying, asks master, and get another region server to connect to.
>> >>> >> Is it correct ?
>> >>> >>
>> >>> >> Thank you for your cooperation,
>> >>> >> M.
>> >>> >>
>> >>> >> On Tue, Feb 24, 2009 at 3:56 PM, Jean-Daniel Cryans <[email protected]>
>> >>> >> wrote:
>> >>> >> > Michael,
>> >>> >> >
>> >>> >> > Regards stopping those nodes, do it using hadoop-daemon/hbase-daemon to stop
>> >>> >> > them cleanly. Requests from the clients will not "fail", they will simply be
>> >>> >> > told to look elsewhere for the regions they have in cache. Unless you only
>> >>> >> > have 1 region server...
>> >>> >> >
>> >>> >> > Regards starting the nodes, apart from the usual hadoop-daemon/hbase-daemon,
>> >>> >> > no.
>> >>> >> >
>> >>> >> > J-D
>> >>> >> >
>> >>> >> > On Tue, Feb 24, 2009 at 8:50 AM, Michael Dagaev <[email protected]> wrote:
>> >>> >> >
>> >>> >> >> Hi, all
>> >>> >> >>
>> >>> >> >> As I understand, I can stop a region server and a data node in a cluster
>> >>> >> >> "semi-transparently" for clients, i. e. the requests handled by the region server
>> >>> >> >> at that time will fail, but cluster will be working.
>> >>> >> >>
>> >>> >> >> If I start the data node and region server I do not have to do anything to make
>> >>> >> >> them work.
>> >>> >> >>
>> >>> >> >> Is it correct ?
>> >>> >> >>
>> >>> >> >> Thank you for your cooperation,
>> >>> >> >> M.
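
For reference, the worst-case timing J-D describes in the quoted thread can be put into a small toy model. This is only a sketch in plain Python, not HBase client code: the constants are the 0.18-era figures quoted in the thread (180 s lease, roughly 10 s to notice a dead server, under 1 s for the ROOT and META lookup), the names `seconds_until_served`, `reassigned_at` and `cached` are made up for illustration, and the model assumes the client retries forever, whereas the real client gives up after a bounded number of retries (hence the RetriesExhaustedException above).

```python
# Figures taken from the thread; treat them as approximate.
LEASE_SECONDS = 180   # worst-case region server lease length before reassignment
DETECT_SECONDS = 10   # time for a client to notice the server is down ("IIRC")
LOOKUP_SECONDS = 1    # upper bound for the ROOT + META lookup ("under 1 sec")


def seconds_until_served(reassigned_at, cached=True):
    """Simulated seconds after the crash until the client finds a live server.

    reassigned_at: seconds after the crash at which META lists the new server.
    cached: True if the client still has the dead server's location in cache.
    """
    now = 0
    have_stale_location = cached
    while True:
        if have_stale_location:
            # The attempt to contact the dead server has to time out first.
            now += DETECT_SECONDS
        # Re-read the region location through ROOT and META.
        now += LOOKUP_SECONDS
        if now >= reassigned_at:
            return now  # META now points at the new region server
        # META still lists the dead server, so the next attempt fails too.
        have_stale_location = True


if __name__ == "__main__":
    # Worst case: cached location, regions reassigned only after the lease expires.
    print(seconds_until_served(LEASE_SECONDS, cached=True))   # 187
    # META already updated, location not cached: only the lookup is paid.
    print(seconds_until_served(0, cached=False))              # 1
```

With these assumptions, a client holding a stale cached location keeps paying the detection delay plus the lookup until META is updated, which is why the lease length dominates the worst case in 0.18.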
