I stopped and restarted HBase. The "count" command seems to be automagically
working again (the count has not finished yet, but it seems to be producing
good output). I don't think I did anything except:

1) Enabled DEBUG per Stack's suggestion
2) After starting HBase, waited a little longer (since I was watching the
log file :-)

I didn't even see the old IP address appear in the log file. The only thing
that caught my eye is this:

{{{
2009-02-27 23:46:29,158 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.metaScanner scanning meta region {regionname: .META.,,1,
startKey: <>, server: 10.254.51.127:60020}
2009-02-27 23:46:29,191 DEBUG org.apache.hadoop.hbase.master.BaseScanner:
Current assignment of 1001_profiles,,1235713972403 is not valid;
serverInfo: address: 10.254.51.127:60020, startcode: 1235796330999, load:
(requests=0, regions=2, usedHeap=29, maxHeap=888), passed startCode:
1235789499358, storedInfo.startCode: 1235796330999
2009-02-27 23:46:29,194 DEBUG org.apache.hadoop.hbase.master.BaseScanner:
Current assignment of 1001_profiles,113161088459795286,1235713972403 is not
valid;  serverInfo: address: 10.254.51.127:60020, startcode: 1235796330999,
load: (requests=0, regions=2, usedHeap=29, maxHeap=888), passed startCode:
1235789499358, storedInfo.startCode: 1235796330999
}}}

Does this mean some blocks are invalid, and that we need to wait a while
until they are properly replicated, after which everything will be good again?
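
A minimal sketch (Python, not part of HBase) for pulling the two start codes
out of DEBUG lines like the ones above and listing the regions where they
differ. It assumes, going only by the field names, that "passed startCode" is
the start code recorded for the region in .META. (the same value shows up as
info:serverstartcode in the scan output quoted below) and that
"storedInfo.startCode" is the one reported by the currently registered region
server. The names stale_assignments and SAMPLE are made up for illustration.

```python
import re

# Matches the master's DEBUG message as it appears in the log excerpt above.
PATTERN = re.compile(
    r"Current assignment of (?P<region>\S+) is not valid;"
    r".*?passed startCode: (?P<passed>\d+),"
    r"\s*storedInfo\.startCode: (?P<stored>\d+)"
)

# Two DEBUG entries copied from the log excerpt, line wraps included.
SAMPLE = """
2009-02-27 23:46:29,191 DEBUG org.apache.hadoop.hbase.master.BaseScanner:
Current assignment of 1001_profiles,,1235713972403 is not valid;
serverInfo: address: 10.254.51.127:60020, startcode: 1235796330999, load:
(requests=0, regions=2, usedHeap=29, maxHeap=888), passed startCode:
1235789499358, storedInfo.startCode: 1235796330999
2009-02-27 23:46:29,194 DEBUG org.apache.hadoop.hbase.master.BaseScanner:
Current assignment of 1001_profiles,113161088459795286,1235713972403 is not
valid;  serverInfo: address: 10.254.51.127:60020, startcode: 1235796330999,
load: (requests=0, regions=2, usedHeap=29, maxHeap=888), passed startCode:
1235789499358, storedInfo.startCode: 1235796330999
"""


def stale_assignments(log_text):
    """Return (region, passed, stored) for entries whose start codes differ."""
    # Undo the mail client's line wrapping so each message is on one line.
    flat = " ".join(log_text.split())
    results = []
    for match in PATTERN.finditer(flat):
        passed = int(match.group("passed"))
        stored = int(match.group("stored"))
        if passed != stored:
            results.append((match.group("region"), passed, stored))
    return results


if __name__ == "__main__":
    for region, passed, stored in stale_assignments(SAMPLE):
        print(f"{region}: recorded {passed}, registered server has {stored}")
```

Running it against a saved master log (read the file and pass its text to
stale_assignments) would show which regions the master has flagged this way.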

Thanks,
Yan

2009/2/28 stack <[email protected]>

> Stop and start hbase.
>
> Watch the master log as it starts up.
>
> Try to figure why it is not judging regions that have the old server IPs as
> bad.
>
> Enable DEBUG before you restart.  The extra info might help (see FAQ on
> wiki for how).
>
> St.Ack
>
> On Fri, Feb 27, 2009 at 8:00 PM, Liu Yan <[email protected]> wrote:
>
> > When I do "scan '.META.'", I see some interesting output:
> >
> > {{{
> >  1002_profiles,7139226398444 column=info:server, timestamp=1235789710023,
> > value=10.254.51.127:60020
> >  3021,1235657605714
> >
> >  1002_profiles,7139226398444 column=info:serverstartcode,
> > timestamp=1235789710023, value=1235789499358
> >  3021,1235657605714
> >
> >  1002_profiles,7399192338534 column=historian:assignment,
> > timestamp=1235789558647, value=Region assigned to se
> >  9818,1235657605714          rver 10.254.51.127:60020
> >
> >  1002_profiles,7399192338534 column=historian:open,
> > timestamp=1235789577850,
> > value=Region opened on server : h
> >  9818,1235657605714          master
> > }}}
> >
> > The IP address here is correct, pointing to the new master's IP.
> >
> > But I also see the following:
> >
> > {{{
> >  1002_profiles,7399192338534 column=info:server, timestamp=1235789577848,
> > value=10.254.51.127:60020
> >  9818,1235657605714
> >
> >  1002_profiles,7399192338534 column=info:serverstartcode,
> > timestamp=1235789577848, value=1235789499358
> >  9818,1235657605714
> >
> >  1002_profiles,7572817158818 column=historian:assignment,
> > timestamp=1235297600858, value=Region assigned to se
> >  3981,1235242656324          rver 10.249.190.85:60020
> >
> >  1002_profiles,7572817158818 column=historian:open,
> > timestamp=1235297623082,
> > value=Region opened on server : h
> >  3981,1235242656324          master
> > }}}
> >
> > This is the IP of our old master.
> >
> > How can we fix this?
> >
> > Regards,
> > Yan
> >
> > 2009/2/28 stack <[email protected]>
> >
> > > If scan is working, do 'scan ".META."'.
> > >
> > > There are three columns: info:regioninfo, info:serverstartcode, and
> > > info:server.
> > >
> > > What do you see for info:server?  New addresses or the old?
> > >
> > > On startup, hbase should be judging the content of .META. as sour and
> > > reassigning regions to the servers that have just registered; i.e.
> > > those of the new addresses.
> > >
> > > St.Ack
> > >
> > >
> > > On Fri, Feb 27, 2009 at 7:15 PM, Liu Yan <[email protected]> wrote:
> > >
> > > > hi,
> > > >
> > > > We have a 4-node cluster running Hadoop 0.19.0 and HBase 0.19.0. We
> > > > run NameNode and RegionServer on the same server and have created a
> > > > bunch of tables in HBase.
> > > >
> > > > Now we want to use another (more powerful) machine to replace the old
> > > > master. Here is what we did:
> > > >
> > > > 1) Shutdown HBase and Hadoop
> > > > 2) Copy all the Hadoop related files from the old master to the new
> > > > master.
> > > > 3) Re-configure the Hadoop and HBase so all (including the master and
> > > > clients) are now pointing to the new master.
> > > > 4) Start the Hadoop cluster. (This seems fine).
> > > > 5) Start the HBase cluster. (This seems fine too).
> > > >
> > > > Then when we try to do a "count" in HBase shell (e.g. count
> > > > 'table_name'), we hit the following problem:
> > > >
> > > > 09/02/27 21:53:04 INFO ipc.HBaseClass: Retrying connect to server: /10.249.190.85:60020. Already tried 0 time(s).
> > > > 09/02/27 21:53:05 INFO ipc.HBaseClass: Retrying connect to server: /10.249.190.85:60020. Already tried 1 time(s).
> > > > 09/02/27 21:53:06 INFO ipc.HBaseClass: Retrying connect to server: /10.249.190.85:60020. Already tried 2 time(s).
> > > > 09/02/27 21:53:06 INFO ipc.HbaseRPC: Server at /10.249.190.85:60020 not available yet, Zzzzz...
> > > > 09/02/27 21:53:06 INFO ipc.HbaseRPC: Server at /10.249.190.85:60020 could not be reached after 1 tries, giving up.
> > > > 09/02/27 21:53:09 INFO ipc.HBaseClass: Retrying connect to server: /10.249.190.85:60020. Already tried 0 time(s).
> > > > 09/02/27 21:53:10 INFO ipc.HBaseClass: Retrying connect to server: /10.249.190.85:60020. Already tried 1 time(s).
> > > > 09/02/27 21:53:11 INFO ipc.HBaseClass: Retrying connect to server: /10.249.190.85:60020. Already tried 2 time(s).
> > > >
> > > > The IP address shown here is actually the old master's IP address
> > > > instead of the new one's.
> > > >
> > > > We tried the "list" and "scan" commands in the HBase shell, and both
> > > > of them work fine. Only "count" reports the above error.
> > > >
> > > > What's the problem here?
> > > >
> > > > Thanks,
> > > > Yan
> > > >
> > >
> >
>
