Aha, that stupid dot!
My /etc/hosts file looks pretty standard:
127.0.0.1 localhost
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
However, look what I found in the data-seed-specific hbase-site.xml:
<property>
<name>hbase.master.dns.interface</name>
<value>lo</value>
</property>
<property>
<name>hbase.regionserver.dns.interface</name>
<value>lo</value>
</property>
Not sure why we had that in there originally, but taking it out fixes the
problem. Both sides now resolve hregioninfo to "localhost" instead of
"localhost.". I have no idea how specifying the lo interface adds a period to
the localhost name, but that sounds like a bug to me. Shall I report it, or is
this a known issue?
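To make the mismatch concrete, here is a minimal sketch in plain Java. It assumes the two names are compared as plain strings, and that the trailing dot is the DNS root label you get on a fully qualified name (for instance from a reverse lookup). I have not confirmed that this is where the dot comes from here. HostNames, stripTrailingDot and sameHost are hypothetical names for illustration only, not part of HBase.

```java
import java.util.Objects;

/** Hypothetical helper, not part of HBase: normalizes host names before comparing them. */
final class HostNames {
    private HostNames() {}

    /** Removes a single trailing dot (the DNS root label) if one is present. */
    static String stripTrailingDot(String host) {
        Objects.requireNonNull(host, "host");
        return host.endsWith(".") ? host.substring(0, host.length() - 1) : host;
    }

    /** Compares two host names, ignoring case and a trailing root dot. */
    static boolean sameHost(String a, String b) {
        return stripTrailingDot(a).equalsIgnoreCase(stripTrailingDot(b));
    }

    public static void main(String[] args) {
        String fromZnode = "localhost.";  // name reported for the deleted ephemeral node
        String fromMaster = "localhost";  // name the master is waiting on

        System.out.println(fromZnode.equals(fromMaster));    // false: plain string comparison
        System.out.println(sameHost(fromZnode, fromMaster)); // true: compared after normalizing
    }
}
```

With a plain equals() the two names never match, which would fit the master waiting forever on a server it cannot find under that name.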
Thanks for your help,
James Kennedy
Project Manager
Troove Inc.
On 2011-01-21, at 1:34 PM, Jean-Daniel Cryans wrote:
> There's some sort of mismatch:
>
> RegionServer ephemeral node deleted, processing expiration
> [localhost.,60020,1295592845214]
>
> and
>
> Waiting on regionserver(s) to go down localhost,60020,1295592845214
>
>
> Do you see the dot after "localhost" in the first line? I wonder how
> it got different in the znode and in ServerManager.onlineServers... In
> any case, I'm pretty sure you can get it working by playing with your
> /etc/hosts
>
> J-D
>
> On Thu, Jan 20, 2011 at 11:28 PM, James Kennedy wrote:
>> I've come across a strange bug that I'm having trouble debugging.
>> Basically I have a seed application that is executed via maven and runs a
>> single JVM ApplicationStarter that starts up hdfs, regionserver, hmaster
>> threads. It does some seeding then shuts those down in reverse order.
>> So this isn't a typical way of running hbase, to be sure. However, it had
>> always worked until I upgraded to HBase 0.90.0.
>> I didn't notice it when I was originally testing 0.90.0 because it only
>> seems to be happening on our EC2.small build server node when I run this
>> particular seeder.
>> Running the same thing locally on my mac works.
>> Attached is the error output starting from when the HRegionServer.stop() is
>> called to when HMaster.shutdown() is called and it starts looping forever in
>> letRegionServersShutdown().
>> It looks like RegionServerTracker is getting to "RegionServer ephemeral node
>> deleted, processing expiration" but then, because it can't get the
>> HServerInfo, it doesn't follow through with actually expiring it.
>> Does anyone have any ideas as to why this might be happening?
>>
>>
>> Thanks,
>> James Kennedy
>> Project Manager
>> Troove Inc.
>>
>>