[ https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990852#comment-12990852 ]
stack commented on HBASE-3431:
------------------------------
If the master can't resolve a regionserver's address, it fails like this:
{code}
Caused by: java.lang.IllegalArgumentException: Could not resolve the DNS name of sv2borg185:60020
	at org.apache.hadoop.hbase.HServerAddress.checkBindAddressCanBeResolved(HServerAddress.java:105)
	at org.apache.hadoop.hbase.HServerAddress.readFields(HServerAddress.java:168)
	at org.apache.hadoop.hbase.HServerInfo.readFields(HServerInfo.java:230)
	at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:521)
	... 8 more
{code}
...which is kinda dumb, but it means no progress until the server can get an address.
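For context, here is a minimal sketch of the kind of check that trips here, assuming it boils down to "did this host:port resolve?". The class and method names (`ResolveCheck`, `checkResolved`) are hypothetical; this is not the actual `HServerAddress.checkBindAddressCanBeResolved` code.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;

// Hypothetical stand-in for the resolution check in the trace above.
class ResolveCheck {
    // Throws if the address still carries a hostname that was never resolved.
    static void checkResolved(InetSocketAddress addr) {
        if (addr.isUnresolved()) {
            throw new IllegalArgumentException(
                "Could not resolve the DNS name of " + addr.getHostString() + ":" + addr.getPort());
        }
    }

    public static void main(String[] args) {
        // The loopback address is already resolved, so this passes.
        checkResolved(new InetSocketAddress(InetAddress.getLoopbackAddress(), 60020));

        // createUnresolved skips DNS entirely, standing in for a failed lookup.
        try {
            checkResolved(InetSocketAddress.createUnresolved("sv2borg185", 60020));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Using `createUnresolved` keeps the example independent of real DNS; in the master the unresolved case comes from a lookup that actually failed.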
If DNS is wrong, e.g. on the master, so that a lookup on the passed name comes
up with a different address, then we tell the regionserver to go forward with
the IP.
At the moment you'll see two entries for this badly configured server: the
regionserver shows up both by its name and by its bad IP.
The symptom is that you can't shut down, because the master is waiting on the
ghost server to finish closing up (this is what was happening for mr oracle.com).
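A toy model of how the double entry turns into a ghost that blocks shutdown. It assumes, as the master log lines in the issue description suggest, that servers are tracked by a `host,port,startcode` name; `ToyServerManager` and its methods are hypothetical and are not HBase's `ServerManager`.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical model of the master's online-server list, not HBase code.
class ToyServerManager {
    // Online servers keyed by "host,port,startcode", as in the master log lines.
    private final Set<String> onlineServers = new LinkedHashSet<>();

    static String serverName(String host, int port, long startcode) {
        return host + "," + port + "," + startcode;
    }

    // Same name twice is one entry; a different host string for the
    // same process is a second, distinct entry.
    void register(String host, int port, long startcode) {
        onlineServers.add(serverName(host, port, startcode));
    }

    // Called when a regionserver reports it has finished closing.
    void reportClosed(String host, int port, long startcode) {
        onlineServers.remove(serverName(host, port, startcode));
    }

    // Shutdown completes only once no servers remain online.
    boolean canFinishShutdown() {
        return onlineServers.isEmpty();
    }

    Set<String> online() {
        return onlineServers;
    }

    public static void main(String[] args) {
        ToyServerManager master = new ToyServerManager();
        long startcode = 1294443935414L;
        // First registration under the IP the master resolved...
        master.register("10.10.30.11", 60020, startcode);
        // ...then a later registration under the regionserver's original name.
        master.register("perfnode11", 60020, startcode);
        System.out.println(master.online());
        // The regionserver closes under its own name; the IP entry is left behind.
        master.reportClosed("perfnode11", 60020, startcode);
        System.out.println(master.online() + " canFinishShutdown=" + master.canFinishShutdown());
    }
}
```

The point of the model is only that one process ends up with two keys; whatever path creates the second registration, the entry that never gets a close report keeps the master waiting.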
I reproduced Ted's problem by changing the hosts file on the master to give one
server an address on a different subnet. Then I got this in the RS log:
{code}
2011-02-05 00:33:49,409 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Master passed us address to use. Was=sv2borg185:60020, Now=10.20.20.185:60020
{code}
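What the issue title asks for is that the regionserver actually adopt the address the master passes, so that every later report carries the same name the master registered. A minimal sketch, assuming the address arrives as a plain `host:port` string (hostname or IPv4 only); `ToyRegionServer` and `onMasterPassedAddress` are hypothetical names, not HBase API, and the printed line only mirrors the log message above.

```java
// Hypothetical regionserver-side handling, not HBase code.
class ToyRegionServer {
    private String host;
    private int port;

    ToyRegionServer(String host, int port) {
        this.host = host;
        this.port = port;
    }

    String address() {
        return host + ":" + port;
    }

    // Adopt the master-supplied "host:port" instead of only logging it.
    // Assumes a hostname or IPv4 host; IPv6 literals are not handled.
    void onMasterPassedAddress(String passed) {
        int colon = passed.lastIndexOf(':');
        if (colon <= 0 || colon == passed.length() - 1) {
            throw new IllegalArgumentException("Expected host:port, got " + passed);
        }
        String was = address();
        this.host = passed.substring(0, colon);
        this.port = Integer.parseInt(passed.substring(colon + 1));
        System.out.println("Master passed us address to use. Was=" + was + ", Now=" + address());
    }

    // Name used in every later report to the master.
    String serverName(long startcode) {
        return host + "," + port + "," + startcode;
    }

    public static void main(String[] args) {
        ToyRegionServer rs = new ToyRegionServer("sv2borg185", 60020);
        rs.onMasterPassedAddress("10.20.20.185:60020");
        System.out.println(rs.serverName(1294443935414L));
    }
}
```

Deriving the report name from the adopted fields, rather than from the originally configured hostname, is what keeps the master down to a single entry per server in this model.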
Let me dig in.
> Regionserver is not using the name given it by the master; double entry in master listing of servers
> ----------------------------------------------------------------------------------------------------
>
> Key: HBASE-3431
> URL: https://issues.apache.org/jira/browse/HBASE-3431
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.90.0
> Reporter: stack
> Assignee: stack
> Priority: Blocker
> Fix For: 0.90.1
>
> Attachments: 3431.txt
>
>
> Our man Ted Dunning found the following: the RS checks in with one name, the
> master tells it to use another name, but we seem to go ahead and continue with
> our original name.
> In RS logs I see:
> {code}
> 2011-01-07 15:45:50,757 INFO org.apache.hadoop.hbase.regionserver.HRegionServer [regionserver60020]: Master passed us address to use. Was=perfnode11:60020, Now=10.10.30.11:60020
> {code}
> On the master I see:
> {code}
> 2011-01-07 15:45:38,613 INFO org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 0 on 60000]: Registering server=10.10.30.11,60020,1294443935414, regionCount=0, userLoad=false
> {code}
> ....
> then later
> {code}
> 2011-01-07 15:45:44,247 INFO org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 2 on 60000]: Registering server=perfnode11,60020,1294443935414, regionCount=0, userLoad=true
> {code}
> This might be because we started letting servers register through paths other
> than reportStartup.
--
This message is automatically generated by JIRA.