[ 
https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990852#comment-12990852
 ] 

stack commented on HBASE-3431:
------------------------------

If the master can't resolve a regionserver's address, the master throws this:

{code}
Caused by: java.lang.IllegalArgumentException: Could not resolve the DNS name of sv2borg185:60020
    at org.apache.hadoop.hbase.HServerAddress.checkBindAddressCanBeResolved(HServerAddress.java:105)
    at org.apache.hadoop.hbase.HServerAddress.readFields(HServerAddress.java:168)
    at org.apache.hadoop.hbase.HServerInfo.readFields(HServerInfo.java:230)
    at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:521)
    ... 8 more
{code}

... which is kinda dumb, but it means no progress unless the server's name can be resolved to an address.
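
To illustrate the kind of check that produces the exception above, here is a minimal sketch. It is not the HBase source: the class and method names (`ResolveCheckSketch`, `requireResolved`) are made up for illustration, and it only assumes that the check boils down to rejecting a socket address whose hostname did not resolve.

```java
import java.net.InetSocketAddress;

/** Hypothetical stand-in for a "can this address be resolved" check; not HBase code. */
final class ResolveCheckSketch {

  /** Returns the address unchanged if it is resolved, otherwise throws. */
  static InetSocketAddress requireResolved(InetSocketAddress addr) {
    if (addr.isUnresolved()) {
      // getHostString() returns the name as given, without a reverse lookup.
      throw new IllegalArgumentException(
          "Could not resolve the DNS name of " + addr.getHostString() + ":" + addr.getPort());
    }
    return addr;
  }

  public static void main(String[] args) {
    // A literal IP needs no DNS lookup, so it is always resolved.
    System.out.println(requireResolved(new InetSocketAddress("127.0.0.1", 60020)));
    try {
      // createUnresolved() skips the lookup, simulating a name DNS cannot find.
      requireResolved(InetSocketAddress.createUnresolved("sv2borg185", 60020));
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

In real use the address would come from `new InetSocketAddress(host, port)`, which attempts the lookup itself; `createUnresolved` is used here only so the failing case does not depend on the local DNS setup.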

If DNS is wrong, e.g. on the master, so that a lookup on the passed name comes 
up w/ a different address, then we'll tell the regionserver to go forward with 
the IP.

At the moment you'll see two entries for this badly configured server.  The 
regionserver will show up both by its name and by its bad IP.

The symptom is that you can't shut down, because the master is waiting on the 
ghost server to finish closing up (this is what was happening for mr oracle.com).
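
A rough sketch of how the double entry can arise, assuming the master keys its server list by the full `host,port,startcode` string seen in the `Registering server=...` log lines. The class below (`GhostServerSketch`) and its methods are hypothetical, not the actual ServerManager code. Grouping on the `port,startcode` suffix is only a heuristic for spotting the same process registered under two hostnames.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of a server registry keyed by "host,port,startcode"; not HBase code. */
final class GhostServerSketch {
  private final Map<String, Integer> onlineServers = new LinkedHashMap<>();

  /** Registers (or re-registers) a server under the name it reported. */
  void register(String host, int port, long startcode, int regionCount) {
    onlineServers.put(host + "," + port + "," + startcode, regionCount);
  }

  int size() {
    return onlineServers.size();
  }

  /** Returns groups of entries that share "port,startcode" but differ in host. */
  Map<String, List<String>> suspectedDuplicates() {
    Map<String, List<String>> bySuffix = new LinkedHashMap<>();
    for (String name : onlineServers.keySet()) {
      String suffix = name.substring(name.indexOf(',') + 1);
      bySuffix.computeIfAbsent(suffix, k -> new ArrayList<>()).add(name);
    }
    // Keep only the suffixes that were registered under more than one host.
    bySuffix.values().removeIf(names -> names.size() < 2);
    return bySuffix;
  }

  public static void main(String[] args) {
    GhostServerSketch master = new GhostServerSketch();
    master.register("10.10.30.11", 60020, 1294443935414L, 0); // the IP the master handed out
    master.register("perfnode11", 60020, 1294443935414L, 0);  // the name the RS kept using
    System.out.println(master.size() + " entries, suspects: " + master.suspectedDuplicates());
  }
}
```

Because the two keys differ only in the host part, a plain map lookup treats them as unrelated servers, which matches the ghost entry described above.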

I reproduced Ted's problem by changing the hosts file on the master to put a 
server on a different subnet.  Then I got this in the RS log:

{code}
2011-02-05 00:33:49,409 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Master passed us address to use. Was=sv2borg185:60020, Now=10.20.20.185:60020
{code}

Let me dig in.



> Regionserver is not using the name given it by the master; double entry in 
> master listing of servers
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3431
>                 URL: https://issues.apache.org/jira/browse/HBASE-3431
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.90.1
>
>         Attachments: 3431.txt
>
>
> Our man Ted Dunning found the following, where the RS checks in with one 
> name, the master tells it to use another name, but we seem to go ahead and 
> continue with our original name.
> In RS logs I see:
> {code}
> 2011-01-07 15:45:50,757 INFO  org.apache.hadoop.hbase.regionserver.HRegionServer [regionserver60020]: Master passed us address to use. Was=perfnode11:60020, Now=10.10.30.11:60020
> {code}
> On the master I see:
> {code}
> 2011-01-07 15:45:38,613 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 0 on 60000]: Registering server=10.10.30.11,60020,1294443935414, regionCount=0, userLoad=false
> {code}
> ....
> then later
> {code}
> 2011-01-07 15:45:44,247 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 2 on 60000]: Registering server=perfnode11,60020,1294443935414, regionCount=0, userLoad=true
> {code}
> This might be because we started letting servers register by means other 
> than reportStartup.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
