[ 
https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-3431:
-------------------------

    Attachment: 3431-v2.txt

The issue is that if the master sees a RegionServer differently from how the 
RS sees itself (e.g. the master gets an IP when it does a lookup though the RS 
passed a name, or the RS passed an FQDN but the master has the hostname only), 
then the master asks the RS to take on the name the master sees by passing it 
back an HServerAddress.  This does not work if the two servers are getting 
different answers from their respective DNS servers.  The Master knows RSs by 
their 'ServerName', which is hostname+port+startcode.  If DNS is wonky, then 
the Master and RS will come up with different 'ServerName's even if the Master 
passes back its HSA (the HSA could be IP only; the RS does a lookup and comes 
up with a different hostname if DNS is broken).  This patch removes the code 
that has the master telling the RS the identity to use.  Instead the Master 
just uses the ServerName the RS volunteered.
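
To illustrate why a name mismatch shows up as a double entry, here is a rough 
toy model in Python.  It is not HBase code: the ServerName tuple and the 
MasterRegistry class are hypothetical stand-ins, assuming only that server 
identity is hostname+port+startcode as described above.  The values are taken 
from the log lines in the issue description.

```python
from typing import NamedTuple


class ServerName(NamedTuple):
    """Hypothetical stand-in for the hostname+port+startcode identity."""
    hostname: str
    port: int
    startcode: int

    def __str__(self) -> str:
        return f"{self.hostname},{self.port},{self.startcode}"


class MasterRegistry:
    """Toy model of the master's list of online servers (not HBase code)."""

    def __init__(self) -> None:
        self.online = set()

    def register(self, name: ServerName) -> None:
        # A set keyed on the full identity: same host under two names
        # counts as two distinct servers.
        self.online.add(name)


# Mismatch case: the master registers the RS under the name it resolved
# itself, while the RS keeps reporting under the name it resolved.
old = MasterRegistry()
old.register(ServerName("10.10.30.11", 60020, 1294443935414))  # master's view
old.register(ServerName("perfnode11", 60020, 1294443935414))   # RS's view

# Patched idea: the master only ever uses the name the RS volunteered.
new = MasterRegistry()
volunteered = ServerName("perfnode11", 60020, 1294443935414)
new.register(volunteered)  # at startup
new.register(volunteered)  # on a later report

print(len(old.online), len(new.online))
```

In this sketch the mismatch case ends up with two entries for one process, 
while the volunteered-name case keeps a single entry, because both sides agree 
on the identity without either one redoing a DNS lookup.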

So far in testing it seems to work when DNS is set up properly and when 
Master-side DNS is broken such that it finds only an IP for the RS.  Let me do 
some more testing.

> Regionserver is not using the name given it by the master; double entry in 
> master listing of servers
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3431
>                 URL: https://issues.apache.org/jira/browse/HBASE-3431
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.90.1
>
>         Attachments: 3431-v2.txt, 3431.txt
>
>
> Our man Ted Dunning found the following, where the RS checks in with one 
> name, the master tells it to use another name, but we seem to go ahead and 
> continue with our original name.
> In RS logs I see:
> {code}
> 2011-01-07 15:45:50,757 INFO  
> org.apache.hadoop.hbase.regionserver.HRegionServer [regionserver60020]: 
> Master passed us address to use. Was=perfnode11:60020, Now=10.10.30.11:60020
> {code}
> On master I see
> {code}
> 2011-01-07 15:45:38,613 INFO  org.apache.hadoop.hbase.master.ServerManager 
> [IPC Server handler 0 on 60000]: Registering 
> server=10.10.30.11,60020,1294443935414, regionCount=0, userLoad=false
> {code}
> ....
> then later
> {code}
> 2011-01-07 15:45:44,247 INFO  org.apache.hadoop.hbase.master.ServerManager 
> [IPC Server handler 2 on 60000]: Registering 
> server=perfnode11,60020,1294443935414, regionCount=0, userLoad=true
> {code}
> This might be because we started letting servers register by means other 
> than reportStartup.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
