[ https://issues.apache.org/jira/browse/SLIDER-1259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gour Saha reassigned SLIDER-1259:
---------------------------------
Assignee: Steve Loughran
> Slider does not work in multi homed environments
> ------------------------------------------------
>
> Key: SLIDER-1259
> URL: https://issues.apache.org/jira/browse/SLIDER-1259
> Project: Slider
> Issue Type: Bug
> Components: appmaster
> Affects Versions: Slider 0.92
> Reporter: Lev Bronshtein
> Assignee: Steve Loughran
> Priority: Minor
>
> In an environment where Hadoop worker nodes bind the NodeManager to an
> interface whose hostname differs from the one returned by socket.getfqdn(),
> we are unable to introspect running jobs. In our test environment, for
> example, the NodeManager uses f-bcpc-vm3, while socket.getfqdn() returns
> bcpc-vm3, which is the hostname bound to the management interface rather
> than to the interface used for Hadoop/production traffic.
>
> For example, running *slider registry --name slider_poc --listexp* produces
> the following output in the ResourceManager logs:
> {quote}2018-01-26 17:30:32,147 INFO
> org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: ubuntu is
> accessing unchecked
> [http://bcpc-vm3.bcpc.example.com:46391/ws/v1/slider/publisher/exports] which
> is the app master GUI of application_1516910361403_0094 owned by ubuntu
> 2018-01-26 17:31:13,639 WARN org.mortbay.log:
> /proxy/application_1516910361403_0094/ws/v1/slider/publisher/exports:
> java.net.ConnectException: Connection timed out (Connection timed out)
> {quote}
>
> Note that the redirect goes to
> [http://bcpc-vm3.bcpc.example.com:46391/ws/v1/slider/publisher/exports],
> whereas it should have gone to
> [http://f-bcpc-vm3.bcpc.example.com:46391/ws/v1/slider/publisher/exports].
> Renaming the host to f-bcpc-vm3 results in the expected behavior.
>
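> Perhaps *hostname.py* could be instructed to consult one of the following
> properties before the agent registers:
> - *yarn.nodemanager.address*
> - *yarn.nodemanager.bind-host*
> - *yarn.nodemanager.hostname*
>
> The hostname is currently obtained via hostname.public_hostname() when
> Register.py builds the registration message: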
> register = {'responseId': int(id),
>             'timestamp': timestamp,
>             'label': self.config.getLabel(),
>             'publicHostname': hostname.public_hostname(),  # <-- the hostname at issue
>             'agentVersion': version,
>             'actualState': actualState,
>             'expectedState': expectedState,
>             'allocatedPorts': allocated_ports,
>             'logFolders': log_folders,
>             'tags': tags
>             }
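
A minimal sketch of the suggestion above, assuming the agent can read the NodeManager's yarn-site.xml on the same host. The helpers `load_yarn_site` and `resolve_nm_hostname` are hypothetical, not existing Slider code; the lookup order simply follows the property list in the description, and treating wildcard values such as 0.0.0.0 as unusable is also an assumption. If no property yields a usable host, it falls back to socket.getfqdn(), the current behavior described in the issue.

```python
import socket
import xml.etree.ElementTree as ET

# Hypothetical lookup order, taken from the property list in SLIDER-1259.
CANDIDATE_KEYS = (
    "yarn.nodemanager.address",
    "yarn.nodemanager.bind-host",
    "yarn.nodemanager.hostname",
)

# Assumption: wildcard/loopback values do not identify the traffic interface.
UNUSABLE_HOSTS = {"", "0.0.0.0", "::", "localhost"}


def load_yarn_site(xml_text):
    """Parse Hadoop-style <configuration><property> XML into a dict."""
    conf = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            conf[name.strip()] = value.strip()
    return conf


def _expand(value, conf):
    """Expand one ${key} reference, e.g. '${yarn.nodemanager.hostname}:0'."""
    start = value.find("${")
    end = value.find("}", start)
    if start == -1 or end == -1:
        return value
    key = value[start + 2:end]
    return value[:start] + conf.get(key, "") + value[end + 1:]


def resolve_nm_hostname(conf, fallback=socket.getfqdn):
    """Return the first usable NodeManager host, else fallback()."""
    for key in CANDIDATE_KEYS:
        raw = conf.get(key)
        if not raw:
            continue
        value = _expand(raw, conf)
        # *.address values are host:port; strip the port.
        host = value.rsplit(":", 1)[0] if key.endswith(".address") else value
        if host not in UNUSABLE_HOSTS:
            return host
    return fallback()


# Sample config mirroring the multi-homed test host from the description.
yarn_site = """<configuration>
  <property><name>yarn.nodemanager.hostname</name><value>f-bcpc-vm3.bcpc.example.com</value></property>
  <property><name>yarn.nodemanager.address</name><value>${yarn.nodemanager.hostname}:45454</value></property>
  <property><name>yarn.nodemanager.bind-host</name><value>0.0.0.0</value></property>
</configuration>"""

conf = load_yarn_site(yarn_site)
print(resolve_nm_hostname(conf))  # f-bcpc-vm3.bcpc.example.com
```

In Register.py the registration dict would then use `'publicHostname': resolve_nm_hostname(conf)` in place of `hostname.public_hostname()`; on single-homed nodes with none of these properties set, the result is unchanged because the default fallback is socket.getfqdn().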