[
https://issues.apache.org/jira/browse/FLINK-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15848287#comment-15848287
]
Philipp von dem Bussche commented on FLINK-2821:
------------------------------------------------
Hello [~mxm], after being quiet for a while I wanted to give some feedback on
the setup I am running at the moment.
To recap (I had to think through my own setup again after not spending much
time on it lately ;) ):
- job manager and task manager run in Docker containers
- I am using an orchestration engine called Rancher on top of Docker, which
introduces another set of IP addresses / another network on top of Docker's.
Since I am communicating with the JobManager from within the Docker / Rancher
network as well as from outside (from my local build server), I had to have the
JobManager register under a host name that is resolvable on the Internet. Both
the task manager (coming from within the Docker / Rancher network) and the
build server now connect via the internet host name. Since the task manager
lives right next to the job manager, the preferred solution would obviously be
for the task manager to connect locally (meaning through the Docker / Rancher
network), but since one can only specify one listener address, it has to go
through the internet host name.
However, this does not solve the problem completely yet, because if I just tell
the JobManager to bind to the internet host name I get the following exception
while the JobManager starts up:
2017-02-01 11:13:51,997 INFO  org.apache.flink.util.NetUtils - Unable to allocate on port 6123, due to error: Address not available (Bind failed)
2017-02-01 11:13:51,999 ERROR org.apache.flink.runtime.jobmanager.JobManager - Failed to run JobManager.
java.lang.RuntimeException: Unable to do further retries starting the actor system
    at org.apache.flink.runtime.jobmanager.JobManager$.retryOnBindException(JobManager.scala:2136)
    at org.apache.flink.runtime.jobmanager.JobManager$.runJobManager(JobManager.scala:2076)
    at org.apache.flink.runtime.jobmanager.JobManager$$anon$12.call(JobManager.scala:1971)
    at org.apache.flink.runtime.jobmanager.JobManager$$anon$12.call(JobManager.scala:1969)
    at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:29)
    at org.apache.flink.runtime.jobmanager.JobManager$.main(JobManager.scala:1969)
    at org.apache.flink.runtime.jobmanager.JobManager.main(JobManager.scala)
So additionally I had to add an /etc/hosts entry that maps the Docker IP
address of the JobManager container to the internet host name, so that the
JobManager tries to bind to the Docker IP address rather than the Amazon AWS IP
address (which is the IP that the internet host name normally resolves to).
This works for me now, although I would not call it ideal.
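The bind failure and the /etc/hosts workaround described above can be checked
with a small diagnostic. The following is a minimal sketch, not part of Flink:
the class name BindCheck and the method resolvesToLocalInterface are
hypothetical, and it only assumes that a process can bind to an address when
that address is assigned to one of its own network interfaces. Run inside the
JobManager container with the internet host name as argument, it would be
expected to report "not bindable" before the /etc/hosts entry is added and
"bindable" afterwards.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;

// Hypothetical helper for illustration only; not a Flink class.
class BindCheck {

    /**
     * Returns true if the host name (or IP literal) resolves to an address
     * that is assigned to a local network interface, i.e. an address a
     * server socket in this process could bind to.
     */
    public static boolean resolvesToLocalInterface(String host) {
        try {
            // Resolution honours /etc/hosts, so an override there changes the result.
            InetAddress address = InetAddress.getByName(host);
            if (address.isAnyLocalAddress()) {
                return true; // wildcard address (0.0.0.0 / ::) is always bindable
            }
            // Returns null when no local interface carries this address.
            return NetworkInterface.getByInetAddress(address) != null;
        } catch (UnknownHostException | SocketException e) {
            return false; // unresolvable name or interface lookup failure
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        String verdict = resolvesToLocalInterface(host) ? "bindable" : "not bindable";
        System.out.println(host + " -> " + verdict);
    }
}
```

Because the lookup goes through the normal resolver, the same check also shows
which address the JobManager would pick up for a given host name inside the
container.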
I have to admit I have not tested this with the latest RC yet; I will do that
later in the week.
Thanks
> Change Akka configuration to allow accessing actors from different URLs
> -----------------------------------------------------------------------
>
> Key: FLINK-2821
> URL: https://issues.apache.org/jira/browse/FLINK-2821
> Project: Flink
> Issue Type: Bug
> Components: Distributed Coordination
> Reporter: Robert Metzger
> Assignee: Maximilian Michels
> Fix For: 1.2.0
>
>
> Akka expects the actor's URL to match exactly.
> As pointed out here, there are cases where users complained about this:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Error-trying-to-access-JM-through-proxy-td3018.html
> - Proxy routing (as described here, send to the proxy URL, receiver
> recognizes only original URL)
> - Using hostname / IP interchangeably does not work (we solved this by
> always putting IP addresses into URLs, never hostnames)
> - Binding to multiple interfaces (any local 0.0.0.0) does not work. Still
> no solution to that (but seems not too much of a restriction)
> I am aware that this is not possible due to Akka, so it is actually not a
> Flink bug. But I think we should track the resolution of the issue here
> anyway because it's affecting our users' satisfaction.