[ https://issues.apache.org/jira/browse/FLINK-8891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gary Yao updated FLINK-8891:
----------------------------
    Description: 
*Description*
When deploying a Flink session on YARN, the {{DispatcherRestEndpoint}} may
incorrectly bind to a local (loopback) address. When this happens, job
submission and all REST API calls that use a non-local address fail. Setting
{{rest.address: 0.0.0.0}} in {{flink-conf.yaml}} has no effect because the
value is overridden.

*znode leader contents*
{noformat}
[zk: localhost:2181(CONNECTED) 3] get /flink/application_1520439896153_0001/leader/rest_server_lock
??whttp://127.0.1.1:56299srjava.util.UUID????m?/J
                                                 leastSigBitsJ
                                                              
mostSigBitsxp??L???g?M??KFK
cZxid = 0x10000000a
ctime = Wed Mar 07 16:25:21 UTC 2018
mZxid = 0x10000000a
mtime = Wed Mar 07 16:25:21 UTC 2018
pZxid = 0x10000000a
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x5620147c1220000
dataLength = 106
numChildren = 0
{noformat}
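
The readable fragments of the payload above (a URL followed by {{java.util.UUID}} and its field names) suggest that the znode holds a Java-serialized stream: a UTF string with the REST address, then a {{UUID}}. The following is a minimal sketch for decoding such a payload, assuming that layout. {{LeaderZnodeDecoder}}, {{LeaderInfo}}, and the method names are hypothetical and not part of Flink.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.UUID;

// Hypothetical helper, not a Flink class. Assumes the znode payload is an
// ObjectOutputStream stream containing writeUTF(address) + writeObject(UUID).
final class LeaderZnodeDecoder {

    /** Simple holder for the two decoded values. */
    static final class LeaderInfo {
        final String address;
        final UUID sessionId;

        LeaderInfo(String address, UUID sessionId) {
            this.address = address;
            this.sessionId = sessionId;
        }
    }

    /** Builds a payload in the assumed layout (useful for testing decode). */
    static byte[] encode(String address, UUID sessionId) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeUTF(address);
            out.writeObject(sessionId);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed (and therefore flushed) at this point.
        return bytes.toByteArray();
    }

    /** Reads the address and session id back from a payload in the assumed layout. */
    static LeaderInfo decode(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            String address = in.readUTF();
            UUID sessionId = (UUID) in.readObject();
            return new LeaderInfo(address, sessionId);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Round trip with the address seen in the znode dump and a random id.
        byte[] payload = encode("http://127.0.1.1:56299", UUID.randomUUID());
        LeaderInfo info = decode(payload);
        System.out.println(info.address + " " + info.sessionId);
    }
}
```

If the layout assumption holds, passing the raw bytes of {{rest_server_lock}} (fetched with any ZooKeeper client) to {{decode}} would return the advertised address, which in this report is {{http://127.0.1.1:56299}}.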

*Contents of {{/etc/hosts}}*
{noformat}
127.0.1.1 ip-172-31-36-187.eu-central-1.compute.internal ip-172-31-36-187
127.0.0.1 localhost

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
{noformat}

Note that the problem does not appear when the first line (the {{127.0.1.1}}
entry for the host name) is removed from {{/etc/hosts}}.
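
On Debian-based images, an {{/etc/hosts}} entry that maps the host name to {{127.0.1.1}} typically causes host name resolution to return a loopback address. The sketch below shows one way to check for this from Java. Whether Flink derives its REST address in this way is an assumption, and {{LoopbackCheck}} and its method names are hypothetical.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical diagnostic helper, not part of Flink.
final class LoopbackCheck {

    /**
     * Returns true if the given IP literal is a loopback address
     * (any IPv4 address in 127.0.0.0/8, or ::1 for IPv6).
     */
    static boolean isLoopbackLiteral(String ipLiteral) {
        try {
            // For an IP literal, getByName only validates the format;
            // no name lookup is performed.
            return InetAddress.getByName(ipLiteral).isLoopbackAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Not a valid IP literal: " + ipLiteral, e);
        }
    }

    public static void main(String[] args) {
        try {
            // Resolves the local host name through the system resolver,
            // which normally consults /etc/hosts.
            InetAddress local = InetAddress.getLocalHost();
            System.out.println(local.getHostName() + " -> " + local.getHostAddress()
                    + " (loopback: " + local.isLoopbackAddress() + ")");
        } catch (UnknownHostException e) {
            System.err.println("Could not resolve local host name: " + e.getMessage());
        }
    }
}
```

Running {{main}} on an affected machine should show whether the host name resolves to a loopback address. An address of that kind cannot be reached from other hosts, which is consistent with the {{Connection refused: /127.0.1.1:56299}} error in this report.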

*Error message & Stacktrace*
{noformat}
2018-03-07 16:25:24,267 INFO  org.apache.flink.yarn.AbstractYarnClusterDescriptor           - Found application JobManager host name 'ip-172-31-44-106.eu-central-1.compute.internal' and port '56299' from supplied application id 'application_1520439896153_0001'
Using the parallelism provided by the remote cluster (0). To use another parallelism, set it at the ./bin/flink client.
Starting execution of program


STDERR:

------------------------------------------------------------
 The program finished with the following exception:

org.apache.flink.client.program.ProgramInvocationException: Could not submit job 6243b830a6cb1a0b6605a15a7d3d81d4.
        at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:231)
        at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:457)
        at org.apache.flink.client.program.DetachedEnvironment.finalizeExecute(DetachedEnvironment.java:77)
        at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:403)
        at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:780)
        at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:274)
        at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:209)
        at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1019)
        at org.apache.flink.client.cli.CliFrontend.lambda$main$9(CliFrontend.java:1095)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
        at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
        at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1095)
Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
        at org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$4(RestClusterClient.java:327)
        at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
        at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
        at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
        at org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$1(RestClient.java:196)
        at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
        at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603)
        at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563)
        at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)
        at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:268)
        at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:284)
        at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
        at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
        at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
        at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
        at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
        at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.CompletionException: java.net.ConnectException: Connection refused: /127.0.1.1:56299
        at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
        at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
        at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:943)
        at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
        ... 16 more
Caused by: java.net.ConnectException: Connection refused: /127.0.1.1:56299
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.apache.flink.shaded.netty4.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
        at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:281)
        ... 7 more
{noformat}



  was:
*Description*
When deploying a Flink session on YARN, the {{DispatcherRestEndpoint}} may
incorrectly bind to a local (loopback) address. When this happens, job
submission and all REST API calls that use a non-local address fail.

*znode leader contents*
{noformat}
[zk: localhost:2181(CONNECTED) 3] get /flink/application_1520439896153_0001/leader/rest_server_lock
??whttp://127.0.1.1:56299srjava.util.UUID????m?/J
                                                 leastSigBitsJ
                                                              
mostSigBitsxp??L???g?M??KFK
cZxid = 0x10000000a
ctime = Wed Mar 07 16:25:21 UTC 2018
mZxid = 0x10000000a
mtime = Wed Mar 07 16:25:21 UTC 2018
pZxid = 0x10000000a
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x5620147c1220000
dataLength = 106
numChildren = 0
{noformat}

*Contents of {{/etc/hosts}}*
{noformat}
127.0.1.1 ip-172-31-36-187.eu-central-1.compute.internal ip-172-31-36-187
127.0.0.1 localhost

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
{noformat}

Note that the problem does not appear when the first line (the {{127.0.1.1}}
entry for the host name) is removed from {{/etc/hosts}}.

*Error message & Stacktrace*
{noformat}
2018-03-07 16:25:24,267 INFO  org.apache.flink.yarn.AbstractYarnClusterDescriptor           - Found application JobManager host name 'ip-172-31-44-106.eu-central-1.compute.internal' and port '56299' from supplied application id 'application_1520439896153_0001'
Using the parallelism provided by the remote cluster (0). To use another parallelism, set it at the ./bin/flink client.
Starting execution of program


STDERR:

------------------------------------------------------------
 The program finished with the following exception:

org.apache.flink.client.program.ProgramInvocationException: Could not submit job 6243b830a6cb1a0b6605a15a7d3d81d4.
        at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:231)
        at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:457)
        at org.apache.flink.client.program.DetachedEnvironment.finalizeExecute(DetachedEnvironment.java:77)
        at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:403)
        at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:780)
        at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:274)
        at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:209)
        at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1019)
        at org.apache.flink.client.cli.CliFrontend.lambda$main$9(CliFrontend.java:1095)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
        at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
        at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1095)
Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
        at org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$4(RestClusterClient.java:327)
        at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
        at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
        at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
        at org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$1(RestClient.java:196)
        at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
        at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603)
        at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563)
        at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)
        at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:268)
        at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:284)
        at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
        at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
        at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
        at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
        at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
        at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.CompletionException: java.net.ConnectException: Connection refused: /127.0.1.1:56299
        at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
        at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
        at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:943)
        at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
        ... 16 more
Caused by: java.net.ConnectException: Connection refused: /127.0.1.1:56299
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.apache.flink.shaded.netty4.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
        at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:281)
        ... 7 more
{noformat}




> RestServerEndpoint can bind on local address only
> -------------------------------------------------
>
>                 Key: FLINK-8891
>                 URL: https://issues.apache.org/jira/browse/FLINK-8891
>             Project: Flink
>          Issue Type: Bug
>          Components: REST, YARN
>    Affects Versions: 1.5.0
>         Environment: EC2 AMI debian-jessie-amd64-hvm-2017-01-15-1221-ebs 
> (ami-5900cc36)
> Hadoop 2.8.3
> Flink commit 80020cb5866c8bac67a48f89aa481de7de262f83
>            Reporter: Gary Yao
>            Priority: Blocker
>              Labels: flip-6
>             Fix For: 1.5.0



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
