[ 
https://issues.apache.org/jira/browse/BEAM-5308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16604330#comment-16604330
 ] 

Thomas Weise commented on BEAM-5308:
------------------------------------

Steps to reproduce:
 * Start Flink 1.5.1 cluster: ./bin/start-cluster.sh
 * Run job server (master branch): ./gradlew :beam-runners-flink_2.11-job-server:runShadow -PflinkMasterUrl=localhost:8081
 * Run wordcount: ./gradlew :beam-sdks-python:portableWordCount -Pstreaming -PjobEndpoint=localhost:8099

Exception on the second run of the wordcount job:
{code:java}
[flink-runner-job-server] ERROR org.apache.beam.runners.flink.FlinkJobInvocation - Error during job invocation BeamApp-tweise-0905121904-94ae11b1_019a1a43-64e5-4bfd-b2bc-aa41e3835388.
org.apache.flink.client.program.ProgramInvocationException: java.lang.RuntimeException: Unable to create context for job BeamApp-tweise-0905121904-94ae11b1_019a1a43-64e5-4bfd-b2bc-aa41e3835388
        at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:264)
        at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:464)
        at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:457)
        at org.apache.flink.streaming.api.environment.RemoteStreamEnvironment.executeRemotely(RemoteStreamEnvironment.java:219)
        at org.apache.flink.streaming.api.environment.RemoteStreamEnvironment.execute(RemoteStreamEnvironment.java:178)
        at org.apache.beam.runners.flink.FlinkJobInvocation.runPipeline(FlinkJobInvocation.java:121)
        at org.apache.beam.repackaged.beam_runners_flink_2.11.com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111)
        at org.apache.beam.repackaged.beam_runners_flink_2.11.com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58)
        at org.apache.beam.repackaged.beam_runners_flink_2.11.com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Unable to create context for job BeamApp-tweise-0905121904-94ae11b1_019a1a43-64e5-4bfd-b2bc-aa41e3835388
        at org.apache.beam.runners.flink.translation.functions.ReferenceCountingFlinkExecutableStageContextFactory.lambda$get$0(ReferenceCountingFlinkExecutableStageContextFactory.java:83)
        at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660)
        at org.apache.beam.runners.flink.translation.functions.ReferenceCountingFlinkExecutableStageContextFactory.get(ReferenceCountingFlinkExecutableStageContextFactory.java:76)
        at org.apache.beam.runners.flink.translation.functions.FlinkBatchExecutableStageContext$BatchFactory.get(FlinkBatchExecutableStageContext.java:80)
        at org.apache.beam.runners.flink.translation.wrappers.streaming.ExecutableStageDoFnOperator.open(ExecutableStageDoFnOperator.java:136)
        at org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:420)
        at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:296)
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703)
        ... 1 more
Caused by: java.io.IOException: Failed to bind
        at org.apache.beam.vendor.grpc.v1.io.grpc.netty.NettyServer.start(NettyServer.java:252)
        at org.apache.beam.vendor.grpc.v1.io.grpc.internal.ServerImpl.start(ServerImpl.java:163)
        at org.apache.beam.vendor.grpc.v1.io.grpc.internal.ServerImpl.start(ServerImpl.java:78)
        at org.apache.beam.runners.fnexecution.ServerFactory$InetSocketAddressServerFactory.createServer(ServerFactory.java:130)
        at org.apache.beam.runners.fnexecution.ServerFactory$InetSocketAddressServerFactory.allocatePortAndCreate(ServerFactory.java:100)
        at org.apache.beam.runners.fnexecution.GrpcFnServer.allocatePortAndCreateFor(GrpcFnServer.java:37)
        at org.apache.beam.runners.fnexecution.control.JobBundleFactoryBase.<init>(JobBundleFactoryBase.java:85)
        at org.apache.beam.runners.fnexecution.control.DockerJobBundleFactory.<init>(DockerJobBundleFactory.java:75)
        at org.apache.beam.runners.fnexecution.control.DockerJobBundleFactory$1.create(DockerJobBundleFactory.java:62)
        at org.apache.beam.runners.fnexecution.control.DockerJobBundleFactory.create(DockerJobBundleFactory.java:71)
        at org.apache.beam.runners.flink.translation.functions.FlinkBatchExecutableStageContext.create(FlinkBatchExecutableStageContext.java:41)
        at org.apache.beam.runners.flink.translation.functions.FlinkBatchExecutableStageContext.access$000(FlinkBatchExecutableStageContext.java:35)
        at org.apache.beam.runners.flink.translation.functions.FlinkBatchExecutableStageContext$BatchFactory.lambda$static$5929735b$1(FlinkBatchExecutableStageContext.java:76)
        at org.apache.beam.runners.flink.translation.functions.ReferenceCountingFlinkExecutableStageContextFactory.lambda$get$0(ReferenceCountingFlinkExecutableStageContextFactory.java:80)
        ... 8 more
Caused by: java.net.BindException: Address already in use
        at sun.nio.ch.Net.bind0(Native Method)
        at sun.nio.ch.Net.bind(Net.java:433)
        at sun.nio.ch.Net.bind(Net.java:425)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
        at org.apache.beam.vendor.netty.v4.io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:128)
        at org.apache.beam.vendor.netty.v4.io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:558)
        at org.apache.beam.vendor.netty.v4.io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1358)
        at org.apache.beam.vendor.netty.v4.io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:501)
        at org.apache.beam.vendor.netty.v4.io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:486)
        at org.apache.beam.vendor.netty.v4.io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:1019)
        at org.apache.beam.vendor.netty.v4.io.netty.channel.AbstractChannel.bind(AbstractChannel.java:254)
        at org.apache.beam.vendor.netty.v4.io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:366)
        at org.apache.beam.vendor.netty.v4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
        at org.apache.beam.vendor.netty.v4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
        at org.apache.beam.vendor.netty.v4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:465)
        at org.apache.beam.vendor.netty.v4.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
        at org.apache.beam.vendor.netty.v4.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        ... 1 more
{code}
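
For readers unfamiliar with the innermost cause, the following is a minimal, standalone sketch of how java.net.BindException ("Address already in use") arises at the JDK level. It illustrates the error only and is not a diagnosis of the Beam code path: the class BindConflictDemo and its methods are hypothetical names, and it uses plain java.net.ServerSocket rather than the vendored gRPC/Netty server that appears in the trace.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.ServerSocket;

// Hypothetical demo class; not part of Beam or Flink.
public class BindConflictDemo {

    /** Binds a listener, then tries to bind a second one to the same port. */
    public static boolean secondBindFails() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the OS for any free port, so the demo itself never collides.
        try (ServerSocket first = new ServerSocket(0, 50, loopback)) {
            int port = first.getLocalPort();
            // The first listener is still open, so this bind is expected to be rejected.
            try (ServerSocket second = new ServerSocket(port, 50, loopback)) {
                return false;
            } catch (BindException e) {
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Two listeners that each request port 0 receive distinct ports. */
    public static boolean ephemeralPortsDiffer() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket a = new ServerSocket(0, 50, loopback);
             ServerSocket b = new ServerSocket(0, 50, loopback)) {
            return a.getLocalPort() != b.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("second bind to same port fails: " + secondBindFails());
        System.out.println("two port-0 binds get distinct ports: " + ephemeralPortsDiffer());
    }
}
```

The point of the sketch is only that a port stays unavailable for as long as the first listener holds it, which is consistent with a server from the first run still being alive when the second run starts.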

> JobBundleFactory BindException with FlinkRunner and remote cluster
> ------------------------------------------------------------------
>
>                 Key: BEAM-5308
>                 URL: https://issues.apache.org/jira/browse/BEAM-5308
>             Project: Beam
>          Issue Type: Task
>          Components: runner-flink
>            Reporter: Thomas Weise
>            Priority: Major
>              Labels: portability
>
> Repeated execution of the same job on a remote Flink cluster (not embedded in 
> the job server) fails with a bind exception. There seem to be two issues:
>  * Multiple instances of the job bundle factory cannot be created (port conflict).
>  * The job bundle factory is not released after the job completes (and the Docker 
> container keeps running). That is not the case in embedded mode.
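
The second point concerns releasing a shared, per-job resource once its last user is done. As a rough sketch of that general reference-counting pattern (not the actual ReferenceCountingFlinkExecutableStageContextFactory implementation; RefCountedContexts, acquire, release and size are hypothetical names chosen for illustration), something along these lines closes the context when the count drops to zero:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch of a reference-counted, per-job context registry.
public class RefCountedContexts<C extends AutoCloseable> {

    private static final class Entry<C> {
        final C context;
        int refs;
        Entry(C context) { this.context = context; }
    }

    private final Map<String, Entry<C>> entries = new HashMap<>();
    private final Function<String, C> creator;

    public RefCountedContexts(Function<String, C> creator) {
        this.creator = creator;
    }

    /** Returns the shared context for the job, creating it on first use. */
    public synchronized C acquire(String jobId) {
        Entry<C> entry = entries.get(jobId);
        if (entry == null) {
            entry = new Entry<>(creator.apply(jobId));
            entries.put(jobId, entry);
        }
        entry.refs++;
        return entry.context;
    }

    /** Drops one reference; the last release closes and forgets the context. */
    public synchronized void release(String jobId) {
        Entry<C> entry = entries.get(jobId);
        if (entry == null) {
            throw new IllegalStateException("No context acquired for job " + jobId);
        }
        if (--entry.refs == 0) {
            entries.remove(jobId);
            try {
                entry.context.close(); // e.g. stop servers and free their ports
            } catch (Exception e) {
                throw new RuntimeException("Failed to close context for job " + jobId, e);
            }
        }
    }

    /** Number of jobs that currently hold a live context. */
    public synchronized int size() {
        return entries.size();
    }

    public static void main(String[] args) {
        AtomicInteger closed = new AtomicInteger();
        RefCountedContexts<AutoCloseable> contexts =
            new RefCountedContexts<>(jobId -> closed::incrementAndGet);
        contexts.acquire("job-1");
        contexts.acquire("job-1");
        contexts.release("job-1");
        contexts.release("job-1");
        System.out.println("contexts closed: " + closed.get() + ", still live: " + contexts.size());
    }
}
```

In a scheme like this, every acquire must be paired with a release; if one release is skipped, the context (and any port or container it holds) outlives the job.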



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
