[jira] [Commented] (FLINK-9805) HTTP Redirect to Active JM in Flink CLI

Sayat Satybaldiyev (JIRA) Wed, 11 Jul 2018 05:53:57 -0700


    [ https://issues.apache.org/jira/browse/FLINK-9805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16540048#comment-16540048 ]


Sayat Satybaldiyev commented on FLINK-9805:
-------------------------------------------

The Flink cluster is set up in HA mode with two JMs: one active, the other on 
standby. However, the Flink client does not have access to ZK to retrieve the 
current leader information, so I have to specify the JobManager endpoint with 
the -m flag.

 

When the address I pass with --jobmanager belongs to a JM that exists but is 
in passive standby mode, the client fails to submit the job to the cluster 
with the exception: "Could not complete the operation. Number of retries has 
been exhausted."

 

Full log:

------------------------------------------------------------
 The program finished with the following exception:

org.apache.flink.client.program.ProgramInvocationException: Could not submit job 80ec6d942e910d0de00d50f5e6886461.
 at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:247)
 at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:464)
 at org.apache.flink.client.program.DetachedEnvironment.finalizeExecute(DetachedEnvironment.java:77)
 at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:410)
 at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:785)
 at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:279)
 at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:214)
 at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1025)
 at org.apache.flink.client.cli.CliFrontend.lambda$main$9(CliFrontend.java:1101)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
 at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
 at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1101)
Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
 at org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$8(RestClusterClient.java:370)
 at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
 at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
 at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
 at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
 at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$5(FutureUtils.java:214)
 at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
 at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
 at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
 at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
 at org.apache.flink.runtime.rest.RestClient$ClientHandler.readRawResponse(RestClient.java:428)
 at org.apache.flink.runtime.rest.RestClient$ClientHandler.channelRead0(RestClient.java:374)
 at org.apache.flink.shaded.netty4.io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
 at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
 at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
 at org.apache.flink.shaded.netty4.io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
 at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
 at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
 at org.apache.flink.shaded.netty4.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
 at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
 at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
 at org.apache.flink.shaded.netty4.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
 at org.apache.flink.shaded.netty4.io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:147)
 at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
 at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
 at org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
 at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
 at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
 at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
 at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
 at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
 at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
 at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
 at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Number of retries has been exhausted.
 at java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326)
 at java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338)
 at java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:911)
 at java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:899)
 ... 31 more
Caused by: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Number of retries has been exhausted.
 ... 29 more
Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.rest.util.RestClientException: Response could not be read.
 at java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326)
 at java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338)
 at java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:911)
 at java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:899)
 ... 26 more
Caused by: org.apache.flink.runtime.rest.util.RestClientException: Response could not be read.
 ... 24 more
Caused by: org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.JsonMappingException: No content to map due to end-of-input
 at [Source: org.apache.flink.shaded.netty4.io.netty.buffer.ByteBufInputStream@460d1061; line: 1, column: 0]
 at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:256)
 at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:3851)
 at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3792)
 at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.ObjectMapper.readTree(ObjectMapper.java:2272)
 at org.apache.flink.runtime.rest.RestClient$ClientHandler.readRawResponse(RestClient.java:410)
 ... 23 more

 

> HTTP Redirect to Active JM in Flink CLI
> ---------------------------------------
>
>                 Key: FLINK-9805
>                 URL: https://issues.apache.org/jira/browse/FLINK-9805
>             Project: Flink
>          Issue Type: Improvement
>          Components: Client
>    Affects Versions: 1.5.0
>            Reporter: Sayat Satybaldiyev
>            Assignee: vinoyang
>            Priority: Minor
>              Labels: newbie
>
> The Flink CLI allows specifying the JobManager address via the --jobmanager 
> flag. However, in HA mode the active JM can change, and a standby JM then 
> answers with an HTTP redirect to the active one. When deploying via the Flink 
> CLI with the --jobmanager flag, the CLI does not follow that redirect and so 
> fails to submit the job with "Could not complete the operation. Number of 
> retries has been exhausted". 
>  
> *Proposal:*
> Honor the JM's HTTP redirect in the Flink CLI when leadership changes and 
> the --jobmanager flag is used. 
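
The proposal could be approached roughly as follows. This is only a minimal 
sketch built on plain JDK types, not Flink's RestClient: the class name 
RedirectSketch, the methods isRedirect and nextTarget, and the host names in 
the usage note are all hypothetical. It assumes the standby JM answers with a 
3xx status and a Location header that points at the active JM.

```java
import java.net.URI;
import java.util.Optional;

// Hypothetical helper, not part of Flink: decides where a client should
// resend a request after a standby JobManager answers with an HTTP redirect.
final class RedirectSketch {

    // Status codes treated as redirects in this sketch.
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    // Returns the URI to retry against, or empty if the response is not a
    // redirect or carries no usable Location header.
    static Optional<URI> nextTarget(URI requested, int status, String location) {
        if (!isRedirect(status) || location == null || location.isEmpty()) {
            return Optional.empty();
        }
        // URI.resolve accepts both absolute and relative Location values;
        // it throws IllegalArgumentException if the value is not a valid URI.
        return Optional.of(requested.resolve(location));
    }
}
```

A client could call RedirectSketch.nextTarget(URI.create("http://jm-standby:8081/jobs"), 
status, locationHeader) after each response and, when the result is present, 
resend the request to that URI instead of retrying the standby until the 
retries are exhausted. A real change would also need a cap on the number of 
redirects followed.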



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
