[ 
https://issues.apache.org/jira/browse/FLINK-10671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Yao updated FLINK-10671:
-----------------------------
    Description: 
Hi,
 
There seems to be a problem with REST monitoring API:
|/jobs/:jobid/savepoints/:triggerid|
 
The problem is that when the Savepoint represented by {{:triggerid}} is called 
with {{cancel=true}} the above status call seems to fail if the savepoint 
duration exceeds {{akka.ask.timeout}} value.
 
Below is a log in which I invoke "cancel with savepoint" then poll the above 
endpoint for status at 2 second intervals. {{akka.ask.timout}} is set for 
twenty seconds. The error is repeatable at various values of 
{{akka.ask.timeout}}.
 
{noformat}
2018/10/24 19:42:25 savepoint id 925964b35b2d501f4a45b714eca0a2ca is IN_PROGRESS
2018/10/24 19:42:27 savepoint id 925964b35b2d501f4a45b714eca0a2ca is IN_PROGRESS
2018/10/24 19:42:29 savepoint id 925964b35b2d501f4a45b714eca0a2ca is IN_PROGRESS
2018/10/24 19:42:31 savepoint id 925964b35b2d501f4a45b714eca0a2ca is IN_PROGRESS
2018/10/24 19:42:33 savepoint id 925964b35b2d501f4a45b714eca0a2ca is IN_PROGRESS
2018/10/24 19:42:35 savepoint id 925964b35b2d501f4a45b714eca0a2ca is IN_PROGRESS
2018/10/24 19:42:37 savepoint id 925964b35b2d501f4a45b714eca0a2ca is IN_PROGRESS
2018/10/24 19:42:39 savepoint id 925964b35b2d501f4a45b714eca0a2ca is IN_PROGRESS
2018/10/24 19:42:41 savepoint id 925964b35b2d501f4a45b714eca0a2ca is IN_PROGRESS
2018/10/24 19:42:43 savepoint id 925964b35b2d501f4a45b714eca0a2ca is IN_PROGRESS
2018/10/24 19:42:45 Cancel with Savepoint may have failed: 
java.util.concurrent.CompletionException: akka.pattern.AskTimeoutException: Ask 
timed out on [Actor[akka://flink/user/jobmanager_0#-234856817]] after [20000 
ms]. Sender[null] sent message of type 
"org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
at 
java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326)
at 
java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338)
at java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:911)
at 
java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:899)
at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at 
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at 
org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:770)
at akka.dispatch.OnComplete.internal(Future.scala:258)
at akka.dispatch.OnComplete.internal(Future.scala:256)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
at 
org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:83)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:603)
at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
at 
scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
at 
akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
at 
akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
at 
akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
at 
akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
at java.lang.Thread.run(Thread.java:748)
Caused by: akka.pattern.AskTimeoutException: Ask timed out on 
[Actor[akka://flink/user/jobmanager_0#-234856817]] after [20000 ms]. 
Sender[null] sent message of type 
"org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
... 9 more
{noformat}

  was:
Hi,
 
There seems to be a problem with REST monitoring API:
|/jobs/:jobid/savepoints/:triggerid|
 
The problem is that when the Savepoint represented by :triggerid is called with 
`cancel=true` the above status call seems to fail if the savepoint duration 
exceeds `akka.ask.timeout` value.
 
Below is a log in which I invoke "cancel with savepoint" then poll the above 
endpoint for status at 2 second intervals. akka.ask.timout is set for twenty 
seconds. The error is repeatable at various values of akka.ask.timeout.
 
2018/10/24 19:42:25 savepoint id 925964b35b2d501f4a45b714eca0a2ca is IN_PROGRESS
2018/10/24 19:42:27 savepoint id 925964b35b2d501f4a45b714eca0a2ca is IN_PROGRESS
2018/10/24 19:42:29 savepoint id 925964b35b2d501f4a45b714eca0a2ca is IN_PROGRESS
2018/10/24 19:42:31 savepoint id 925964b35b2d501f4a45b714eca0a2ca is IN_PROGRESS
2018/10/24 19:42:33 savepoint id 925964b35b2d501f4a45b714eca0a2ca is IN_PROGRESS
2018/10/24 19:42:35 savepoint id 925964b35b2d501f4a45b714eca0a2ca is IN_PROGRESS
2018/10/24 19:42:37 savepoint id 925964b35b2d501f4a45b714eca0a2ca is IN_PROGRESS
2018/10/24 19:42:39 savepoint id 925964b35b2d501f4a45b714eca0a2ca is IN_PROGRESS
2018/10/24 19:42:41 savepoint id 925964b35b2d501f4a45b714eca0a2ca is IN_PROGRESS
2018/10/24 19:42:43 savepoint id 925964b35b2d501f4a45b714eca0a2ca is IN_PROGRESS
2018/10/24 19:42:45 Cancel with Savepoint may have failed: 
java.util.concurrent.CompletionException: akka.pattern.AskTimeoutException: Ask 
timed out on [Actor[akka://flink/user/jobmanager_0#-234856817]] after [20000 
ms]. Sender[null] sent message of type 
"org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
at 
java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326)
at 
java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338)
at java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:911)
at 
java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:899)
at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at 
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at 
org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:770)
at akka.dispatch.OnComplete.internal(Future.scala:258)
at akka.dispatch.OnComplete.internal(Future.scala:256)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
at 
org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:83)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:603)
at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
at 
scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
at 
akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
at 
akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
at 
akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
at 
akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
at java.lang.Thread.run(Thread.java:748)
Caused by: akka.pattern.AskTimeoutException: Ask timed out on 
[Actor[akka://flink/user/jobmanager_0#-234856817]] after [20000 ms]. 
Sender[null] sent message of type 
"org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
... 9 more


> rest monitoring api Savepoint status call fails if akka.ask.timeout < 
> checkpoint duration
> -----------------------------------------------------------------------------------------
>
>                 Key: FLINK-10671
>                 URL: https://issues.apache.org/jira/browse/FLINK-10671
>             Project: Flink
>          Issue Type: Bug
>          Components: REST
>    Affects Versions: 1.6.1
>            Reporter: Cliff Resnick
>            Assignee: Gary Yao
>            Priority: Minor
>
> Hi,
>  
> There seems to be a problem with REST monitoring API:
> |/jobs/:jobid/savepoints/:triggerid|
>  
> The problem is that when the Savepoint represented by {{:triggerid}} is 
> called with {{cancel=true}} the above status call seems to fail if the 
> savepoint duration exceeds {{akka.ask.timeout}} value.
>  
> Below is a log in which I invoke "cancel with savepoint" then poll the above 
> endpoint for status at 2 second intervals. {{akka.ask.timout}} is set for 
> twenty seconds. The error is repeatable at various values of 
> {{akka.ask.timeout}}.
>  
> {noformat}
> 2018/10/24 19:42:25 savepoint id 925964b35b2d501f4a45b714eca0a2ca is 
> IN_PROGRESS
> 2018/10/24 19:42:27 savepoint id 925964b35b2d501f4a45b714eca0a2ca is 
> IN_PROGRESS
> 2018/10/24 19:42:29 savepoint id 925964b35b2d501f4a45b714eca0a2ca is 
> IN_PROGRESS
> 2018/10/24 19:42:31 savepoint id 925964b35b2d501f4a45b714eca0a2ca is 
> IN_PROGRESS
> 2018/10/24 19:42:33 savepoint id 925964b35b2d501f4a45b714eca0a2ca is 
> IN_PROGRESS
> 2018/10/24 19:42:35 savepoint id 925964b35b2d501f4a45b714eca0a2ca is 
> IN_PROGRESS
> 2018/10/24 19:42:37 savepoint id 925964b35b2d501f4a45b714eca0a2ca is 
> IN_PROGRESS
> 2018/10/24 19:42:39 savepoint id 925964b35b2d501f4a45b714eca0a2ca is 
> IN_PROGRESS
> 2018/10/24 19:42:41 savepoint id 925964b35b2d501f4a45b714eca0a2ca is 
> IN_PROGRESS
> 2018/10/24 19:42:43 savepoint id 925964b35b2d501f4a45b714eca0a2ca is 
> IN_PROGRESS
> 2018/10/24 19:42:45 Cancel with Savepoint may have failed: 
> java.util.concurrent.CompletionException: akka.pattern.AskTimeoutException: 
> Ask timed out on [Actor[akka://flink/user/jobmanager_0#-234856817]] after 
> [20000 ms]. Sender[null] sent message of type 
> "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
> at 
> java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326)
> at 
> java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338)
> at java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:911)
> at 
> java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:899)
> at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at 
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
> at 
> org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:770)
> at akka.dispatch.OnComplete.internal(Future.scala:258)
> at akka.dispatch.OnComplete.internal(Future.scala:256)
> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
> at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
> at 
> org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:83)
> at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
> at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
> at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:603)
> at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
> at 
> scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
> at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
> at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
> at 
> akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
> at 
> akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
> at 
> akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
> at 
> akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: akka.pattern.AskTimeoutException: Ask timed out on 
> [Actor[akka://flink/user/jobmanager_0#-234856817]] after [20000 ms]. 
> Sender[null] sent message of type 
> "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
> at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
> ... 9 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to