[ https://issues.apache.org/jira/browse/FLINK-10825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16679821#comment-16679821 ]

ASF GitHub Bot commented on FLINK-10825:
----------------------------------------

zentol opened a new pull request #7062: [FLINK-10825][tests] Increase 
request-backoff for high-parallelism e2e test
URL: https://github.com/apache/flink/pull/7062
 
 
   ## What is the purpose of the change
   
   This PR stabilizes the high-parallelism iterations e2e test.
   
   When a task starts running, it requests data (partitions) from other tasks. 
If a request times out, it is retried with an increasing backoff until the maximum 
backoff (`taskmanager.network.request-backoff.max`) is reached.
   Once that limit is reached, a `PartitionNotFoundException` is thrown that fails 
the job, as reported in the JIRA.
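The retry behaviour described above can be modelled with a small simulation. This is a minimal sketch, not Flink's implementation: it assumes the backoff starts at an initial value and doubles after every failed request until it is capped at the maximum, that a request itself takes no time, and that the producer's partition becomes available at a fixed point in time. `PartitionRequestRetry`, `firstSuccessfulRequestAt` and `PartitionUnavailableException` (a stand-in for `PartitionNotFoundException`) are hypothetical names.

```java
/** Hypothetical stand-in for Flink's PartitionNotFoundException, for this sketch only. */
class PartitionUnavailableException extends RuntimeException {
    PartitionUnavailableException(String message) {
        super(message);
    }
}

/** Hypothetical model of a consumer retrying a partition request with a capped, doubling backoff. */
class PartitionRequestRetry {
    private final int initialBackoffMs;
    private final int maxBackoffMs;
    private int currentBackoffMs = 0; // 0 means "no retry has happened yet"

    PartitionRequestRetry(int initialBackoffMs, int maxBackoffMs) {
        if (initialBackoffMs <= 0 || maxBackoffMs < initialBackoffMs) {
            throw new IllegalArgumentException("expected 0 < initial <= max");
        }
        this.initialBackoffMs = initialBackoffMs;
        this.maxBackoffMs = maxBackoffMs;
    }

    /** Grows the backoff after a failed request; returns false once the maximum was already used. */
    boolean increaseBackoff() {
        if (currentBackoffMs == 0) {
            currentBackoffMs = initialBackoffMs;
            return true;
        }
        if (currentBackoffMs < maxBackoffMs) {
            // Double, but never exceed the configured maximum (long math avoids int overflow).
            currentBackoffMs = (int) Math.min(2L * currentBackoffMs, maxBackoffMs);
            return true;
        }
        return false;
    }

    /**
     * Simulated clock: the producer's partition becomes available at producerReadyAtMs.
     * Requests are assumed to take no time. Returns the time of the first successful request.
     */
    static long firstSuccessfulRequestAt(int initialBackoffMs, int maxBackoffMs, long producerReadyAtMs) {
        PartitionRequestRetry retry = new PartitionRequestRetry(initialBackoffMs, maxBackoffMs);
        long now = 0;
        while (now < producerReadyAtMs) { // the request fails: the partition is not there yet
            if (!retry.increaseBackoff()) {
                throw new PartitionUnavailableException("gave up at t=" + now + " ms");
            }
            now += retry.currentBackoffMs; // wait for the current backoff, then retry
        }
        return now;
    }

    public static void main(String[] args) {
        System.out.println(firstSuccessfulRequestAt(100, 10_000, 5_000));  // 6300
        System.out.println(firstSuccessfulRequestAt(100, 60_000, 30_000)); // 51100
        try {
            firstSuccessfulRequestAt(100, 10_000, 30_000);
        } catch (PartitionUnavailableException e) {
            System.out.println(e.getMessage()); // gave up at t=22700 ms
        }
    }
}
```

With a 100 ms initial backoff and a 10 s maximum, a partition that appears after 5 s is picked up at 6.3 s, while one that appears after 30 s is never seen: the consumer gives up at 22.7 s. Raising the maximum to 60 s lets the same consumer succeed at 51.1 s.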
   
   If the job is not fully deployed by the time a single task reaches the maximum 
backoff, this exception is quite likely to occur.
   
   This PR bumps the maximum backoff to 60 seconds, which should give the job 
more time to fully deploy.
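To put a rough number on "more time to fully deploy", the total time a consumer keeps retrying is the sum of all its backoff waits. This sketch assumes an initial backoff of 100 ms, the same doubling-until-capped model, and a previous maximum of 10 s compared with the new 60 s; all three are assumptions, so check `taskmanager.network.request-backoff.initial` and `taskmanager.network.request-backoff.max` for the actual values in your version. `RetryBudget.totalWaitMs` is a hypothetical helper.

```java
/** Hypothetical helper: how long a consumer keeps retrying before it gives up. */
class RetryBudget {
    /**
     * Sums every backoff wait: start at initialBackoffMs, double after each failure,
     * cap at maxBackoffMs, and stop once the capped wait has been used once.
     */
    static long totalWaitMs(long initialBackoffMs, long maxBackoffMs) {
        if (initialBackoffMs <= 0 || maxBackoffMs < initialBackoffMs) {
            throw new IllegalArgumentException("expected 0 < initial <= max");
        }
        long total = 0;
        long backoff = initialBackoffMs;
        while (true) {
            total += backoff; // wait this long before the next retry
            if (backoff >= maxBackoffMs) {
                return total; // maximum reached: the next failure is fatal
            }
            backoff = Math.min(backoff * 2, maxBackoffMs);
        }
    }

    public static void main(String[] args) {
        System.out.println(totalWaitMs(100, 10_000)); // 22700
        System.out.println(totalWaitMs(100, 60_000)); // 162300
    }
}
```

Under these assumptions, a 10 s maximum gives up after 22.7 s of retrying, while a 60 s maximum keeps retrying for about 162 s, roughly seven times as long.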
   



> ConnectedComponents test instable on Travis
> -------------------------------------------
>
>                 Key: FLINK-10825
>                 URL: https://issues.apache.org/jira/browse/FLINK-10825
>             Project: Flink
>          Issue Type: Bug
>          Components: E2E Tests
>    Affects Versions: 1.7.0
>            Reporter: Timo Walther
>            Assignee: Chesnay Schepler
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.7.0
>
>         Attachments: log.txt
>
>
> The "ConnectedComponents iterations with high parallelism end-to-end test" 
> succeeds on Travis, but the log contains the following exception:
> {code}
> 2018-11-08 10:15:13,698 ERROR org.apache.flink.runtime.taskexecutor.rpc.RpcResultPartitionConsumableNotifier  - Could not schedule or update consumers at the JobManager.
> org.apache.flink.runtime.executiongraph.ExecutionGraphException: Cannot find execution for execution Id 5b02c2f51e51f68b66bfab07afc1bf17.
>       at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleOrUpdateConsumers(ExecutionGraph.java:1635)
>       at org.apache.flink.runtime.jobmaster.JobMaster.scheduleOrUpdateConsumers(JobMaster.java:637)
>       at sun.reflect.GeneratedMethodAccessor44.invoke(Unknown Source)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:247)
>       at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:162)
>       at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:70)
>       at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142)
>       at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40)
>       at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
>       at akka.actor.Actor.aroundReceive(Actor.scala:502)
>       at akka.actor.Actor.aroundReceive$(Actor.scala:500)
>       at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
>       at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
>       at akka.actor.ActorCell.invoke(ActorCell.scala:495)
>       at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
>       at akka.dispatch.Mailbox.run(Mailbox.scala:224)
>       at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
>       at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
>       at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
>       at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
>       at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
> {code}


