[ 
https://issues.apache.org/jira/browse/FLINK-28392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17562608#comment-17562608
 ] 

Chesnay Schepler commented on FLINK-28392:
------------------------------------------

I was able to somewhat reliably reproduce the issue locally.

If the scheduled executor service that backs the main thread is shut down 
before the deployment has started (Execution#deploy), the deployment fails. 
The execution then transitions to FAILED and is deregistered from the set of 
active executions, which are the only ones visible to the restart strategy / 
ExecutionDeployer.
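
To illustrate the underlying JDK behavior only (this is not Flink code): a 
ScheduledExecutorService that has already been shut down rejects new work with 
a RejectedExecutionException. The sketch below assumes a hypothetical deploy() 
helper and a hypothetical State enum as stand-ins for Execution#deploy and the 
execution state. It shows how a rejected submission can surface as an 
immediate FAILED transition without any task ever running.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;

public class ShutdownExecutorSketch {

    /** Hypothetical stand-in for an execution's state; not Flink's ExecutionState. */
    enum State { DEPLOYING, FAILED }

    /**
     * Hypothetical stand-in for Execution#deploy: hands the deployment action
     * to the executor that backs the main thread and reports the resulting state.
     */
    static State deploy(ScheduledExecutorService mainThreadExecutor) {
        try {
            mainThreadExecutor.execute(() -> {
                // The actual deployment work would run here.
            });
            return State.DEPLOYING;
        } catch (RejectedExecutionException e) {
            // A shut-down executor rejects the task, so deployment fails
            // before it has started.
            return State.FAILED;
        }
    }

    public static void main(String[] args) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.shutdownNow(); // mimics the test closing the executor too early
        System.out.println(deploy(executor));
    }
}
```

In this simplified model the failure happens on the submitting side, which 
matches the "task fails on the JM side" case raised below; how Flink's 
failover handling reacts to it is not modeled here.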

Putting aside that the test shouldn't just close the executor, it could be that 
the PipelinedRestartStrategy just doesn't work if a task fails on the JM side.

[~zhuzh] any thoughts?

 

> RemoveCachedShuffleDescriptorTest#testRemoveOffloadedCacheForPointwiseEdgeAfterFailover
>  causes fatal error on CI
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-28392
>                 URL: https://issues.apache.org/jira/browse/FLINK-28392
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.16.0
>            Reporter: Martijn Visser
>            Assignee: Chesnay Schepler
>            Priority: Critical
>             Fix For: 1.16.0
>
>
> {code:java}
> Jul 05 03:30:03 [ERROR] Error occurred in starting fork, check output in log
> Jul 05 03:30:03 [ERROR] Process Exit Code: 239
> Jul 05 03:30:03 [ERROR] Crashed tests:
> Jul 05 03:30:03 [ERROR] 
> org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategyTest
> Jul 05 03:30:03 [ERROR] 
> org.apache.maven.surefire.booter.SurefireBooterForkException: 
> ExecutionException The forked VM terminated without properly saying goodbye. 
> VM crash or System.exit called?
> Jul 05 03:30:03 [ERROR] Command was /bin/sh -c cd /__w/1/s/flink-runtime && 
> /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -XX:+UseG1GC -Xms256m -Xmx768m 
> -jar 
> /__w/1/s/flink-runtime/target/surefire/surefirebooter4932865857415988980.jar 
> /__w/1/s/flink-runtime/target/surefire 2022-07-05T03-23-25_404-jvmRun1 
> surefire8916732512419442726tmp surefire_2130262314165063415tmp
> Jul 05 03:30:03 [ERROR] Error occurred in starting fork, check output in log
> Jul 05 03:30:03 [ERROR] Process Exit Code: 239
> Jul 05 03:30:03 [ERROR] Crashed tests:
> Jul 05 03:30:03 [ERROR] 
> org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategyTest
> Jul 05 03:30:03 [ERROR] at 
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:532)
> Jul 05 03:30:03 [ERROR] at 
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkOnceMultiple(ForkStarter.java:405)
> Jul 05 03:30:03 [ERROR] at 
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:321)
> Jul 05 03:30:03 [ERROR] at 
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:266)
> Jul 05 03:30:03 [ERROR] at 
> org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1314)
> Jul 05 03:30:03 [ERROR] at 
> org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1159)
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=37602&view=logs&j=4d4a0d10-fca2-5507-8eed-c07f0bdf4887&t=7b25afdf-cc6c-566f-5459-359dc2585798&l=8147



