[ https://issues.apache.org/jira/browse/FLINK-18290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17135526#comment-17135526 ]
Robert Metzger commented on FLINK-18290:
----------------------------------------
First occurrence of a related issue: build 20200612.19 (build ID 3416).
First occurrence of the exact issue, with exit code 239: build 20200612.26 (build ID 3434), which only contains a docs change. Potentially problematic commits:
This commit
(https://github.com/flink-ci/flink-mirror/commit/c004b119ef28dc7387935d8d3a4dbf296cc5f661)
introduces a System.exit(-17) in the checkpoint coordinator. A process exit status only keeps the low 8 bits, so -17 is reported as 256 - 17 = 239. Coincidence? Introducing a {{System.exit(-17);}} into any test leads to exactly the failure reported here.
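A minimal sketch of the arithmetic, assuming a POSIX host where the parent process (here the surefire plugin waiting on the forked JVM) only sees the low 8 bits of the exit status. ExitStatusDemo is a hypothetical class, not Flink code:

{code:java}
public class ExitStatusDemo {

    /**
     * Models what a POSIX parent process sees after the child calls exit(code):
     * only the least significant 8 bits of the status survive.
     */
    public static int observedExitStatus(int javaExitCode) {
        return javaExitCode & 0xFF;
    }

    public static void main(String[] args) {
        System.out.println(observedExitStatus(-17)); // prints 239
        System.out.println(observedExitStatus(1));   // prints 1
    }
}
{code}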
This seems to be why System.exit() gets called (from build 20200612.19):
{code}
14:56:33,906 [flink-akka.actor.default-dispatcher-2] INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Stopped TaskExecutor akka://flink/user/rpc/taskmanager_28.
14:56:33,887 [jobmanager-future-thread-7] ERROR org.apache.flink.runtime.util.FatalExitExceptionHandler [] - FATAL: Thread 'jobmanager-future-thread-7' produced an uncaught exception. Stopping the process...
java.util.concurrent.CompletionException: java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@1db3e21b rejected from java.util.concurrent.ScheduledThreadPoolExecutor@198c75f2[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 20]
	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273) ~[?:1.8.0_242]
	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280) ~[?:1.8.0_242]
	at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:838) ~[?:1.8.0_242]
	at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) ~[?:1.8.0_242]
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) [?:1.8.0_242]
	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1609) [?:1.8.0_242]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_242]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_242]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_242]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_242]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_242]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_242]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_242]
Caused by: java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@1db3e21b rejected from java.util.concurrent.ScheduledThreadPoolExecutor@198c75f2[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 20]
	at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063) ~[?:1.8.0_242]
	at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830) ~[?:1.8.0_242]
	at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:326) ~[?:1.8.0_242]
	at java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:533) ~[?:1.8.0_242]
	at java.util.concurrent.ScheduledThreadPoolExecutor.execute(ScheduledThreadPoolExecutor.java:622) ~[?:1.8.0_242]
	at java.util.concurrent.Executors$DelegatedExecutorService.execute(Executors.java:668) ~[?:1.8.0_242]
	at org.apache.flink.runtime.concurrent.ScheduledExecutorServiceAdapter.execute(ScheduledExecutorServiceAdapter.java:62) ~[flink-runtime_2.11-1.11-SNAPSHOT.jar:1.11-SNAPSHOT]
	at java.util.concurrent.CompletableFuture$UniCompletion.claim(CompletableFuture.java:543) ~[?:1.8.0_242]
	at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:826) ~[?:1.8.0_242]
	... 10 more
{code}
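The "Caused by" part of this trace can be reproduced with plain JDK classes. The uniHandle and UniCompletion.claim frames are consistent with a handleAsync(fn, executor) continuation whose executor has already been shut down; the actual Flink call site is not visible in the log, so RejectedContinuationDemo below is a hypothetical stand-in:

{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

public class RejectedContinuationDemo {

    /** Returns the exception the dependent future completes with, or null if none. */
    public static Throwable run() {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        // Simulate the executor having been shut down already (e.g. during teardown).
        timer.shutdownNow();

        CompletableFuture<String> source = new CompletableFuture<>();
        // Register a continuation that has to run on the shut-down executor.
        CompletableFuture<String> dependent =
                source.handleAsync((value, error) -> value, timer);

        // Completing the source hands the continuation to `timer`. Its default
        // AbortPolicy throws RejectedExecutionException, which CompletableFuture
        // catches and stores in `dependent` wrapped in a CompletionException.
        source.complete("done");

        try {
            dependent.join();
            return null;
        } catch (CompletionException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        Throwable t = run();
        System.out.println(t);
        System.out.println("cause: " + t.getCause());
    }
}
{code}

The dependent future ends up completed with a CompletionException wrapping a RejectedExecutionException from the executor's default AbortPolicy, matching the first lines of the log. In this sketch the exception simply stays in the future; in the CI run it evidently reached the thread's uncaught-exception handler.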
The .26 run seems to show the same thing:
{code}
20:51:18,122 [mini-cluster-io-thread-26] INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - JobManager for job 63af744539889f9a6bf731aa05b02e97 with leader id 93f8812403b7e711da29465d96a74439 lost leadership.
20:51:18,122 [flink-akka.actor.default-dispatcher-5] INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Close JobManager connection for job 63af744539889f9a6bf731aa05b02e97.
20:51:18,121 [ Checkpoint Timer] ERROR org.apache.flink.runtime.util.FatalExitExceptionHandler [] - FATAL: Thread 'Checkpoint Timer' produced an uncaught exception. Stopping the process...
java.util.concurrent.CompletionException: java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@59fa0f36 rejected from java.util.concurrent.ScheduledThreadPoolExecutor@1cf89a6d[Shutting down, pool size = 1, active threads = 1, queued tasks = 0, completed tasks = 7]
	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273) ~[?:1.8.0_242]
	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280) ~[?:1.8.0_242]
	at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:838) ~[?:1.8.0_242]
	at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) ~[?:1.8.0_242]
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) [?:1.8.0_242]
	at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:575) [?:1.8.0_242]
	at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:594) [?:1.8.0_242]
	at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456) [?:1.8.0_242]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_242]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_242]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_242]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_242]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_242]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_242]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_242]
Caused by: java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@59fa0f36 rejected from java.util.concurrent.ScheduledThreadPoolExecutor@1cf89a6d[Shutting down, pool size = 1, active threads = 1, queued tasks = 0, completed tasks = 7]
	at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063) ~[?:1.8.0_242]
	at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830) ~[?:1.8.0_242]
	at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:326) ~[?:1.8.0_242]
	at java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:533) ~[?:1.8.0_242]
	at java.util.concurrent.ScheduledThreadPoolExecutor.execute(ScheduledThreadPoolExecutor.java:622) ~[?:1.8.0_242]
	at java.util.concurrent.Executors$DelegatedExecutorService.execute(Executors.java:668) ~[?:1.8.0_242]
	at org.apache.flink.runtime.concurrent.ScheduledExecutorServiceAdapter.execute(ScheduledExecutorServiceAdapter.java:62) ~[flink-runtime_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
	at java.util.concurrent.CompletableFuture$UniCompletion.claim(CompletableFuture.java:543) ~[?:1.8.0_242]
	at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:826) ~[?:1.8.0_242]
	... 12 more
20:51:18,123 [PermanentBlobCache shutdown hook] INFO org.apache.flink.runtime.blob.PermanentBlobCache [] - Shutting down BLOB cache
{code}
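In both runs the last step is the same: FatalExitExceptionHandler receives the exception as an uncaught exception, logs "Stopping the process..." and exits. The sketch below mimics that path, assuming (per the commit above) an exit code of -17; FatalExitSketch is hypothetical, and the exit action is injected so the example does not actually terminate the JVM:

{code:java}
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntConsumer;

public class FatalExitSketch implements Thread.UncaughtExceptionHandler {

    // Exit code as described in the comment above (assumption for this sketch).
    static final int EXIT_CODE = -17;

    private final IntConsumer exitAction;

    public FatalExitSketch(IntConsumer exitAction) {
        this.exitAction = exitAction;
    }

    @Override
    public void uncaughtException(Thread t, Throwable e) {
        try {
            System.err.println("FATAL: Thread '" + t.getName()
                    + "' produced an uncaught exception. Stopping the process...");
            e.printStackTrace();
        } finally {
            // A real fatal handler would call System.exit here; the sketch delegates.
            exitAction.accept(EXIT_CODE);
        }
    }

    /** Lets a thread die with an exception and returns the exit code it requested. */
    public static int simulate() {
        AtomicInteger requested = new AtomicInteger(Integer.MIN_VALUE);
        Thread worker = new Thread(() -> {
            throw new IllegalStateException("simulated rejected continuation");
        }, "Checkpoint Timer");
        worker.setUncaughtExceptionHandler(new FatalExitSketch(requested::set));
        worker.start();
        try {
            // The handler runs on the dying thread, so it has finished once join() returns.
            worker.join();
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
        return requested.get();
    }

    public static void main(String[] args) {
        int code = simulate();
        System.out.println("requested " + code + ", parent would see " + (code & 0xFF));
    }
}
{code}

Plugging in System::exit as the action would kill the forked test JVM, and surefire would then report "Process Exit Code: 239" as in the log below.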
> Tests are crashing with exit code 239
> -------------------------------------
>
> Key: FLINK-18290
> URL: https://issues.apache.org/jira/browse/FLINK-18290
> Project: Flink
> Issue Type: Bug
> Components: Build System / Azure Pipelines
> Affects Versions: 1.11.0
> Reporter: Robert Metzger
> Assignee: Robert Metzger
> Priority: Blocker
> Labels: test-stability
>
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=3467&view=logs&j=d44f43ce-542c-597d-bf94-b0718c71e5e8&t=34f486e1-e1e4-5dd2-9c06-bfdd9b9c74a8]
> Kafka011ProducerExactlyOnceITCase
>
> {code:java}
> 2020-06-15T03:24:28.4677649Z [WARNING] The requested profile "skip-webui-build" could not be activated because it does not exist.
> 2020-06-15T03:24:28.4692049Z [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.22.1:test (integration-tests) on project flink-connector-kafka-0.11_2.11: There are test failures.
> 2020-06-15T03:24:28.4692585Z [ERROR]
> 2020-06-15T03:24:28.4693170Z [ERROR] Please refer to /__w/2/s/flink-connectors/flink-connector-kafka-0.11/target/surefire-reports for the individual test results.
> 2020-06-15T03:24:28.4693928Z [ERROR] Please refer to dump files (if any exist) [date].dump, [date]-jvmRun[N].dump and [date].dumpstream.
> 2020-06-15T03:24:28.4694423Z [ERROR] ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
> 2020-06-15T03:24:28.4696762Z [ERROR] Command was /bin/sh -c cd /__w/2/s/flink-connectors/flink-connector-kafka-0.11/target && /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -Xms256m -Xmx2048m -Dlog4j.configurationFile=log4j2-test.properties -Dmvn.forkNumber=2 -XX:-UseGCOverheadLimit -jar /__w/2/s/flink-connectors/flink-connector-kafka-0.11/target/surefire/surefirebooter617700788970993266.jar /__w/2/s/flink-connectors/flink-connector-kafka-0.11/target/surefire 2020-06-15T03-07-01_381-jvmRun2 surefire2676050245109796726tmp surefire_602825791089523551074tmp
> 2020-06-15T03:24:28.4698486Z [ERROR] Error occurred in starting fork, check output in log
> 2020-06-15T03:24:28.4699066Z [ERROR] Process Exit Code: 239
> 2020-06-15T03:24:28.4699458Z [ERROR] Crashed tests:
> 2020-06-15T03:24:28.4699960Z [ERROR] org.apache.flink.streaming.connectors.kafka.Kafka011ProducerExactlyOnceITCase
> 2020-06-15T03:24:28.4700849Z [ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
> 2020-06-15T03:24:28.4703760Z [ERROR] Command was /bin/sh -c cd /__w/2/s/flink-connectors/flink-connector-kafka-0.11/target && /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -Xms256m -Xmx2048m -Dlog4j.configurationFile=log4j2-test.properties -Dmvn.forkNumber=2 -XX:-UseGCOverheadLimit -jar /__w/2/s/flink-connectors/flink-connector-kafka-0.11/target/surefire/surefirebooter617700788970993266.jar /__w/2/s/flink-connectors/flink-connector-kafka-0.11/target/surefire 2020-06-15T03-07-01_381-jvmRun2 surefire2676050245109796726tmp surefire_602825791089523551074tmp
> 2020-06-15T03:24:28.4705501Z [ERROR] Error occurred in starting fork, check output in log
> 2020-06-15T03:24:28.4706297Z [ERROR] Process Exit Code: 239
> 2020-06-15T03:24:28.4706592Z [ERROR] Crashed tests:
> 2020-06-15T03:24:28.4706895Z [ERROR] org.apache.flink.streaming.connectors.kafka.Kafka011ProducerExactlyOnceITCase
> 2020-06-15T03:24:28.4707386Z [ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:510)
> 2020-06-15T03:24:28.4708053Z [ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:457)
> 2020-06-15T03:24:28.4708908Z [ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:298)
> 2020-06-15T03:24:28.4709720Z [ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:246)
> 2020-06-15T03:24:28.4710497Z [ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1183)
> 2020-06-15T03:24:28.4711448Z [ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1011)
> 2020-06-15T03:24:28.4712395Z [ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:857)
> 2020-06-15T03:24:28.4712997Z [ERROR] at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:132)
> 2020-06-15T03:24:28.4713524Z [ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
> 2020-06-15T03:24:28.4714079Z [ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
> 2020-06-15T03:24:28.4714560Z [ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
> 2020-06-15T03:24:28.4715096Z [ERROR] at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
> 2020-06-15T03:24:28.4715672Z [ERROR] at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
> 2020-06-15T03:24:28.4716445Z [ERROR] at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
> 2020-06-15T03:24:28.4717024Z [ERROR] at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:120)
> 2020-06-15T03:24:28.4717478Z [ERROR] at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:355)
> 2020-06-15T03:24:28.4717939Z [ERROR] at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:155)
> 2020-06-15T03:24:28.4718378Z [ERROR] at org.apache.maven.cli.MavenCli.execute(MavenCli.java:584)
> 2020-06-15T03:24:28.4718852Z [ERROR] at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:216)
> 2020-06-15T03:24:28.4719230Z [ERROR] at org.apache.maven.cli.MavenCli.main(MavenCli.java:160)
> 2020-06-15T03:24:28.4719676Z [ERROR] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2020-06-15T03:24:28.4720309Z [ERROR] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2020-06-15T03:24:28.4720882Z [ERROR] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2020-06-15T03:24:28.4721339Z [ERROR] at java.lang.reflect.Method.invoke(Method.java:498)
> 2020-06-15T03:24:28.4721888Z [ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
> 2020-06-15T03:24:28.4722658Z [ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
> 2020-06-15T03:24:28.4723430Z [ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
> 2020-06-15T03:24:28.4724062Z [ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
> 2020-06-15T03:24:28.4724657Z [ERROR] Caused by: org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
> 2020-06-15T03:24:28.4726770Z [ERROR] Command was /bin/sh -c cd /__w/2/s/flink-connectors/flink-connector-kafka-0.11/target && /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -Xms256m -Xmx2048m -Dlog4j.configurationFile=log4j2-test.properties -Dmvn.forkNumber=2 -XX:-UseGCOverheadLimit -jar /__w/2/s/flink-connectors/flink-connector-kafka-0.11/target/surefire/surefirebooter617700788970993266.jar /__w/2/s/flink-connectors/flink-connector-kafka-0.11/target/surefire 2020-06-15T03-07-01_381-jvmRun2 surefire2676050245109796726tmp surefire_602825791089523551074tmp
> 2020-06-15T03:24:28.4728582Z [ERROR] Error occurred in starting fork, check output in log
> 2020-06-15T03:24:28.4729202Z [ERROR] Process Exit Code: 239
> 2020-06-15T03:24:28.4729612Z [ERROR] Crashed tests:
> 2020-06-15T03:24:28.4730247Z [ERROR] org.apache.flink.streaming.connectors.kafka.Kafka011ProducerExactlyOnceITCase
> 2020-06-15T03:24:28.4730781Z [ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:669)
> 2020-06-15T03:24:28.4731292Z [ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.access$600(ForkStarter.java:115)
> 2020-06-15T03:24:28.4731829Z [ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:444)
> 2020-06-15T03:24:28.4732353Z [ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:420)
> 2020-06-15T03:24:28.4732792Z [ERROR] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 2020-06-15T03:24:28.4733235Z [ERROR] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 2020-06-15T03:24:28.4733718Z [ERROR] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 2020-06-15T03:24:28.4734170Z [ERROR] at java.lang.Thread.run(Thread.java:748)
> 2020-06-15T03:24:28.4734682Z [ERROR] -> [Help 1]
> 2020-06-15T03:24:28.4734859Z [ERROR]
> 2020-06-15T03:24:28.4735312Z [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
> 2020-06-15T03:24:28.4735927Z [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> 2020-06-15T03:24:28.4736439Z [ERROR]
> 2020-06-15T03:24:28.4736952Z [ERROR] For more information about the errors and possible solutions, please read the following articles:
> 2020-06-15T03:24:28.4737706Z [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
> 2020-06-15T03:24:28.4738167Z [ERROR]
> 2020-06-15T03:24:28.4738553Z [ERROR] After correcting the problems, you can resume the build with the command
> 2020-06-15T03:24:28.4739663Z [ERROR] mvn <goals> -rf :flink-connector-kafka-0.11_2.11
> 2020-06-15T03:24:29.0980029Z MVN exited with EXIT CODE: 1.
> {code}
> This could be a CI environment issue...
> When did it start?
--
This message was sent by Atlassian Jira
(v8.3.4#803005)