Roman Khachatryan created FLINK-35787:
-----------------------------------------
Summary: DefaultSlotStatusSyncer might bring down JVM (exit code 239 instead of a proper shutdown)
Key: FLINK-35787
URL: https://issues.apache.org/jira/browse/FLINK-35787
Project: Flink
Issue Type: Bug
Reporter: Roman Khachatryan
In our internal CI, I've encountered the following error:
{code:java}
* 12:02:47,205 [ pool-126-thread-1] ERROR org.apache.flink.util.FatalExitExceptionHandler [] - FATAL: Thread 'pool-126-thread-1' produced an uncaught exception. Stopping the process...
java.util.concurrent.CompletionException: java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@38ce013a[Not completed, task = java.util.concurrent.Executors$RunnableAdapter@640a9cf7[Wrapped task = java.util.concurrent.>
    at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:314) ~[?:?]
    at java.util.concurrent.CompletableFuture.uniHandleStage(CompletableFuture.java:951) ~[?:?]
    at java.util.concurrent.CompletableFuture.handleAsync(CompletableFuture.java:2282) ~[?:?]
    at org.apache.flink.runtime.resourcemanager.slotmanager.DefaultSlotStatusSyncer.allocateSlot(DefaultSlotStatusSyncer.java:138) ~[classes/:?]
    at org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.allocateSlotsAccordingTo(FineGrainedSlotManager.java:722) ~[classes/:?]
    at org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.checkResourceRequirements(FineGrainedSlotManager.java:645) ~[classes/:?]
    at org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.lambda$checkResourceRequirementsWithDelay$12(FineGrainedSlotManager.java:603) ~[classes/:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@38ce013a[Not completed, task = java.util.concurrent.Executors$RunnableAdapter@640a9cf7[Wrapped task = java.util.concurrent.CompletableFuture$UniHandle@f3d>
    at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2055) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:825) ~[?:?]
    at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:340) ~[?:?]
    at java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:562) ~[?:?]
    at java.util.concurrent.ScheduledThreadPoolExecutor.execute(ScheduledThreadPoolExecutor.java:705) ~[?:?]
    at java.util.concurrent.Executors$DelegatedExecutorService.execute(Executors.java:687) ~[?:?]
    at java.util.concurrent.CompletableFuture.uniHandleStage(CompletableFuture.java:949) ~[?:?]
    ... 11 more
{code}
From the code, it looks like the RM main thread executor was shut down, and that triggered the JVM exit:
{code:java}
CompletableFuture<Acknowledge> requestFuture =
        gateway.requestSlot(
                SlotID.getDynamicSlotID(resourceId),
                jobId,
                allocationId,
                resourceProfile,
                targetAddress,
                resourceManagerId,
                taskManagerRequestTimeout);
CompletableFuture<Void> returnedFuture = new CompletableFuture<>();
FutureUtils.assertNoException(
        requestFuture.handleAsync(
                (Acknowledge acknowledge, Throwable throwable) -> { ...
                },
                mainThreadExecutor));
{code}
{code}
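The suspected mechanism can be reproduced outside Flink with plain JDK classes. The following is a minimal sketch, not Flink code: the class name RejectedHandleAsyncDemo and the method failureAfterShutdown are hypothetical, and a single-thread scheduled executor stands in for the RM main thread executor (an assumption based on the executor types visible in the stack trace). It shows that calling handleAsync with an executor that has already been shut down yields a stage that fails with a RejectedExecutionException, which is presumably the exception that FutureUtils.assertNoException then treats as fatal.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class RejectedHandleAsyncDemo {

    /**
     * Runs handleAsync against an executor that was already shut down and
     * returns the resulting failure, or null if the stage completed normally.
     */
    static Throwable failureAfterShutdown() {
        // Stand-in for the main thread executor (assumption, not Flink's class).
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.shutdown();

        // Stand-in for the already-completed slot request future.
        CompletableFuture<String> requestFuture = CompletableFuture.completedFuture("ack");

        CompletableFuture<String> handled;
        try {
            handled = requestFuture.handleAsync((ack, throwable) -> ack, executor);
        } catch (RejectedExecutionException e) {
            // Defensive: covers a JDK that propagates the rejection directly.
            return e;
        }

        try {
            handled.get(5, TimeUnit.SECONDS);
            return null;
        } catch (ExecutionException e) {
            // get() unwraps the CompletionException seen in the log above.
            return e.getCause();
        } catch (InterruptedException | TimeoutException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        System.out.println(failureAfterShutdown());
    }
}
```

In this sketch the handler lambda never runs, because the executor rejects the task before it is scheduled. Any shutdown-awareness therefore has to live outside the handler, for example by checking the returned future for a rejection before treating the failure as fatal.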