[
https://issues.apache.org/jira/browse/FLINK-35787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Roman Khachatryan updated FLINK-35787:
--------------------------------------
Component/s: Runtime / Coordination
> DefaultSlotStatusSyncer might bring down JVM (exit code 239 instead of a
> proper shutdown)
> -----------------------------------------------------------------------------------------
>
> Key: FLINK-35787
> URL: https://issues.apache.org/jira/browse/FLINK-35787
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Reporter: Roman Khachatryan
> Priority: Major
>
> In our internal CI, I've encountered the following error:
> {code:java}
> * 12:02:47,205 [ pool-126-thread-1] ERROR org.apache.flink.util.FatalExitExceptionHandler [] - FATAL: Thread 'pool-126-thread-1' produced an uncaught exception. Stopping the process...
> java.util.concurrent.CompletionException: java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@38ce013a[Not completed, task = java.util.concurrent.Executors$RunnableAdapter@640a9cf7[Wrapped task = java.util.concurrent.>
>     at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:314) ~[?:?]
>     at java.util.concurrent.CompletableFuture.uniHandleStage(CompletableFuture.java:951) ~[?:?]
>     at java.util.concurrent.CompletableFuture.handleAsync(CompletableFuture.java:2282) ~[?:?]
>     at org.apache.flink.runtime.resourcemanager.slotmanager.DefaultSlotStatusSyncer.allocateSlot(DefaultSlotStatusSyncer.java:138) ~[classes/:?]
>     at org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.allocateSlotsAccordingTo(FineGrainedSlotManager.java:722) ~[classes/:?]
>     at org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.checkResourceRequirements(FineGrainedSlotManager.java:645) ~[classes/:?]
>     at org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.lambda$checkResourceRequirementsWithDelay$12(FineGrainedSlotManager.java:603) ~[classes/:?]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
>     at java.lang.Thread.run(Thread.java:829) [?:?]
> Caused by: java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@38ce013a[Not completed, task = java.util.concurrent.Executors$RunnableAdapter@640a9cf7[Wrapped task = java.util.concurrent.CompletableFuture$UniHandle@f3d>
>     at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2055) ~[?:?]
>     at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:825) ~[?:?]
>     at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:340) ~[?:?]
>     at java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:562) ~[?:?]
>     at java.util.concurrent.ScheduledThreadPoolExecutor.execute(ScheduledThreadPoolExecutor.java:705) ~[?:?]
>     at java.util.concurrent.Executors$DelegatedExecutorService.execute(Executors.java:687) ~[?:?]
>     at java.util.concurrent.CompletableFuture.uniHandleStage(CompletableFuture.java:949) ~[?:?]
>     ... 11 more
> {code}
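> From the code, it looks like the RM main thread executor had already been
> shut down, so the handleAsync call below was rejected, and
> FutureUtils.assertNoException escalated that failure to a JVM exit: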
> {code:java}
> CompletableFuture<Acknowledge> requestFuture =
>         gateway.requestSlot(
>                 SlotID.getDynamicSlotID(resourceId),
>                 jobId,
>                 allocationId,
>                 resourceProfile,
>                 targetAddress,
>                 resourceManagerId,
>                 taskManagerRequestTimeout);
> CompletableFuture<Void> returnedFuture = new CompletableFuture<>();
> FutureUtils.assertNoException(
>         requestFuture.handleAsync(
>                 (Acknowledge acknowledge, Throwable throwable) -> {
>                     ... },
>                 mainThreadExecutor));
> {code}
>
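
The rejection path in the trace can be reproduced with the JDK alone. The sketch below is a minimal illustration, not Flink code: the class and method names are hypothetical, a single-thread scheduled executor stands in for the RM main thread executor, and the request future is assumed to be already complete when handleAsync is called. Under those assumptions the future returned by handleAsync completes exceptionally with a CompletionException whose cause is the RejectedExecutionException, which is the kind of failure an assertNoException-style check would then treat as fatal.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

// Hypothetical demo class; not part of Flink.
class RejectedHandleAsyncDemo {

    /**
     * Calls handleAsync on an already-completed future with an executor that
     * has been shut down, and returns the resulting failure (or null if the
     * dependent future unexpectedly completed normally).
     */
    static Throwable failureAfterShutdown() {
        // Stand-in for the main thread executor (assumption for illustration).
        ExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.shutdown();

        // Stand-in for the slot request future, already completed.
        CompletableFuture<String> requestFuture =
                CompletableFuture.completedFuture("ack");

        // The executor rejects the handler task; the rejection is captured in
        // the dependent future instead of being thrown to the caller.
        CompletableFuture<String> handled =
                requestFuture.handleAsync((ack, throwable) -> ack, executor);

        try {
            handled.join();
            return null;
        } catch (CompletionException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        Throwable failure = failureAfterShutdown();
        System.out.println("failure: " + failure);
        System.out.println(
                "cause is RejectedExecutionException: "
                        + (failure != null
                                && failure.getCause() instanceof RejectedExecutionException));
    }
}
```

The handler lambda never runs in this scenario, so any cleanup it would have done (for example, completing a separate returned future) is skipped as well.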