[ https://issues.apache.org/jira/browse/FLINK-22545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379857#comment-17379857 ]
Stephan Ewen commented on FLINK-22545:
--------------------------------------
The exception that causes this crash comes from a guard that checks that no
more than one Coordinator Thread is spawned and working on the mailbox. The
Coordinator Thread uses a single-threaded executor, so there should never
be more than one thread.
However, I think it can happen that the thread is terminated (if it was idle
for a long time) and then another thread gets spawned.
In that case, the guard would throw the error we see.
To fix that, we would need to perform the check for a previous thread in a
different way. I'll open a PR with a suggestion.
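To illustrate the suspected scenario, here is a minimal, self-contained sketch.
It is not Flink code: the class and guard names are hypothetical, and the
executor configuration (one worker that may time out after 50 ms of idleness)
is an assumption chosen to reproduce the "thread terminated, new thread
spawned" case. StrictGuard rejects any thread other than the first one it saw;
LenientGuard is one possible way to "check for a previous thread in a different
way", by only complaining if the previously seen thread is still alive. It is
not necessarily what the PR does.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

class SingleThreadGuardSketch {

    /** Hypothetical guard interface; not a Flink type. */
    interface Guard {
        void check();
    }

    /** Remembers the first thread forever and rejects any other thread. */
    static final class StrictGuard implements Guard {
        private final AtomicReference<Thread> owner = new AtomicReference<>();

        @Override
        public void check() {
            Thread current = Thread.currentThread();
            if (!owner.compareAndSet(null, current) && owner.get() != current) {
                throw new IllegalStateException("more than one coordinator thread");
            }
        }
    }

    /** Accepts a new thread as long as the previously seen one has terminated. */
    static final class LenientGuard implements Guard {
        private final AtomicReference<Thread> owner = new AtomicReference<>();

        @Override
        public void check() {
            Thread current = Thread.currentThread();
            Thread previous = owner.getAndSet(current);
            if (previous != null && previous != current && previous.isAlive()) {
                throw new IllegalStateException("two live coordinator threads");
            }
        }
    }

    /** At most one worker, but the worker may exit when idle (assumed config). */
    static ThreadPoolExecutor newIdleTimeoutExecutor() {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                1, 1, 50, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        executor.allowCoreThreadTimeOut(true);
        return executor;
    }

    /** Returns true if the executor used a different thread after an idle gap. */
    static boolean workerReplacedAfterIdle() {
        ThreadPoolExecutor executor = newIdleTimeoutExecutor();
        try {
            Callable<Thread> whoAmI = Thread::currentThread;
            Thread first = executor.submit(whoAmI).get();
            Thread.sleep(500); // well past the keep-alive, so the idle worker exits
            Thread second = executor.submit(whoAmI).get();
            return first != second;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdownNow();
        }
    }

    /** Returns true if the guard accepts both checks across a worker replacement. */
    static boolean survivesWorkerReplacement(Guard guard) {
        ThreadPoolExecutor executor = newIdleTimeoutExecutor();
        try {
            Runnable check = guard::check;
            executor.submit(check).get();
            Thread.sleep(500); // let the first worker time out and terminate
            executor.submit(check).get();
            return true;
        } catch (ExecutionException e) {
            return false; // the guard threw inside the worker thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("worker replaced: " + workerReplacedAfterIdle());
        System.out.println("strict guard ok: " + survivesWorkerReplacement(new StrictGuard()));
        System.out.println("lenient guard ok: " + survivesWorkerReplacement(new LenientGuard()));
    }
}
```

Note that the isAlive() check in this sketch is not airtight: a timed-out
worker may still be finishing its shutdown when the replacement thread starts,
so a real fix would likely need to tolerate or close that window.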
What puzzles me a bit is that this only occurs in Flink 1.12 and not in newer
versions. Maybe this is because of different timings?
> JVM crashes when running
> OperatorEventSendingCheckpointITCase.testOperatorEventAckLost
> -------------------------------------------------------------------------------------
>
> Key: FLINK-22545
> URL: https://issues.apache.org/jira/browse/FLINK-22545
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination, Tests
> Affects Versions: 1.12.3
> Reporter: Guowei Ma
> Priority: Major
> Labels: auto-deprioritized-critical, test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=17501&view=logs&j=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3&t=a99e99c7-21cd-5a1f-7274-585e62b72f56&l=4287
--
This message was sent by Atlassian Jira
(v8.3.4#803005)