[
https://issues.apache.org/jira/browse/FLINK-18352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kostas Kloudas closed FLINK-18352.
----------------------------------
Resolution: Fixed
Fixed on master with b05ab524836c1c07d128610248fe372c8d697974
on release-1.11 with eeeff7a5fa8ec0acb67bac8eb7dbd9a4165a91e7
and on release-1.10 with dcd7574daccfbbacf99d101a3f0f852e332e92a7
> org.apache.flink.core.execution.DefaultExecutorServiceLoader not thread safe
> ----------------------------------------------------------------------------
>
> Key: FLINK-18352
> URL: https://issues.apache.org/jira/browse/FLINK-18352
> Project: Flink
> Issue Type: Bug
> Components: Client / Job Submission
> Affects Versions: 1.10.0
> Reporter: Marcos Klein
> Assignee: Kostas Kloudas
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.11.0, 1.10.2, 1.12.0
>
>
> The singleton
> *org.apache.flink.core.execution.DefaultExecutorServiceLoader* is not
> thread-safe because the *java.util.ServiceLoader* class it relies on is not
> thread-safe.
> [https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/ServiceLoader.html#Concurrency]
>
> This can leave the *ServiceLoader* in an inconsistent state in processes
> that attempt to self-heal. The process/container then has to be restarted
> in the hope that the race condition does not recur.
> [https://stackoverflow.com/questions/60391499/apache-flink-cannot-find-compatible-factory-for-specified-execution-target-lo]
>
> Additionally, the following stack traces have been seen when using
> *org.apache.flink.streaming.api.environment.RemoteStreamEnvironment*
> instances.
> {code:java}
> java.lang.ArrayIndexOutOfBoundsException: 2
> at sun.misc.CompoundEnumeration.nextElement(CompoundEnumeration.java:61)
> at java.util.ServiceLoader$LazyIterator.hasNextService(ServiceLoader.java:357)
> at java.util.ServiceLoader$LazyIterator.hasNext(ServiceLoader.java:393)
> at java.util.ServiceLoader$1.hasNext(ServiceLoader.java:474)
> at org.apache.flink.core.execution.DefaultExecutorServiceLoader.getExecutorFactory(DefaultExecutorServiceLoader.java:60)
> at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1724)
> at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1706)
> {code}
>
> {code:java}
> java.util.NoSuchElementException: null
> at sun.misc.CompoundEnumeration.nextElement(CompoundEnumeration.java:59)
> at java.util.ServiceLoader$LazyIterator.hasNextService(ServiceLoader.java:357)
> at java.util.ServiceLoader$LazyIterator.hasNext(ServiceLoader.java:393)
> at java.util.ServiceLoader$1.hasNext(ServiceLoader.java:474)
> at org.apache.flink.core.execution.DefaultExecutorServiceLoader.getExecutorFactory(DefaultExecutorServiceLoader.java:60)
> at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1724)
> at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1706)
> {code}
> The workaround when using the *StreamExecutionEnvironment* implementations
> is to write a custom, thread-safe implementation of
> *DefaultExecutorServiceLoader* and pass it to the
> *StreamExecutionEnvironment* constructors.
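
One way to read the workaround described above is to avoid sharing a single *java.util.ServiceLoader* (and its lazy iterator) between threads by creating a fresh one for every lookup. The following is only a minimal sketch of that idea under stated assumptions: the class name ThreadSafeServiceLookup and the method findCompatible are hypothetical, the code does not use any Flink types, and it is not necessarily how the fix commits listed above resolve the issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;
import java.util.function.Predicate;

/**
 * Hypothetical helper (not part of Flink): looks up service providers
 * without sharing any ServiceLoader state between threads.
 */
final class ThreadSafeServiceLookup {

    private ThreadSafeServiceLookup() {
        // static helper, not meant to be instantiated
    }

    /**
     * Returns all providers of serviceType accepted by isCompatible.
     * A new ServiceLoader is created on every call, so its lazy iterator
     * is only ever touched by the calling thread.
     */
    static <T> List<T> findCompatible(Class<T> serviceType,
                                      Predicate<? super T> isCompatible) {
        ServiceLoader<T> loader = ServiceLoader.load(serviceType);
        List<T> matches = new ArrayList<>();
        for (T candidate : loader) {
            if (isCompatible.test(candidate)) {
                matches.add(candidate);
            }
        }
        return matches;
    }
}
```

A custom loader passed to the *StreamExecutionEnvironment* constructors could delegate to a lookup like this on each call. The trade-off is that provider discovery is repeated for every lookup; synchronizing access to one shared *ServiceLoader* would be an alternative that avoids the repeated scan at the cost of lock contention.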