[ 
https://issues.apache.org/jira/browse/FLINK-23147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17369306#comment-17369306
 ] 

Chesnay Schepler edited comment on FLINK-23147 at 6/25/21, 7:38 AM:
--------------------------------------------------------------------

??I was assuming that ExecutorThreadFactory is internal and is only used by the 
system.??

It is.

??I gathered that your intent is to extract the AkkaRpcService (the only akka 
related class that uses the factory) into a plugin???

yes-ish; it's not a conventional plugin (because those aren't supported by the 
MiniCluster, duh) but it uses a {{PluginLoader}}.

??If something like AkkaRpcService lives in a plugin, why do we want it to 
outlive the plugin life cycle???

We don't. Without relying on some static singleton, in order to work with the 
MiniCluster we need to be able to shut it down when the MC terminates. IOW, we 
actually shorten the plugin life cycle, which are currently bound to the 
"lifecycle" of the JVM. And we need the MiniCluster to load it as a plugin 
because otherwise we have 2 different scala versions on the CP wreaking havoc.

??In general, I don't feel comfortable with letting plugins use the flink CL as 
they seem fit, the main purpose is to encapsulate user-code with potentially 
conflicting libraries..??

It's a hard requirement in this case because the akka rpc service must interact 
with Flink. We don't just fire events into the RPC service never to be seen 
again, but they eventually come out again and thus call info Flink classes.

??I could see that we add a special type of plugins that can break free of the 
restrictions that doesn't encapsulate user-code but system code extracted to 
live in a different CL. However, at this point, I don't see where the benefit 
is over just putting it into lib if a plugin's classes are loaded through flink 
CL anyways. Maybe, we rather need plugins that are just completely bound to the 
live cycle of the Flink cluster (aren't they anyways outliving jobs?).??

TBH I don't see what this has got to do with anything or where you're going 
with it.


Here's essentially what happens:
Flink retrieves an executor from the akka rpc service, schedules a runnable, 
and gets a future back.
This executor is effectively the akka actor system, for which akka always sets 
the context class loader.
Flink then applies further operations on this future, like scheduling stuff 
into another pool, which will also be run in the same thread. When this 
operation now runs it may create a new thread in the pool, setting the CCL of 
the thread to the Plugin CL, leaking it.





was (Author: zentol):
??I was assuming that ExecutorThreadFactory is internal and is only used by the 
system.??

It is.

??I gathered that your intent is to extract the AkkaRpcService (the only akka 
related class that uses the factory) into a plugin???

yes-ish; it's not a conventional plugin (because those aren't supported by the 
MiniCluster, duh) but it uses a {{PluginLoader}}.

??If something like AkkaRpcService lives in a plugin, why do we want it to 
outlive the plugin life cycle???

We don't. Without relying on some static singleton, in order to work with the 
MiniCluster we need to be able to shut it down when the MC terminates. IOW, we 
actually shorten the plugin life cycle, which are currently bound to the 
"lifecycle" of the JVM. And we need the MiniCluster to load it as a plugin 
because otherwise we have 2 different scala versions on the CP wreaking havoc.

??In general, I don't feel comfortable with letting plugins use the flink CL as 
they seem fit, the main purpose is to encapsulate user-code with potentially 
conflicting libraries.. ??

It's a hard requirement in this case because the akka rpc service must interact 
with Flink. We don't just fire events into the RPC service never to be seen 
again, but they eventually come out again and thus call info Flink classes.

??I could see that we add a special type of plugins that can break free of the 
restrictions that doesn't encapsulate user-code but system code extracted to 
live in a different CL. However, at this point, I don't see where the benefit 
is over just putting it into lib if a plugin's classes are loaded through flink 
CL anyways. Maybe, we rather need plugins that are just completely bound to the 
live cycle of the Flink cluster (aren't they anyways outliving jobs?).??

TBH I don't see what this has got to do with anything or where you're going 
with it.


Here's essentially what happens:
Flink retrieves an executor from the akka rpc service, schedules a runnable, 
and gets a future back.
This executor is effectively the akka actor system, for which akka always sets 
the context class loader.
Flink then applies further operations on this future, like scheduling stuff 
into another pool, which will also be run in the same thread. When this 
operation now runs it may create a new thread in the pool, setting the CCL of 
the thread to the Plugin CL, leaking it.




> ThreadPools can be poisoned by context class loaders
> ----------------------------------------------------
>
>                 Key: FLINK-23147
>                 URL: https://issues.apache.org/jira/browse/FLINK-23147
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>            Reporter: Chesnay Schepler
>            Priority: Major
>             Fix For: 1.14.0
>
>
> Newly created threads in a thread pool inherit the context class loader (CCL) 
> of the currently running thread.
> For thread pools this is very problematic because the CCL is unlikely to be 
> reset at any point; not only can this leak another CL by accident, it can 
> also cause class loading issues, for example when using a {{ServiceLoader}} 
> because it relies on the CCL.
> With the scala-free runtime this for example means that if an actor system 
> threads schedules something into future thread pool of the JM then a new 
> thread is created which uses a plugin loader as a CCL. The plugin loaders are 
> quite restrictive and prevent the loading of 3rd-party dependencies; so if 
> the JM schedules something into the future thread pool which requires one of 
> these dependencies to be accessible then we're gambling as to whether this 
> dependency can actually be loaded in the end.
> Because it is difficult to ensure that we set the CCL correctly on all 
> transitions from akka to Flink land I suggest to add a safeguard to the 
> ExecutorThreadFactory to enforce that newly created threads are always 
> initialized with the CL that has loaded Flink.
> /cc [~arvid] [~sewen]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to