Nico Kruber created FLINK-25022:
-----------------------------------

             Summary: ClassLoader leak with ThreadLocals on the JM when 
submitting a job through the REST API
                 Key: FLINK-25022
                 URL: https://issues.apache.org/jira/browse/FLINK-25022
             Project: Flink
          Issue Type: Bug
          Components: Runtime / REST
    Affects Versions: 1.13.3, 1.12.5, 1.14.0
            Reporter: Nico Kruber


If a job is submitted using the REST API's {{/jars/:jarid/run}} endpoint, user
code has to be executed on the JobManager, which runs it in one of a few
(pooled) dispatcher threads such as {{Flink-DispatcherRestEndpoint-thread-*}}.

If the user code uses thread locals (and does not clean them up), they may
remain in the pooled thread with references to the job's
{{ChildFirstClassLoader}} and thus leak it.
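The mechanism can be reproduced outside Flink with a minimal sketch. It assumes a single-thread {{ExecutorService}} as a stand-in for the dispatcher thread pool, a plain {{URLClassLoader}} as a stand-in for the job's user-code class loader, and a hypothetical {{CACHE}} thread local in place of a library's per-thread state (it is not the jsoniter-scala API). A value set by one task and never removed is still visible to the next task on the same thread, so the class loader stays strongly reachable.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadLocalLeakDemo {
    // Hypothetical library-owned thread local, e.g. a per-thread buffer cache.
    static final ThreadLocal<Object> CACHE = new ThreadLocal<>();

    /** Returns true if a value set by one pooled task is still visible to the next task. */
    public static boolean valueSurvivesAcrossTasks() throws Exception {
        // One reused worker thread, standing in for a dispatcher thread.
        ExecutorService pool = Executors.newSingleThreadExecutor();
        // Stand-in for the job's user-code class loader (no URLs, no parent).
        try (URLClassLoader userCodeLoader = new URLClassLoader(new URL[0], null)) {
            // "Job submission": user code stores a reference and never calls CACHE.remove().
            pool.submit(() -> CACHE.set(userCodeLoader)).get();
            // An unrelated later task on the same thread still sees that reference.
            Object leftover = pool.submit(() -> CACHE.get()).get();
            return leftover == userCodeLoader;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("class loader still referenced: " + valueSurvivesAcrossTasks());
    }
}
```

Because pool threads are reused for later requests, such a value keeps the class loader (and every class it loaded) reachable until the thread dies or something calls {{remove()}} on that thread.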

We saw this at the JM with the {{jsoniter-scala}} library, which [creates
ThreadLocal
instances|https://github.com/plokhotnyuk/jsoniter-scala/blob/95c7053cfaa558877911f3448382f10d53c4fcbf/jsoniter-scala-core/jvm/src/main/scala/com/github/plokhotnyuk/jsoniter_scala/core/package.scala]
but doesn't remove them. However, this can happen with any user code or
(worse) any library used in user code.

 

There are a few *workarounds*, e.g. putting the library into Flink's lib/
folder or submitting the job via the Flink CLI, but depending on the
circumstances these may not be an option.

 

A *proper fix* should happen in Flink itself by guarding the dispatcher threads
against such leaks. We could, for example, spawn a separate thread for
executing the user's {{main()}} method and, once the job is submitted, let that
thread exit so that all of its thread locals are discarded along with it.
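The proposed fix could look roughly like this minimal sketch. It is not Flink's actual implementation; {{runInDedicatedThread}}, {{CACHE}} and the thread name {{user-main-runner}} are hypothetical. The dispatcher task hands the user's {{main()}} to a fresh thread with the job's class loader as its context class loader, waits for it, and rethrows any failure. Thread locals set by the user code end up in that short-lived thread, so the pooled thread is left clean.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

public class DedicatedThreadSketch {
    // Hypothetical library-owned thread local, as used by some user code.
    static final ThreadLocal<Object> CACHE = new ThreadLocal<>();

    /** Hypothetical helper: runs userMain in a fresh thread, waits for it, rethrows failures. */
    public static void runInDedicatedThread(Runnable userMain, ClassLoader userCodeLoader)
            throws Exception {
        AtomicReference<Throwable> failure = new AtomicReference<>();
        Thread runner = new Thread(userMain, "user-main-runner");
        runner.setContextClassLoader(userCodeLoader);
        // Capture exceptions thrown by user code so the caller can see them.
        runner.setUncaughtExceptionHandler((thread, error) -> failure.set(error));
        runner.start();
        // Once join() returns the runner has terminated; any thread-local values
        // set by user code lived in the runner, never in the calling pooled thread.
        runner.join();
        if (failure.get() != null) {
            throw new Exception("user main() failed", failure.get());
        }
    }

    /** Returns true if the pooled thread holds no leftover value after user code ran. */
    public static boolean pooledThreadStaysClean() throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            ClassLoader userCodeLoader = new ClassLoader() {}; // stand-in for the job's loader
            // The "dispatcher" task delegates user code instead of running it inline.
            pool.submit(() -> {
                runInDedicatedThread(() -> CACHE.set(userCodeLoader), userCodeLoader);
                return null;
            }).get();
            // A later task on the same pooled thread finds nothing left behind.
            return pool.submit(() -> CACHE.get()).get() == null;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("pooled thread clean: " + pooledThreadStaysClean());
    }
}
```

This only guards the dispatcher threads against thread locals; user code can still leak its class loader through threads it starts itself or through static state outside its own class loader.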



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
