zentol commented on a change in pull request #11303: [FLINK-16245] Decoupling user classloader from context classloader.
URL: https://github.com/apache/flink/pull/11303#discussion_r393648354
##########
File path:
flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/FlinkUserCodeClassLoaders.java
##########
@@ -82,4 +89,61 @@ public static ResolveOrder fromString(String resolveOrder) {
super(urls, parent);
}
}
+
+ /**
+ * Ensures that holding a reference on the context class loader outliving the scope of user code does not prevent
+ * the user classloader from being garbage collected (FLINK-16245).
+ *
+ * <p>This classloader delegates to the actual user classloader. Upon {@link #close()}, the delegate is nulled
+ * and can be garbage collected. Subsequent class loading requests are resolved solely through the bootstrap
+ * classloader and will most likely result in a {@link ClassNotFoundException}.
+ */
+ private static class SafetyNetWrapperClassLoader extends URLClassLoader
+ implements Closeable {
+ private static final Logger LOG = LoggerFactory.getLogger(SafetyNetWrapperClassLoader.class);
+
+ private FlinkUserCodeClassLoader inner;
+
+ SafetyNetWrapperClassLoader(FlinkUserCodeClassLoader inner) {
+ super(new URL[0], null);
+ this.inner = inner;
+ }
+
+ @Override
+ public void close() {
+ if (inner != null) {
+ try {
+ inner.close();
+ } catch (IOException e) {
+ LOG.warn("Could not close user classloader", e);
+ }
+ }
+ inner = null;
+ }
+
+ @Override
+ protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
+ if (inner == null) {
+ return super.loadClass(name, resolve);
Review comment:
> I thought that's what you meant with fail early.

No, by "fail early" I meant that the (leaked) thread holding the leaked classloader reference would crash ASAP if it tries to load a class after the CL was closed.

> Now, I just need to figure out how to make sure that this exception will actually let TM fail.

I don't see why we should do this.
FLINK-16225 aims to have the TM kill itself on a Metaspace OOM, which is fine, but I find no reference anywhere saying that we should always fail the TM at the slightest hint of a leaked thread and/or classloader. In any case, wouldn't that be out of scope for this PR?
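To illustrate the behaviour described in the Javadoc and the "fail early" point in the review, here is a minimal, self-contained sketch. It is not the PR's code: `SafetyNetSketch`, `SafetyNetWrapper`, and `resolves` are hypothetical names, a plain `URLClassLoader` stands in for `FlinkUserCodeClassLoader`, and the `volatile` field is an assumption of the sketch. It shows that once the wrapper is closed, a thread that still holds it as its context class loader gets a `ClassNotFoundException` for application classes, while bootstrap classes such as `java.lang.String` still resolve.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;

public class SafetyNetSketch {

    /** Hypothetical stand-in for the wrapper in the diff; not Flink's actual class. */
    static final class SafetyNetWrapper extends URLClassLoader {
        // volatile so a (possibly leaked) thread observes close() promptly;
        // this is a choice of the sketch, not taken from the PR.
        private volatile URLClassLoader inner;

        SafetyNetWrapper(URLClassLoader inner) {
            // No URLs and a null parent: once inner is dropped, super.loadClass
            // can only find classes through the bootstrap class loader.
            super(new URL[0], null);
            this.inner = inner;
        }

        @Override
        public void close() {
            URLClassLoader current = inner;
            if (current != null) {
                try {
                    current.close();
                } catch (IOException e) {
                    System.err.println("Could not close user class loader: " + e);
                }
            }
            // Drop the reference so the user class loader can be garbage collected
            // even if someone still holds on to this wrapper.
            inner = null;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            URLClassLoader current = inner;
            if (current == null) {
                // Closed: bootstrap-only resolution, which throws ClassNotFoundException
                // for any class not visible to the bootstrap class loader.
                return super.loadClass(name, resolve);
            }
            // loadClass(String, boolean) is protected and declared in another package,
            // so delegate through the public single-argument overload.
            return current.loadClass(name);
        }
    }

    /** Returns true if the loader can load the class, false on ClassNotFoundException. */
    static boolean resolves(ClassLoader loader, String className) {
        try {
            loader.loadClass(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String appClass = SafetyNetSketch.class.getName();
        URLClassLoader userLoader =
                new URLClassLoader(new URL[0], SafetyNetSketch.class.getClassLoader());
        SafetyNetWrapper wrapper = new SafetyNetWrapper(userLoader);

        Thread thread = Thread.currentThread();
        ClassLoader original = thread.getContextClassLoader();
        thread.setContextClassLoader(wrapper);
        try {
            // While "user code" runs, lookups delegate to the user class loader.
            System.out.println("before close: " + resolves(thread.getContextClassLoader(), appClass));

            wrapper.close();

            // A context class loader reference that outlived user code now fails
            // fast instead of keeping the user class loader reachable.
            System.out.println("after close: " + resolves(thread.getContextClassLoader(), appClass));
            System.out.println("bootstrap class after close: "
                    + resolves(thread.getContextClassLoader(), "java.lang.String"));
        } finally {
            thread.setContextClassLoader(original);
        }
    }
}
```

Because the failure surfaces as a `ClassNotFoundException` in the thread that made the request, it stays local to that (leaked) thread and does not by itself bring down the process, which matches the review's distinction between failing early and failing the TM.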
----------------------------------------------------------------
This is an automated message from the Apache Git Service.