gharris1727 commented on code in PR #18403:
URL: https://github.com/apache/kafka/pull/18403#discussion_r1907898181


##########
connect/runtime/src/test/java/org/apache/kafka/connect/runtime/isolation/TestPlugins.java:
##########
@@ -251,7 +252,11 @@ public enum TestPlugin {
         /**
          * A ServiceLoader discovered plugin which subclasses another plugin 
which is present on the classpath
          */
-        
SUBCLASS_OF_CLASSPATH_OVERRIDE_POLICY(TestPackage.SUBCLASS_OF_CLASSPATH, 
"test.plugins.SubclassOfClasspathOverridePolicy");
+        
SUBCLASS_OF_CLASSPATH_OVERRIDE_POLICY(TestPackage.SUBCLASS_OF_CLASSPATH, 
"test.plugins.SubclassOfClasspathOverridePolicy"),
+        /**
+         * A plugin which is part of the classpath by default. This packages 
it as a separate jar which is used to test plugin isolation from the classpath 
plugin.
+         */
+        CLASSPATH_CONVERTER(TestPackage.CLASSPATH_CONVERTER, 
"org.apache.kafka.connect.converters.ByteArrayConverter");

Review Comment:
   Mark this as not included by default.



##########
connect/runtime/src/main/java/org/apache/kafka/connect/runtime/isolation/ReflectionScanner.java:
##########
@@ -76,8 +78,21 @@ private static <T> String versionFor(Class<? extends T> 
pluginKlass) throws Refl
 
     @Override
     protected PluginScanResult scanPlugins(PluginSource source) {
+        // By default, java and classgraph uses parent first classloading, 
hence if a plugin is loaded by the classpath
+        // loader, and then by an isolated plugin loader, the default 
precedence will always load the classpath version.
+        // This breaks isolation and hence connect uses isolated plugin 
loaders, which are child first classloaders.
+        // Therefore, we override the classloader order to be child first, so 
that the isolated plugin loader is used first.
+        // In addition, we need to explicitly specify the full classloader 
order, as classgraph only scans the classes available
+        // in the classloaders and not the entire parent chain. Due to this 
reason if a plugin is extending a class present
+        // in classpath/application it will not be able to find the parent 
class unless we explicitly specify the classloader order.
+        List<ClassLoader> classLoaderOrder = new ArrayList<>();
+        ClassLoader cl = source.loader();
+        while (cl != null) {
+            classLoaderOrder.add(cl);
+            cl = cl.getParent();
+        }

Review Comment:
   But I can see that your reproduction case fails with the old code and 
succeeds with the new code, why is that?
   
   I can see after the initial `ClassGraph#scan()` call, it actually always 
finds the PluginClassLoader isolated copy of ByteArrayConverter, even with the 
trunk implementation. The difference happens inside of 
`ClassInfoList#loadClasses` and `ClassGraphClassLoader#loadClass`. Here's what 
I was seeing there:
   
   #### Trunk
   
   `environmentClassLoaderDelegationOrder`
   
   * PlatformClassLoader
   * **AppClassLoader**
   * FilteringClassLoader (?)
   * URLClassLoader (?)
   * **PluginClassLoader**
   
   #### Loader and parents
   
   `overrideClassLoaders`
   
   * **PluginClassLoader**
   * **AppClassLoader**
   * PlatformClassLoader
   
   So the order of the specified classloaders _does_ change the classloading 
order of the plugins, even though it doesn't change the scanning behavior.
   
   I think the current implementation is satisfactory as a workaround for the 
behavior, but we should follow-up with ClassGraph and try and use their 
ClassLoaderHandler infrastructure to get the right classloader sorting. I 
explored circumventing the ClassGraphClassLoader entirely and calling 
`Class.forName(..., ..., source.loader())`, but its a bit clunky because we 
have to also include a bunch of error handling.



##########
connect/runtime/src/main/java/org/apache/kafka/connect/runtime/isolation/ReflectionScanner.java:
##########
@@ -76,8 +78,21 @@ private static <T> String versionFor(Class<? extends T> 
pluginKlass) throws Refl
 
     @Override
     protected PluginScanResult scanPlugins(PluginSource source) {
+        // By default, java and classgraph uses parent first classloading, 
hence if a plugin is loaded by the classpath
+        // loader, and then by an isolated plugin loader, the default 
precedence will always load the classpath version.
+        // This breaks isolation and hence connect uses isolated plugin 
loaders, which are child first classloaders.
+        // Therefore, we override the classloader order to be child first, so 
that the isolated plugin loader is used first.
+        // In addition, we need to explicitly specify the full classloader 
order, as classgraph only scans the classes available
+        // in the classloaders and not the entire parent chain. Due to this 
reason if a plugin is extending a class present
+        // in classpath/application it will not be able to find the parent 
class unless we explicitly specify the classloader order.
+        List<ClassLoader> classLoaderOrder = new ArrayList<>();
+        ClassLoader cl = source.loader();
+        while (cl != null) {
+            classLoaderOrder.add(cl);
+            cl = cl.getParent();
+        }

Review Comment:
   Okay, thanks for the explanation, I see that ClassGraph is going outside the 
classloader hierarchy for scanning, and just uses a list of jars. Here's what I 
was seeing:
   
   #### Trunk
   
   ```
       .addClassLoader(source.loader())
   ```
   * Gradle wrapper
   * **Plugin**
   * Gradle worker
   * build/ class directories
   * **connect-runtime**
   * build/ lib jars
   * Gradle caches (dependencies)
   
   #### Single Loader
   
   ```
       .overrideClassLoaders(source.loader())
   ```
   * **Plugin**
   
   #### Loader and parents
   
   ```
       List<ClassLoader> classLoaderOrder = new ArrayList<>();
       ClassLoader cl = source.loader();
       while (cl != null) {
           classLoaderOrder.add(cl);
           cl = cl.getParent();
       }
       ...
           .overrideClassLoaders(classLoaderOrder.toArray(new ClassLoader[0]))
   ```
   * Runtime modules
   * **Plugin**
   * Gradle worker
   * build/ class directories
   * **connect-runtime**
   * build/ lib jars
   * Gradle caches (dependencies)
   
   It doesn't appear that the order in which we specify the classloaders has an 
effect on the order in which scanning actually takes place. And I think that 
makes sense given the ClassLoaderHandler stuff; It's applying a deterministic 
ordering to generate these lists of jars.
   
   Look at this: 
https://github.com/classgraph/classgraph/blob/6f9012f2a193ebfefe4a4384e7642820e7aab0f5/src/main/java/nonapi/io/github/classgraph/classloaderhandler/README
   
   > Note that URLClassLoader subclasses do not need a custom 
ClassLoaderHandler (unless they need to override the delegation order, as with 
SpringBootRestartClassLoaderHandler), URLClassLoader subclasses are handled 
automatically by ClassGraph.
   
   This sounds like it applies to us, because we have a child-first/parent-last 
delegation order, but we don't have a special handler telling ClassGraph about 
it. Maybe we can pursue upstreaming a handler to make the ordering work 
properly.



##########
connect/runtime/src/main/java/org/apache/kafka/connect/runtime/isolation/ReflectionScanner.java:
##########
@@ -76,8 +78,22 @@ private static <T> String versionFor(Class<? extends T> 
pluginKlass) throws Refl
 
     @Override
     protected PluginScanResult scanPlugins(PluginSource source) {
+        // By default, java and classgraph uses parent first classloading, 
hence if a plugin is loaded by the classpath
+        // loader, and then by an isolated plugin loader, the default 
precedence will always load the classpath version.
+        // This breaks isolation and hence connect uses isolated plugin 
loaders, which are child first classloaders.
+        // Therefore, we override the classloader order to be child first, so 
that the isolated plugin loader is used first.
+        // In addition, we need to explicitly specify the full classloader 
order, to force classgraph to scan classes in
+        // the class path. This does not happen by default as it uses 
reflections to obtain access to classpath URLs from
+        // the application classloader, which can fail with illegal access 
exceptions.
+        List<ClassLoader> classLoaderOrder = new ArrayList<>();
+        ClassLoader cl = source.loader();
+        while (cl != null) {
+            classLoaderOrder.add(cl);
+            cl = cl.getParent();
+        }
         ClassGraph classGraphBuilder = new ClassGraph()
-                .addClassLoader(source.loader())
+            .addClassLoader(source.loader())
+                .overrideClassLoaders(classLoaderOrder.toArray(new 
ClassLoader[0]))

Review Comment:
   nit: make this a private method that returns the classloader array.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to