[GitHub] [druid] paul-rogers commented on a diff in pull request #12816: Tidy up construction of the Guice Injectors

GitBox Mon, 01 Aug 2022 17:52:14 -0700


paul-rogers commented on code in PR #12816:
URL: https://github.com/apache/druid/pull/12816#discussion_r935019243



##########
server/src/test/java/org/apache/druid/initialization/ServerInjectorBuilderTest.java:
##########
@@ -0,0 +1,285 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.initialization;
+
+import com.fasterxml.jackson.databind.Module;
+import com.google.common.base.Function;
+import com.google.common.collect.Collections2;
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.ImmutableSet;
+import com.google.inject.Binder;
+import com.google.inject.ConfigurationException;
+import com.google.inject.Inject;
+import com.google.inject.Injector;
+import com.google.inject.Key;
+import com.google.inject.name.Names;
+import org.apache.druid.discovery.NodeRole;
+import org.apache.druid.guice.ExtensionsLoader;
+import org.apache.druid.guice.JsonConfigProvider;
+import org.apache.druid.guice.StartupInjectorBuilder;
+import org.apache.druid.guice.annotations.LoadScope;
+import org.apache.druid.guice.annotations.Self;
+import org.apache.druid.server.DruidNode;
+import org.junit.Assert;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+
+import javax.annotation.Nullable;
+
+import java.util.Collection;
+import java.util.List;
+import java.util.Set;
+
+// See ExtensionsLoaderTest for tests formerly in this file related

Review Comment:
   Good point. The comment was from when the file had its original name. Since 
that name is gone, folks won't know to look for this one, and so the comment is 
silly. Removed.



##########
indexing-service/src/test/java/org/apache/druid/indexing/common/task/batch/parallel/MultiPhaseParallelIndexingRowStatsTest.java:
##########
@@ -107,6 +108,7 @@ public void testHashPartitionRowStats()
   }
 
   @Test
+  @Ignore("assumes record rates, to be fixed in separate PR")

Review Comment:
   Sorry, cryptic. This test failed repeatedly in this PR. Discussions with 
other engineers revealed that the test itself is flawed: it assumes the test 
duration. Someone else is fixing that. This temporary hack exists just so that 
this PR will build. I'll check if the required fix has been submitted yet, and 
if so, will remove this annotation.



##########
processing/src/main/java/org/apache/druid/guice/ExtensionsLoader.java:
##########
@@ -0,0 +1,328 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.guice;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.inject.Injector;
+import org.apache.commons.io.FileUtils;
+import org.apache.druid.initialization.DruidModule;
+import org.apache.druid.java.util.common.ISE;
+import org.apache.druid.java.util.common.logger.Logger;
+
+import javax.inject.Inject;
+
+import java.io.File;
+import java.io.FilenameFilter;
+import java.io.IOException;
+import java.net.MalformedURLException;
+import java.net.URL;
+import java.net.URLClassLoader;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.ServiceLoader;
+import java.util.Set;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.stream.Collectors;
+
+public class ExtensionsLoader
+{
+  private static final Logger log = new Logger(ExtensionsLoader.class);
+
+  private final ExtensionsConfig extensionsConfig;
+  private final ConcurrentHashMap<File, URLClassLoader> loaders = new 
ConcurrentHashMap<>();
+
+  /**
+   * Map of loaded extensions, keyed by class (or interface).
+   */
+  private final ConcurrentHashMap<Class<?>, Collection<?>> extensions = new 
ConcurrentHashMap<>();
+
+  @Inject
+  public ExtensionsLoader(ExtensionsConfig config)
+  {
+    this.extensionsConfig = config;
+  }
+
+  public static ExtensionsLoader instance(Injector injector)
+  {
+    return injector.getInstance(ExtensionsLoader.class);
+  }
+
+  public ExtensionsConfig config()
+  {
+    return extensionsConfig;
+  }
+
+  /**
+   * @param clazz service class
+   * @param <T>   the service type
+   *
+   * @return Returns a collection of implementations loaded.
+   */
+  public <T> Collection<T> getLoadedImplementations(Class<T> clazz)
+  {
+    @SuppressWarnings("unchecked")
+    Collection<T> retVal = (Collection<T>) extensions.get(clazz);
+    if (retVal == null) {
+      return new HashSet<>();
+    }
+    return retVal;
+  }
+
+  /**
+   * @return Returns a collection of implementations loaded.
+   */
+  public Collection<DruidModule> getLoadedModules()
+  {
+    return getLoadedImplementations(DruidModule.class);
+  }
+
+  @VisibleForTesting
+  public Map<File, URLClassLoader> getLoadersMap()
+  {
+    return loaders;
+  }
+
+  /**
+   * Look for implementations for the given class from both classpath and 
extensions directory, using {@link
+   * ServiceLoader}. A user should never put the same two extensions in 
classpath and extensions directory, if he/she
+   * does that, the one that is in the classpath will be loaded, the other 
will be ignored.
+   *
+   * @param serviceClass The class to look the implementations of (e.g., 
DruidModule)
+   *
+   * @return A collection that contains implementations (of distinct concrete 
classes) of the given class. The order of
+   * elements in the returned collection is not specified and not guaranteed 
to be the same for different calls to
+   * getFromExtensions().
+   */
+  @SuppressWarnings("unchecked")
+  public <T> Collection<T> getFromExtensions(Class<T> serviceClass)
+  {
+    // Classes classes are loaded once upon first request. Since the class 
path does
+    // not change during a run, the set of extension classes cannot change once
+    // computed.
+    //
+    // In practice, it appears the only place this matters is with DruidModule:
+    // initialization gets the list of extensions, and two REST API calls later
+    // ask for the same list.
+    Collection<?> modules = extensions.computeIfAbsent(
+        serviceClass,
+        serviceC -> new ServiceLoadingFromExtensions<>(serviceC).implsToLoad
+    );
+    //noinspection unchecked
+    return (Collection<T>) modules;
+  }
+
+  public Collection<DruidModule> getModules()
+  {
+    return getFromExtensions(DruidModule.class);
+  }
+
+  /**
+   * Find all the extension files that should be loaded by druid.
+   * <p/>
+   * If user explicitly specifies druid.extensions.loadList, then it will look 
for those extensions under root
+   * extensions directory. If one of them is not found, druid will fail loudly.
+   * <p/>
+   * If user doesn't specify druid.extension.toLoad (or its value is empty), 
druid will load all the extensions
+   * under the root extensions directory.
+   *
+   * @return an array of druid extension files that will be loaded by druid 
process
+   */
+  public File[] getExtensionFilesToLoad()
+  {
+    final File rootExtensionsDir = new File(extensionsConfig.getDirectory());
+    if (rootExtensionsDir.exists() && !rootExtensionsDir.isDirectory()) {
+      throw new ISE("Root extensions directory [%s] is not a directory!?", 
rootExtensionsDir);
+    }
+    File[] extensionsToLoad;
+    final LinkedHashSet<String> toLoad = extensionsConfig.getLoadList();
+    if (toLoad == null) {
+      extensionsToLoad = rootExtensionsDir.listFiles();
+    } else {
+      int i = 0;
+      extensionsToLoad = new File[toLoad.size()];
+      for (final String extensionName : toLoad) {
+        File extensionDir = new File(extensionName);
+        if (!extensionDir.isAbsolute()) {
+          extensionDir = new File(rootExtensionsDir, extensionName);
+        }
+
+        if (!extensionDir.isDirectory()) {
+          throw new ISE(
+              "Extension [%s] specified in \"druid.extensions.loadList\" 
didn't exist!?",
+              extensionDir.getAbsolutePath()
+          );
+        }
+        extensionsToLoad[i++] = extensionDir;
+      }
+    }
+    return extensionsToLoad == null ? new File[]{} : extensionsToLoad;
+  }
+
+  /**
+   * @param extension The File instance of the extension we want to load
+   *
+   * @return a URLClassLoader that loads all the jars on which the extension 
is dependent
+   */
+  public URLClassLoader getClassLoaderForExtension(File extension, boolean 
useExtensionClassloaderFirst)
+  {
+    return loaders.computeIfAbsent(

Review Comment:
   Good question. I admit I didn't dive into the details when doing 
refactoring: I kept the original code and moved it around.
   
   Taking a closer look now, we seem to have three ways that we load 
extensions: one for the Druid servers, another for `HadoopTask` and a third for 
`CliHadoopIndexer`. The server uses the `extensionsConfig`, the `HadoopTask` 
and `CliHadoopIndexer` versions always pass `false`. Why? There is no comment 
to indicate why.
   
   There is, however, a comment that says that we want to load Hadoop first (in 
`HadoopTask`), but last (in `CliHadoopIndexer`). Modules now operate in three 
distinct modes:
   
   * Extension-classloader first (configurable at run time)
   * Non extension-classloader first (Hadoop)
     * Hadoop extensions first (`HadoopTask`)
     * Hadoop extensions last (`CliHadoopIndexer`)
   
   I can't imagine this is a _feature_. It seems more like we must have run 
into bugs, and provided these various options to work around issues. It seems 
impossible for a user (or me) to predict what will break by reversing 
classloader search order: sometime a class reference resolves to one 
implementation, sometimes another. And, modules that use Hadoop must know that 
sometimes Hadoop is available, sometimes not.
   
   I will say that, looking at the code, we don't mix the Hadoop and server use 
cases: we use one classloader or the other, but we don't seem to mix the two in 
the same run (he says without running Hadoop and setting breakpoints.)
   
   Since what we have seems to work, and Hadoop is not a blocker to the test 
and single-server goals that this PR supports, and I'm not in a position to 
debug Hadoop, I'm inclined to leave well-enough alone.
   
   Or, is there something here we need to address?



##########
core/src/main/java/org/apache/druid/initialization/DruidModule.java:
##########
@@ -25,6 +25,13 @@
 import java.util.List;
 
 /**
+ * A Guice module which also provides Jackson modules.

Review Comment:
   If the hardest thing in CS is naming, the second hardest is to write 
comments...
   
   Bingo on the default implementation for `getJacksonModules()`. That 
simplifies things quite a bit.
   
   The comment was revised accordingly.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [druid] paul-rogers commented on a diff in pull request #12816: Tidy up construction of the Guice Injectors

Reply via email to