Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/574#discussion_r80547641
--- Diff:
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/FunctionImplementationRegistry.java
---
@@ -301,29 +323,120 @@ private ScanResult scan(ClassLoader classLoader,
Path path, URL[] urls) throws I
return RunTimeScan.dynamicPackageScan(drillConfig,
Sets.newHashSet(urls));
}
}
- throw new FunctionValidationException(String.format("Marker file %s is
missing in %s.",
+ throw new JarValidationException(String.format("Marker file %s is
missing in %s",
CommonConstants.DRILL_JAR_MARKER_FILE_RESOURCE_PATHNAME,
path.getName()));
}
- private static String getUdfDir() {
- return Preconditions.checkNotNull(System.getenv("DRILL_UDF_DIR"),
"DRILL_UDF_DIR variable is not set");
+ /**
+ * Return list of jars that are missing in local function registry
+ * but present in remote function registry.
+ *
+ * @param remoteFunctionRegistry remote function registry
+ * @param localFunctionRegistry local function registry
+ * @return list of missing jars
+ */
+ private List<String> getMissingJars(RemoteFunctionRegistry
remoteFunctionRegistry,
+ LocalFunctionRegistry
localFunctionRegistry) {
+ List<Jar> remoteJars =
remoteFunctionRegistry.getRegistry().getJarList();
+ List<String> localJars = localFunctionRegistry.getAllJarNames();
+ List<String> missingJars = Lists.newArrayList();
+ for (Jar jar : remoteJars) {
+ if (!localJars.contains(jar.getName())) {
+ missingJars.add(jar.getName());
+ }
+ }
+ return missingJars;
+ }
+
+ /**
+ * Creates local udf directory, if it doesn't exist.
+ * Checks if local is a directory and if current application has write
rights on it.
+ * Attempts to clean up local idf directory in case jars were left after
previous drillbit run.
+ *
+ * @return path to local udf directory
+ */
+ private Path getLocalUdfDir() {
+ String confDir = getConfDir();
--- End diff --
Unfortunately, this won't work for Drill-on-YARN (DoY), where the
$DRILL_HOME and $DRILL_CONF_DIR directories are read-only.
The new site directory (pointed to by DRILL_CONF_DIR) will contain a "jars"
directory that holds statically-defined UDFs. In Drill-on-YARN, YARN copies
the entire site directory to the local machine, but makes it read-only so that
YARN can reuse the same "localized" copy for multiple runs. (That feature is
handy for map/reduce, but is not that useful for Drill. Still, that's how YARN
works...)
One solution: provide a config option that specifies the local UDF
location. The Apache Drill default can be the config dir (assuming there is a
way to reference the config dir from within drill-override.conf -- need to
check that.) For DoY, we will change the location to be a temp directory
location provided by YARN.
Using the YARN temp directory ensures that the local udf dir starts out
empty on each run. But what about the "stock" Drill case? The
$DRILL_CONF_DIR/udf directory probably will contain jars from a previous run.
Is this desired? Does the code handle this case? Do we clean out UDFs that were
dropped while the Drillbit was offline? Do we handle a partially-downloaded jar
that was left incomplete when the previous run crashed?
Or, would it be better to clear the udf directory on the start of each
Drill run? If we do that, can we always write udfs to a temp directory? Perhaps
review the temp directories available.
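To make the "clear on start" option concrete, here is a minimal sketch using
only java.nio.file. The class and method names (UdfDirCleaner,
prepareEmptyUdfDir) are hypothetical and not part of this PR; the sketch also
assumes the udf directory holds only plain jar files, so it deletes regular
files and leaves anything else alone.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not Drill code: prepares an empty, writable udf directory.
class UdfDirCleaner {

  static Path prepareEmptyUdfDir(Path udfDir) throws IOException {
    // Creates the directory (and parents) if missing; throws if the path
    // exists but is not a directory.
    Files.createDirectories(udfDir);
    if (!Files.isWritable(udfDir)) {
      throw new IOException("Local udf directory is not writable: " + udfDir);
    }
    // Remove jars left behind by a previous run, including partial downloads.
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(udfDir)) {
      for (Path entry : entries) {
        if (Files.isRegularFile(entry)) {
          Files.delete(entry);
        }
      }
    }
    return udfDir;
  }
}
```

Clearing everything at startup sidesteps the questions above about dropped
UDFs and half-written jars, at the cost of re-downloading jars on each start.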
Since DoY defines the temp directory at runtime, we need to set the temp
directory in drill-config.sh (which you did in a previous version.) As it turns
out, Drill already has temp directories set in the config system (for
spill-to-disk.) So we need to reconcile these two.
Perhaps this:
Define DRILL_TEMP_DIR in drill-config.sh. If it is set in the environment
(the DoY case) or drill-env.sh (the non-DoY case), use it. Else, default to
/tmp.
Under DoY, we can run multiple drillbits on the same host (by changing
ports, etc.) So we need a unique path. Define the actual Drillbit temp
directory to be
drillbit-temp-dir = $DRILL_TEMP_DIR/${drill-root}-${cluster-id}
We need both the root and cluster ID because neither is unique by itself,
unfortunately.
Finally, udfs can reside in ${drillbit-temp-dir}/udf
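A rough sketch of that path resolution in Java, to show the intended layout
rather than a real implementation. All names here (UdfDirSketch, tempRoot,
drillbitTempDir, localUdfDir) are hypothetical. The drill root and cluster ID
are passed in as plain strings because where they come from is still open, and
the environment is passed as a map so the logic can be exercised without
touching System.getenv().

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

// Hypothetical sketch, not Drill code: resolves the per-Drillbit udf directory.
class UdfDirSketch {

  // DRILL_TEMP_DIR if set (DoY environment or drill-env.sh), else /tmp.
  static Path tempRoot(Map<String, String> env) {
    String dir = env.get("DRILL_TEMP_DIR");
    return Paths.get(dir == null || dir.isEmpty() ? "/tmp" : dir);
  }

  // $DRILL_TEMP_DIR/${drill-root}-${cluster-id}: both parts are needed
  // because neither is unique by itself.
  static Path drillbitTempDir(Map<String, String> env, String drillRoot, String clusterId) {
    return tempRoot(env).resolve(drillRoot + "-" + clusterId);
  }

  // ${drillbit-temp-dir}/udf
  static Path localUdfDir(Map<String, String> env, String drillRoot, String clusterId) {
    return drillbitTempDir(env, drillRoot, clusterId).resolve("udf");
  }
}
```

For example, with DRILL_TEMP_DIR unset, a root of "drill", and a cluster ID of
"cluster1", localUdfDir resolves to /tmp/drill-cluster1/udf on a Unix-like
system. In real code the environment map would be System.getenv().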
This is just one possibility to illustrate the issue. Feel free to create a
better solution.