GitHub user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/574#discussion_r80547641
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/FunctionImplementationRegistry.java ---
    @@ -301,29 +323,120 @@ private ScanResult scan(ClassLoader classLoader, Path path, URL[] urls) throws I
             return RunTimeScan.dynamicPackageScan(drillConfig, Sets.newHashSet(urls));
           }
         }
    -    throw new FunctionValidationException(String.format("Marker file %s is missing in %s.",
    +    throw new JarValidationException(String.format("Marker file %s is missing in %s",
             CommonConstants.DRILL_JAR_MARKER_FILE_RESOURCE_PATHNAME, path.getName()));
       }
     
    -  private static String getUdfDir() {
    -    return Preconditions.checkNotNull(System.getenv("DRILL_UDF_DIR"), "DRILL_UDF_DIR variable is not set");
    +  /**
    +   * Return list of jars that are missing in local function registry
    +   * but present in remote function registry.
    +   *
    +   * @param remoteFunctionRegistry remote function registry
    +   * @param localFunctionRegistry local function registry
    +   * @return list of missing jars
    +   */
    +  private List<String> getMissingJars(RemoteFunctionRegistry remoteFunctionRegistry,
    +                                      LocalFunctionRegistry localFunctionRegistry) {
    +    List<Jar> remoteJars = remoteFunctionRegistry.getRegistry().getJarList();
    +    List<String> localJars = localFunctionRegistry.getAllJarNames();
    +    List<String> missingJars = Lists.newArrayList();
    +    for (Jar jar : remoteJars) {
    +      if (!localJars.contains(jar.getName())) {
    +        missingJars.add(jar.getName());
    +      }
    +    }
    +    return missingJars;
    +  }
    +
    +  /**
    +   * Creates local udf directory, if it doesn't exist.
    +   * Checks if local is a directory and if current application has write rights on it.
    +   * Attempts to clean up local idf directory in case jars were left after previous drillbit run.
    +   *
    +   * @return path to local udf directory
    +   */
    +  private Path getLocalUdfDir() {
    +    String confDir = getConfDir();
    --- End diff --
    
    Unfortunately, this won't work in the case of Drill-on-YARN. The 
$DRILL_HOME and $DRILL_CONF_DIR directories are read-only in that case.
    
    The new site directory (pointed to by DRILL_CONF_DIR) will contain a "jars" directory that holds statically defined UDFs. In Drill-on-YARN, YARN copies the entire site directory to the local machine but makes it read-only so that YARN can reuse the same "localized" copy for multiple runs. (That feature is handy for MapReduce but not that useful for Drill. Still, that's how YARN works.)
    
    One solution: provide a config option that specifies the local UDF location. The Apache Drill default can be the config dir (assuming there is a way to reference the config dir from within drill-override.conf; need to check that). For Drill-on-YARN (DoY), we will change the location to a temp directory provided by YARN.
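    A minimal Java sketch of the config-option idea, for illustration only. The key name drill.exec.udf.directory.local, the "udf" subdirectory default, and the Map standing in for the parsed drill-override.conf are all hypothetical; real code would read the value through Drill's config system.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

/** Hypothetical resolver for the local UDF directory. */
public class LocalUdfDirResolver {

  // Hypothetical config key; not taken from the PR under review.
  static final String LOCAL_UDF_DIR_KEY = "drill.exec.udf.directory.local";

  /**
   * @param config  flattened config values (stand-in for drill-override.conf)
   * @param confDir the site/config directory (DRILL_CONF_DIR)
   * @return the configured location if set, otherwise confDir/udf
   */
  static Path resolveLocalUdfDir(Map<String, String> config, Path confDir) {
    String configured = config.get(LOCAL_UDF_DIR_KEY);
    if (configured != null && !configured.isEmpty()) {
      // Explicit override, e.g. a YARN-provided temp directory under DoY.
      return Paths.get(configured);
    }
    // Stock Drill default: a subdirectory of the config dir.
    return confDir.resolve("udf");
  }

  public static void main(String[] args) {
    Path confDir = Paths.get("/opt/drill/conf");
    System.out.println(resolveLocalUdfDir(Map.of(), confDir));
    System.out.println(resolveLocalUdfDir(
        Map.of(LOCAL_UDF_DIR_KEY, "/yarn/tmp/container_01/udf"), confDir));
  }
}
```

    Keeping the default in code and only overriding it for DoY means stock installs need no extra configuration.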
    
    Using the YARN temp directory ensures that the local UDF dir starts out empty on each run. But what about the "stock" Drill case? The $DRILL_CONF_DIR/udf directory will probably contain jars from a previous run. Is this desired? Does the code handle this case? Do we clean out UDFs that were dropped while the Drillbit was offline? Do we handle a partially downloaded jar that was left incomplete when the previous run crashed?
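    If jars are kept across runs, one way to cover the last two questions is sketched below (a hypothetical helper, not the PR's code): copy each jar to a temporary ".part" file and rename it into place, then at startup delete leftover ".part" files and any jar no longer listed in the remote registry. The class, method names, and ".part" suffix are assumptions; remoteJarNames stands in for the names from remoteFunctionRegistry.getRegistry().getJarList() in the diff above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for keeping the local UDF directory across restarts. */
public class LocalJarSync {

  // Hypothetical suffix marking a jar whose copy has not finished yet.
  static final String PARTIAL_SUFFIX = ".part";

  /** Copies a jar to "<name>.part" first, then renames it to its final name. */
  static Path copyJarAtomically(InputStream source, Path udfDir, String jarName)
      throws IOException {
    Files.createDirectories(udfDir);
    Path partial = udfDir.resolve(jarName + PARTIAL_SUFFIX);
    Path target = udfDir.resolve(jarName);
    // If the Drillbit crashes mid-copy, only the ".part" file is left behind.
    Files.copy(source, partial, StandardCopyOption.REPLACE_EXISTING);
    // Single rename, so a half-written jar never appears under the final name.
    // Per the Files.move Javadoc, replacing an existing target is implementation-specific.
    return Files.move(partial, target, StandardCopyOption.ATOMIC_MOVE);
  }

  /**
   * Startup cleanup: deletes leftover ".part" files and any jar that is no longer
   * in the remote registry (for example, one dropped while the Drillbit was offline).
   */
  static void cleanUp(Path udfDir, Set<String> remoteJarNames) throws IOException {
    if (!Files.isDirectory(udfDir)) {
      return;
    }
    List<Path> toDelete = new ArrayList<>();
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(udfDir)) {
      for (Path entry : entries) {
        String name = entry.getFileName().toString();
        boolean partial = name.endsWith(PARTIAL_SUFFIX);
        boolean stale = name.endsWith(".jar") && !remoteJarNames.contains(name);
        if (partial || stale) {
          toDelete.add(entry);
        }
      }
    }
    // Delete after iterating so the directory is not modified mid-listing.
    for (Path p : toDelete) {
      Files.deleteIfExists(p);
    }
  }
}
```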
    
    Or would it be better to clear the UDF directory at the start of each Drill run? If we do that, can we always write UDFs to a temp directory? Perhaps review the available temp directories.
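    If the answer is to start clean on every run, the reset itself is small. A minimal sketch with hypothetical class and method names: delete the directory tree, recreate it, and verify it is writable, mirroring the checks described in the getLocalUdfDir() Javadoc above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical startup step: wipe and recreate the local UDF directory. */
public class UdfDirReset {

  static Path resetUdfDir(Path udfDir) throws IOException {
    if (Files.exists(udfDir)) {
      List<Path> paths;
      try (Stream<Path> walk = Files.walk(udfDir)) {
        // Reverse order puts children before their parent directory.
        paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
      }
      for (Path p : paths) {
        Files.delete(p);
      }
    }
    Files.createDirectories(udfDir);
    if (!Files.isWritable(udfDir)) {
      throw new IOException("Local UDF directory is not writable: " + udfDir);
    }
    return udfDir;
  }
}
```

    This trades a re-download of every registered jar on startup for not having to reason about leftovers at all.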
    
    Since DoY defines the temp directory at runtime, we need to set the temp directory in drill-config.sh (which you did in a previous version). As it turns out, Drill already defines temp directories in the config system (for spill-to-disk), so we need to reconcile the two.
    
    Perhaps this:
    
    Define DRILL_TEMP_DIR in drill-config.sh. If it is set in the environment (the DoY case) or in drill-env.sh (the non-DoY case), use it. Otherwise, default to /tmp.
    
    Under DoY, we can run multiple Drillbits on the same host (by changing ports, etc.), so we need a unique path. Define the actual Drillbit temp directory to be:
    
    drillbit-temp-dir = $DRILL_TEMP_DIR/${drill-root}-${cluster-id}
    
    We need both the root and cluster ID because neither is unique by itself, 
unfortunately.
    
    Finally, UDFs can reside in ${drillbit-temp-dir}/udf.
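    The proposed layout, expressed as a rough Java sketch. The environment is passed in as a Map (real code would pass System.getenv()), the class and method names are hypothetical, and the drill-root and cluster-id values in main are only examples. The sketch assumes neither value contains a path separator.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

/**
 * Hypothetical sketch of the proposed layout:
 *   drillbit-temp-dir = $DRILL_TEMP_DIR/${drill-root}-${cluster-id}
 *   local UDF dir     = ${drillbit-temp-dir}/udf
 */
public class DrillbitTempDirs {

  /** DRILL_TEMP_DIR if set and non-empty, otherwise /tmp. */
  static Path baseTempDir(Map<String, String> env) {
    String dir = env.get("DRILL_TEMP_DIR");
    return Paths.get(dir == null || dir.isEmpty() ? "/tmp" : dir);
  }

  /** Combines root and cluster ID, since neither is unique on its own. */
  static Path drillbitTempDir(Map<String, String> env, String drillRoot, String clusterId) {
    return baseTempDir(env).resolve(drillRoot + "-" + clusterId);
  }

  static Path udfDir(Map<String, String> env, String drillRoot, String clusterId) {
    return drillbitTempDir(env, drillRoot, clusterId).resolve("udf");
  }

  public static void main(String[] args) {
    System.out.println(udfDir(Map.of(), "drill", "drillbits1"));
    System.out.println(udfDir(Map.of("DRILL_TEMP_DIR", "/yarn/local/tmp"), "drill", "drillbits1"));
  }
}
```

    The same drillbitTempDir() result could also serve as the parent for the spill-to-disk directories, which is one way to reconcile the two temp-directory settings.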
    
    This is just one possibility to illustrate the issue. Feel free to create a 
better solution.

