Paul Rogers created DRILL-7593:
----------------------------------

             Summary: Standardize local paths
                 Key: DRILL-7593
                 URL: https://issues.apache.org/jira/browse/DRILL-7593
             Project: Apache Drill
          Issue Type: Improvement
    Affects Versions: 1.17.0
            Reporter: Paul Rogers


While working on DRILL-7589 (PR #1987), the idea came up of standardizing the 
set of local file system paths used when Drill runs in embedded mode. There 
may also be an opportunity to unify the local file system paths used in 
distributed mode.

In distributed mode, we use ZK for distribution: all shared data must be in a 
location visible to all Drillbits, either ZK or a DFS. There is still some need 
for local storage, such as for UDF staging and for spill files.

In embedded (local) mode, all persistent storage occurs on the local file 
system; there is no ZK and there is no need to coordinate a set of Drillbits.

At present, the local paths are spread all over the config system. Code that 
wants to set up local paths (such as {{DirTestWatcher}}) must handle each 
directory specially. Then, either {{ClusterFixture}} or a unit test must set 
the proper config property to match the directory selection.

For example, from {{drill-module.conf}}:

{noformat}
drill.tmp-dir: "/tmp"
drill.tmp-dir: ${?DRILL_TMP_DIR}
...
 sys.store.provider: {
 local: {
 path: "/tmp/drill",
 }
 trace: {
 directory: "/tmp/drill-trace",
 filesystem: "file:///"
 },
 tmp: {
 directories: ["/tmp/drill"],
 filesystem: "drill-local:///"
 },
 compile: {
 code_dir: "/tmp/drill/codegen"
...
 spill: {
 // *** Options common to all the operators that may spill
 // File system to use. Local file system by default.
 fs: "file:///",
 // List of directories to use. Directories are created
 // if they do not exist.
 directories: [ "/tmp/drill/spill" ]
...
 udf: {
 directory: {
 // Base directory for remote and local udf directories, unique among clusters.
{noformat}

And probably more. To move where Drill stores temp files, the user must change 
all of these properties.

Fortunately, [~arina] did a nice job with the UDF directories: they all are 
computed from the base directory:

{noformat}
 directory: {
 // Base directory for remote and local udf directories, unique among clusters.
 base: ${drill.exec.zk.root}"/udf",

 // Path to local udf directory, always created on local file system.
 // Root for these directory is generated at runtime unless Drill temporary directory is set.
 local: ${drill.exec.udf.directory.base}"/udf/local",

 // Set this property if custom file system should be used to create remote directories, ex: fs: "file:///".
 // fs: "",
 // Set this property if custom absolute root should be used for remote directories, ex: root: "/app/drill".
 // root: "",

 // Relative path to all remote udf directories.
 // Directories are created under default file system taken from Hadoop configuration
 // unless ${drill.exec.udf.directory.fs} is set.
 // User home directory is used as root unless ${drill.exec.udf.directory.root} is set.
 staging: ${drill.exec.udf.directory.base}"/staging",
 registry: ${drill.exec.udf.directory.base}"/registry",
 tmp: ${drill.exec.udf.directory.base}"/tmp"
 }
{noformat}

So, can we do the same thing for all the other local directories? We could 
allow each to be set individually, but compute its default from a single base 
directory. That way, if a unit test or an install wants to move the Drill local 
directories to, say, {{/var/drill/tmp}}, it need only change a single config 
line and everything else follows automatically.
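
As a rough sketch of what this could look like (not a worked-out design): the 
fragment below assumes a new, hypothetical property {{drill.local.base-dir}}, 
and assumes the properties quoted above all sit under {{drill.exec}}. Each 
existing property stays where it is; only its default becomes a substitution 
on the base, in the same style as the UDF directories.

{noformat}
// Hypothetical property (name is an assumption): single root for local directories.
drill.local.base-dir: "/tmp/drill"

drill.exec: {
  // Existing property names, assumed to live under drill.exec.
  sys.store.provider.local.path: ${drill.local.base-dir},
  // Note: today's default is "/tmp/drill-trace", outside the base.
  trace.directory: ${drill.local.base-dir}"/trace",
  tmp.directories: [ ${drill.local.base-dir} ],
  compile.code_dir: ${drill.local.base-dir}"/codegen",
  spill.directories: [ ${drill.local.base-dir}"/spill" ]
}
{noformat}

With defaults along these lines, a test or install would override only 
{{drill.local.base-dir}} (for example, setting it to {{/var/drill/tmp}}) and 
could still override any individual directory where needed.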

This can be done in the existing conf file as was done for UDFs. And, I guess 
to preserve compatibility, we'd have to leave the properties where they are; 
we'd just change their values.

This ticket asks to:

* Work out a good solution.
* Implement it in the config system.
* Scrub the unit tests and {{DirTestWatcher}} to determine where we can 
simplify code by reusing this solution rather than ad-hoc, per-directory 
configs.
* Modify {{DirTestWatcher}} to coordinate with the config system: set the base 
directory in config, then use the configured paths for each of the persistent 
store, profile, UDF and other directories.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
