Paul Rogers created DRILL-7593:
----------------------------------
Summary: Standardize local paths
Key: DRILL-7593
URL: https://issues.apache.org/jira/browse/DRILL-7593
Project: Apache Drill
Issue Type: Improvement
Affects Versions: 1.17.0
Reporter: Paul Rogers
Work on DRILL-7589 (PR #1987) suggested standardizing the set of local file
system paths Drill uses when it runs in embedded mode. There may also be an
opportunity to unify the local file system paths used in distributed mode.
In distributed mode, we use ZK to coordinate the cluster: all shared data must
reside in a location visible to every Drillbit, either ZK or a DFS. Some data
still needs local storage, such as UDF staging and spill files.
In local (embedded) mode, all persistent storage resides on the local file
system; there is no ZK and no need to coordinate a set of Drillbits.
At present, the local paths are scattered throughout the config system. Code
that wants to set up local paths (such as {{DirTestWatcher}}) must handle each
directory individually. Then, either {{ClusterFixture}} or a unit test must set
the matching config property for each directory it selects.
For example, from {{drill-module.conf}}:
{noformat}
drill.tmp-dir: "/tmp"
drill.tmp-dir: ${?DRILL_TMP_DIR}
...
sys.store.provider: {
local: {
path: "/tmp/drill",
}
trace: {
directory: "/tmp/drill-trace",
filesystem: "file:///"
},
tmp: {
directories: ["/tmp/drill"],
filesystem: "drill-local:///"
},
compile: {
code_dir: "/tmp/drill/codegen"
...
spill: {
// *** Options common to all the operators that may spill
// File system to use. Local file system by default.
fs: "file:///",
// List of directories to use. Directories are created
// if they do not exist.
directories: [ "/tmp/drill/spill" ]
...
udf: {
directory: {
// Base directory for remote and local udf directories, unique among clusters.
{noformat}
And there are probably more. To relocate Drill's temporary files, the user must
change every one of these properties.
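For illustration, a {{drill-override.conf}} that moves the directories listed above to {{/var/drill/tmp}} today would need something like the following. It assumes, as in the stock {{drill-module.conf}}, that the {{sys.store}}, {{trace}}, {{tmp}}, {{compile}} and {{spill}} blocks sit under {{drill.exec}}; the target paths simply mirror the current defaults. Because the excerpt above is incomplete, this list may be incomplete too.

{noformat}
// drill-override.conf (sketch): today each local directory is relocated by hand.
drill.tmp-dir: "/var/drill/tmp"
drill.exec: {
  sys.store.provider.local.path: "/var/drill/tmp/drill",
  trace.directory: "/var/drill/tmp/drill-trace",
  tmp.directories: [ "/var/drill/tmp/drill" ],
  compile.code_dir: "/var/drill/tmp/drill/codegen",
  spill.directories: [ "/var/drill/tmp/drill/spill" ]
}
{noformat}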
Fortunately, [~arina] did a nice job with the UDF directories: they are all
computed from a single base directory:
{noformat}
directory: {
  // Base directory for remote and local udf directories, unique among clusters.
  base: ${drill.exec.zk.root}"/udf",
  // Path to local udf directory, always created on local file system.
  // Root for these directory is generated at runtime unless Drill temporary directory is set.
  local: ${drill.exec.udf.directory.base}"/udf/local",
  // Set this property if custom file system should be used to create remote directories, ex: fs: "file:///".
  // fs: "",
  // Set this property if custom absolute root should be used for remote directories, ex: root: "/app/drill".
  // root: "",
  // Relative path to all remote udf directories.
  // Directories are created under default file system taken from Hadoop configuration
  // unless ${drill.exec.udf.directory.fs} is set.
  // User home directory is used as root unless ${drill.exec.udf.directory.root} is set.
  staging: ${drill.exec.udf.directory.base}"/staging",
  registry: ${drill.exec.udf.directory.base}"/registry",
  tmp: ${drill.exec.udf.directory.base}"/tmp"
}
{noformat}
Can we do the same for all the other local directories? Allow each one to be set
individually, but by default compute it from a single base directory. That way,
if a unit test or an installation wants to move Drill's local directories to,
say, {{/var/drill/tmp}}, it need change only a single config line, and
everything else follows automatically.
This can be done in the existing conf file, as was done for UDFs. To preserve
compatibility, we would probably have to leave the properties where they are
and change only their default values.
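A minimal sketch of what the {{drill-module.conf}} defaults might look like, following the UDF pattern. {{drill.local-dir}} is a hypothetical new property, not an existing Drill setting; the other keys are the existing ones from the excerpt above (again assuming they live under {{drill.exec}}), left in place with only their values changed. Only the path values are shown; sibling keys such as {{filesystem}} and {{fs}} would stay as they are.

{noformat}
// Hypothetical base for all local directories; defaults to /tmp/drill as today.
drill.local-dir: ${drill.tmp-dir}"/drill"

drill.exec: {
  // Existing properties, now derived from the hypothetical base by default.
  sys.store.provider.local.path: ${drill.local-dir},
  // Note: this moves the trace default from /tmp/drill-trace to /tmp/drill/trace.
  trace.directory: ${drill.local-dir}"/trace",
  tmp.directories: [ ${drill.local-dir} ],
  compile.code_dir: ${drill.local-dir}"/codegen",
  spill.directories: [ ${drill.local-dir}"/spill" ]
}
{noformat}

With these defaults, relocating everything would take one line in {{drill-override.conf}}, for example {{drill.local-dir: "/var/drill/tmp"}}, assuming substitutions are resolved after the override file is merged, as the UDF settings already rely on. Each property can still be overridden on its own.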
This ticket asks to:
* Work out a good solution.
* Implement it in the config system.
* Scrub the unit tests and {{DirTestWatcher}} to determine where we can simplify code by reusing this solution rather than ad-hoc, per-directory configs.
* Modify {{DirTestWatcher}} to coordinate with the config system: set the base directory in config, then use the configured paths for each of the persistent store, profile, UDF and other directories.