Just to clarify: the 'scratch' configuration (the remote tmp working directory) is a user-defined setting read from SystemML-config.xml, with an internal default of ./scratch_space if not specified. It is always accessed as dfs, which, depending on your hadoop configuration, might use different file system implementations (e.g., hdfs, gpfs, fs).
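
As a rough illustration of why the relative default can be surprising, here is a minimal Python sketch of how a relative dfs path might resolve for different users. It assumes the common HDFS convention that relative paths resolve against the user's home directory, /user/<name>. The function name `resolve_dfs_path` and the user names "alice" and "yarn" are hypothetical and not part of SystemML or Hadoop.

```python
import posixpath


def resolve_dfs_path(path, user):
    """Resolve a DFS path for a given user (hypothetical helper).

    Assumption: relative paths resolve against the user's DFS home
    directory, /user/<user>, as is the usual HDFS convention.
    Absolute paths are returned unchanged apart from normalization.
    """
    if posixpath.isabs(path):
        return posixpath.normpath(path)
    home = posixpath.join("/user", user)
    return posixpath.normpath(posixpath.join(home, path))


# The internal default mentioned above is relative.
scratch = "./scratch_space"

# Hypothetical users: the driver runs as "alice", the tasks as "yarn".
driver_dir = resolve_dfs_path(scratch, "alice")
executor_dir = resolve_dfs_path(scratch, "yarn")

print(driver_dir)    # /user/alice/scratch_space
print(executor_dir)  # /user/yarn/scratch_space

# An absolute path resolves to the same location for every user.
shared = resolve_dfs_path("/tmp/systemml/scratch_space", "yarn")
print(shared)        # /tmp/systemml/scratch_space
```

Under these assumptions the same relative setting names two different directories, while an absolute path names one directory regardless of which user resolves it.
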

From my perspective, we should definitely keep the ability to specify a path for both the local and remote tmp working directories because it really simplifies debugging. This is especially true if the driver/client and the executors/tasks run under different users (e.g., with LinuxTaskController, LinuxContainerExecutor, or Spark's yarn-client). Btw, these scenarios are indeed good use cases for absolute paths, because a relative path (if not handled correctly) actually refers to different locations for the driver and the executors.

I would be fine with renaming this configuration to something like 'remotetmpdir' (consistent with our 'localtmpdir') and automatically obtaining temp working directories from hadoop if it is not specified.

Regards,
Matthias

From: Mike Dusenberry <[email protected]>
To: [email protected]
Date: 03/31/2016 10:58 AM
Subject: Remove "Scratch Space" In Favor Of Temp Folder

Hi all,

Currently, SystemML makes use of a "scratch space" folder for temporary files during execution. This is currently set to a relative "scratch_space" directory that will be placed relative to the execution path (local mode) or in the user's directory on HDFS. This works okay in some cases, although it can cause confusion as to why the folder exists. In other cases, such as on Databricks Cloud, a relative path for HDFS is not allowed, and thus the user must change this "scratch space" folder to an absolute path, or else a strange error message will occur.

Since this "scratch space" folder is just for temporary files during execution, might it be better to simply query HDFS (which falls back to the local FS if needed) for a temporary folder, and just use that? If so, this would remove the need to adjust this setting, thus making it easier to use SystemML.

Thoughts?

- Mike

--
Michael W. Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry
