Tobias Bertelsen created SPARK-5590:
---------------------------------------
Summary: Create a complete reference of configurable environment
variables, config files and command-line parameters
Key: SPARK-5590
URL: https://issues.apache.org/jira/browse/SPARK-5590
Project: Spark
Issue Type: Wish
Components: Spark Core
Environment: All
Reporter: Tobias Bertelsen
This originated as [a question on Stack Overflow|http://stackoverflow.com/q/28219279/].
It would be great to have a complete reference for the different ways of
configuring the Spark master and workers, especially the different names for
the same parameter and the precedence among the different ways of configuring
the same thing.
From the original Stack Overflow question:
h2. Known resources
- [The standalone
documentation|http://spark.apache.org/docs/1.2.0/spark-standalone.html] is the
best I have found, but it does not clearly describe the relationships between
the different variables/parameters, nor which take precedence over others.
- [The configuration
documentation|http://spark.apache.org/docs/1.2.0/configuration.html] provides a
good overview of the application properties, but not of the master/slave
launch-time parameters.
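For application properties (as opposed to the master/worker launch-time parameters), the configuration documentation does state an order: values set on a SparkConf in code beat flags passed to spark-submit, which beat entries in conf/spark-defaults.conf. A minimal shell sketch of that resolution chain, with hypothetical values:

```shell
# Emulate the documented precedence for spark.executor.memory
# (all values hypothetical): SparkConf > spark-submit flag > spark-defaults.conf
defaults_conf="1g"    # from conf/spark-defaults.conf:  spark.executor.memory 1g
submit_flag="2g"      # from: spark-submit --conf spark.executor.memory=2g ...
sparkconf_value="4g"  # from: new SparkConf().set("spark.executor.memory", "4g")

# First non-empty value wins, highest-precedence source first.
effective="${sparkconf_value:-${submit_flag:-$defaults_conf}}"
echo "$effective"   # prints 4g
```

Nothing comparable is spelled out for the launch-time parameters below, which is the gap this ticket is about.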
h2. Example problem
The [standalone
documentation|http://spark.apache.org/docs/1.2.0/spark-standalone.html] writes
the following:
{quote}
the following configuration options can be passed to the master and worker
...
`-d DIR, --work-dir DIR` Directory to use for scratch space and job
output logs (default: SPARK_HOME/work); only on worker
{quote}
and later
{quote}
`SPARK_LOCAL_DIRS` Directory to use for "scratch" space in Spark
`SPARK_WORKER_DIR` Directory to run applications in, which will include both
logs and scratch space (default: SPARK_HOME/work).
{quote}
As a Spark newbie, I am a little confused at this point:
- What is the relationship between `SPARK_LOCAL_DIRS`, `SPARK_WORKER_DIR`, and
`-d`?
- What if I specify all of them with different values – which takes precedence?
- Do variables written in `$SPARK_HOME/conf/spark-env.sh` take precedence over
variables defined in the shell/script that starts Spark?
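To make these questions concrete, here is a minimal shell sketch of the three places the worker directory can be set (all paths hypothetical; the actual precedence is exactly what this ticket asks to have documented):

```shell
# 1. Exported in the shell that launches the worker:
export SPARK_WORKER_DIR=/tmp/shell-work

# 2. Assigned in conf/spark-env.sh, which the start scripts source.
#    An unconditional assignment there would overwrite the shell value;
#    the ${VAR:-default} idiom instead keeps an already-exported value.
SPARK_WORKER_DIR="${SPARK_WORKER_DIR:-/tmp/env-sh-work}"

# 3. Passed as a launch flag (from the quoted docs: -d DIR, --work-dir DIR),
#    e.g. appended to the worker start command:
#    ... --work-dir /tmp/flag-work

echo "$SPARK_WORKER_DIR"   # prints /tmp/shell-work here, because of the :- idiom
```

Whether spark-env.sh uses the preserving idiom or an unconditional assignment, and how the `--work-dir` flag interacts with both, is precisely what a reference should state.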
h2. Ideal Solution
What I am looking for is essentially a single reference that
1. defines the precedence of the different ways of specifying variables for
Spark, and
2. lists all variables/parameters.
For example something like this:
|| Variable || Cmd-line || Default || Description ||
| SPARK_MASTER_PORT | -p --port | 8080 | Port for the master to listen on |
| SPARK_SLAVE_PORT | -p --port | random | Port for the slave to listen on |
| SPARK_WORKER_DIR | -d --dir | $SPARK_HOME/work | Used as the default for worker data |
| SPARK_LOCAL_DIRS | | $SPARK_WORKER_DIR | Scratch space for RDDs |
| .... | .... | .... | .... |
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)