Tobias Bertelsen created SPARK-5590:
---------------------------------------

             Summary: Create a complete reference of configurable environment 
variables, config files and command-line parameters
                 Key: SPARK-5590
                 URL: https://issues.apache.org/jira/browse/SPARK-5590
             Project: Spark
          Issue Type: Wish
          Components: Spark Core
         Environment: All
            Reporter: Tobias Bertelsen


This originated as [a question on Stack Overflow|http://stackoverflow.com/q/28219279/].

It would be great to have a complete reference of the different ways of configuring the Spark master and workers – especially the different names for the same parameter, and the precedence among the different ways of configuring the same thing.

From the original Stack Overflow question:


h2. Known resources

 - [The standalone documentation|http://spark.apache.org/docs/1.2.0/spark-standalone.html] is the best I have found, but it does not clearly describe the relationships between the different variables/parameters, nor which ones take precedence over others.
 - [The configuration documentation|http://spark.apache.org/docs/1.2.0/configuration.html] provides a good overview of application properties, but not of the master/slave launch-time parameters.


h2. Example problem


The [standalone documentation|http://spark.apache.org/docs/1.2.0/spark-standalone.html] says the following:

{quote}
 the following configuration options can be passed to the master and worker
 ...
 `-d DIR, --work-dir DIR`       Directory to use for scratch space and job 
output logs (default: SPARK_HOME/work); only on worker
{quote}

and later

{quote}
 `SPARK_LOCAL_DIRS` Directory to use for "scratch" space in Spark

 `SPARK_WORKER_DIR` Directory to run applications in, which will include both 
logs and scratch space (default: SPARK_HOME/work).
{quote}

As a Spark newbie I am a little confused by now.

 - What is the relationship between `SPARK_LOCAL_DIRS`, `SPARK_WORKER_DIR`, and `-d`?
 - What if I specify them all with different values – which one takes precedence?
 - Do variables set in `$SPARK_HOME/conf/spark-env.sh` take precedence over variables defined in the shell/script that starts Spark?
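Part of the ambiguity behind the last question is how `conf/spark-env.sh` is written: the launch scripts source that file, so a plain assignment there clobbers whatever the calling shell exported, while the POSIX `${VAR:-default}` idiom preserves it. A minimal sketch of the two styles (the directory path is illustrative, not a documented Spark default):

```shell
# Style 1: unconditional assignment - any value exported by the
# calling shell is overwritten when spark-env.sh is sourced.
export SPARK_WORKER_DIR=/data/spark/work

# Style 2: keep the caller's value if one was exported,
# otherwise fall back to a default.
export SPARK_WORKER_DIR="${SPARK_WORKER_DIR:-/data/spark/work}"
```

A reference that documents precedence would need to say which style the shipped `spark-env.sh.template` and launch scripts assume.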


h2. Ideal Solution


What I am looking for is essentially a single reference that

 1. defines the precedence of the different ways of specifying variables for Spark, and
 2. lists all variables/parameters.

For example something like this:

|| Variable          || Cmd-line  || Default           || Description ||
 | SPARK_MASTER_PORT | -p --port | 8080              | Port for master to listen on |
 | SPARK_SLAVE_PORT  | -p --port | random            | Port for slave to listen on |
 | SPARK_WORKER_DIR  | -d --dir  | $SPARK_HOME/work  | Used as default for worker data |
 | SPARK_LOCAL_DIRS  |           | $SPARK_WORKER_DIR | Scratch space for RDDs |
 | ....              | ....      | ....              | .... |
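To make the precedence question such a reference would answer concrete, consider setting all three at once (directory values are illustrative; the worker launch command is the one from the standalone documentation):

```shell
# conf/spark-env.sh (sourced by the launch scripts)
export SPARK_LOCAL_DIRS=/mnt/scratch
export SPARK_WORKER_DIR=/mnt/worker

# Worker also launched with an explicit --work-dir
./bin/spark-class org.apache.spark.deploy.worker.Worker \
  --work-dir /mnt/cli-work spark://master:7077
```

Which directory actually ends up holding logs and scratch space is exactly what the current docs leave unstated.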





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
