Repository: incubator-zeppelin
Updated Branches:
  refs/heads/master cc24227bf -> 893b49b5c
[ZEPPELIN-688] Giving an option to hide REPL output in Spark interpreter

### What is this PR for?
When a user runs the Spark interpreter, the result comes out together with the REPL's echo message, such as:
```
res0: Int = 250
```
Some users may want to print this REPL output along with their result, but others may want to see the result only, since this output can be too verbose. So, I just want to give those users an option to hide the REPL output.

The default value of `zeppelin.spark.printREPLOutput` is `true`, which keeps the current behavior as-is. The REPL output is hidden only when users change this property from `true` to `false`. In that case, they can still check a result explicitly, e.g. with `print(some_variable)`.

### What type of PR is it?
Improvement

### Todos
* [x] Add a property `zeppelin.spark.printREPLOutput`
* [x] Add a Spark interpreter property table to `docs/spark.md`

### What is the Jira issue?
[ZEPPELIN-688](https://issues.apache.org/jira/browse/ZEPPELIN-688#)

### How should this be tested?
After applying this PR:

1. Create a Spark interpreter for this test and change the `zeppelin.spark.printREPLOutput` property value from `true` to `false`.
2. Create a notebook and bind the interpreter you made.
3. Write `val a = 250` and run the paragraph. No output is shown even though the paragraph status is **FINISHED** (this is the effect of this PR). See the sketch below.
4. Run `print(a)` in the next paragraph. You then finally get the result `250`.

### Screenshots (if appropriate)

### Questions:
* Do the license files need to be updated? No
* Are there breaking changes for older versions? No
* Does this need documentation? I added a Spark interpreter property table to `docs/spark.md`.
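To make the expected behavior concrete, here is a short sketch of what a paragraph shows under each setting. The echoed line follows the Scala REPL's usual `name: Type = value` format; this is illustrative, not output captured from the patch:

```scala
// zeppelin.spark.printREPLOutput = true (the default): the paragraph result
// includes the REPL's echo for each statement.
val a = 250
// paragraph output: a: Int = 250

// zeppelin.spark.printREPLOutput = false: the same paragraph finishes with
// no output at all, so the value must be printed explicitly.
print(a)
// paragraph output: 250
```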
Author: AhyoungRyu <fbdkdu...@hanmail.net>

Closes #764 from AhyoungRyu/ZEPPELIN-688 and squashes the following commits:

c4bbe33 [AhyoungRyu] Add an additional sentence to docs/spark.md
f1621f6 [AhyoungRyu] Add Spark interpreter property table to docs/spark.md
2036e09 [AhyoungRyu] ZEPPELIN-688: Giving an option to hide REPL output in spark interpreter

Project: http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/commit/893b49b5
Tree: http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/tree/893b49b5
Diff: http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/diff/893b49b5

Branch: refs/heads/master
Commit: 893b49b5c065fc8da6c8f4d9ff0a79cfa177ba12
Parents: cc24227
Author: AhyoungRyu <fbdkdu...@hanmail.net>
Authored: Wed Mar 9 10:56:09 2016 +0900
Committer: Lee moon soo <m...@apache.org>
Committed: Thu Mar 10 13:21:20 2016 -0800

----------------------------------------------------------------------
 docs/interpreter/spark.md                       | 70 +++++++++++++++++++-
 .../apache/zeppelin/spark/SparkInterpreter.java | 63 ++++++++++--------
 2 files changed, 106 insertions(+), 27 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-zeppelin/blob/893b49b5/docs/interpreter/spark.md
----------------------------------------------------------------------
diff --git a/docs/interpreter/spark.md b/docs/interpreter/spark.md
index 027d4b6..21c3df5 100644
--- a/docs/interpreter/spark.md
+++ b/docs/interpreter/spark.md
@@ -40,6 +40,74 @@ Spark Interpreter group, which consisted of 4 interpreters.
 </table>
 
 ## Configuration
+Zeppelin provides the below properties for the Spark interpreter.
+You can also set other Spark properties which are not listed in the table. If so, please refer to [Spark Available Properties](http://spark.apache.org/docs/latest/configuration.html#available-properties).
+<table class="table-configuration">
+  <tr>
+    <th>Property</th>
+    <th>Default</th>
+    <th>Description</th>
+  </tr>
+  <tr>
+    <td>args</td>
+    <td></td>
+    <td>Spark commandline args</td>
+  </tr>
+  <tr>
+    <td>master</td>
+    <td>local[*]</td>
+    <td>Spark master uri. <br/> ex) spark://masterhost:7077</td>
+  </tr>
+  <tr>
+    <td>spark.app.name</td>
+    <td>Zeppelin</td>
+    <td>The name of spark application.</td>
+  </tr>
+  <tr>
+    <td>spark.cores.max</td>
+    <td></td>
+    <td>Total number of cores to use. <br/> Empty value uses all available cores.</td>
+  </tr>
+  <tr>
+    <td>spark.executor.memory</td>
+    <td>512m</td>
+    <td>Executor memory per worker instance. <br/> ex) 512m, 32g</td>
+  </tr>
+  <tr>
+    <td>zeppelin.dep.additionalRemoteRepository</td>
+    <td>spark-packages, <br/> http://dl.bintray.com/spark-packages/maven, <br/> false;</td>
+    <td>A list of `id,remote-repository-URL,is-snapshot;` <br/> for each remote repository.</td>
+  </tr>
+  <tr>
+    <td>zeppelin.dep.localrepo</td>
+    <td>local-repo</td>
+    <td>Local repository for dependency loader</td>
+  </tr>
+  <tr>
+    <td>zeppelin.pyspark.python</td>
+    <td>python</td>
+    <td>Python command to run pyspark with</td>
+  </tr>
+  <tr>
+    <td>zeppelin.spark.concurrentSQL</td>
+    <td>false</td>
+    <td>Execute multiple SQL statements concurrently if set true.</td>
+  </tr>
+  <tr>
+    <td>zeppelin.spark.maxResult</td>
+    <td>1000</td>
+    <td>Max number of SparkSQL results to display.</td>
+  </tr>
+  <tr>
+    <td>zeppelin.spark.printREPLOutput</td>
+    <td>true</td>
+    <td>Print REPL output</td>
+  </tr>
+  <tr>
+    <td>zeppelin.spark.useHiveContext</td>
+    <td>true</td>
+    <td>Use HiveContext instead of SQLContext if it is true.</td>
+  </tr>
+</table>
+
 Without any configuration, Spark interpreter works out of box in local mode.
 But if you want to connect to your Spark cluster, you'll need to follow below two simple steps.
 
 ### 1. Export SPARK_HOME
@@ -269,7 +337,7 @@ To learn more about dynamic form, checkout [Dynamic Form](../manual/dynamicform
 In 'Separate Interpreter for each note' mode, SparkInterpreter creates scala compiler per each notebook. However it still shares the single SparkContext.
 
 ## Setting up Zeppelin with Kerberos
-Logical setup with Zeppelin, Kerberos Distribution Center (KDC), and Spark on YARN:
+Logical setup with Zeppelin, Kerberos Key Distribution Center (KDC), and Spark on YARN:
 
 <img src="../assets/themes/zeppelin/img/docs-img/kdc_zeppelin.png">
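The registration code in the Java diff below seeds each property's default through `getSystemDefault(envName, propertyName, fallback)`. As a rough sketch of that resolution order (an assumption based on the call sites, not a quote of the method body): an environment variable wins, then a Java system property, then the hard-coded fallback.

```scala
// Sketch of the assumed resolution order behind getSystemDefault:
// environment variable, then Java system property, then the fallback.
def getSystemDefault(envName: String, propertyName: String, defaultValue: String): String = {
  val fromEnv  = Option(envName).filter(_.nonEmpty).flatMap(n => Option(System.getenv(n)))
  val fromProp = Option(propertyName).filter(_.nonEmpty).flatMap(n => Option(System.getProperty(n)))
  fromEnv.orElse(fromProp).getOrElse(defaultValue)
}

// e.g. getSystemDefault("MASTER", "spark.master", "local[*]") falls back to
// "local[*]" when neither the MASTER env var nor -Dspark.master is set.
```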
ex) spark://masterhost:7077") - .add("spark.executor.memory", - getSystemDefault(null, "spark.executor.memory", "512m"), - "Executor memory per worker instance. ex) 512m, 32g") - .add("spark.cores.max", - getSystemDefault(null, "spark.cores.max", ""), - "Total number of cores to use. Empty value uses all available core.") - .add("zeppelin.spark.useHiveContext", - getSystemDefault("ZEPPELIN_SPARK_USEHIVECONTEXT", - "zeppelin.spark.useHiveContext", "true"), - "Use HiveContext instead of SQLContext if it is true.") - .add("zeppelin.spark.maxResult", - getSystemDefault("ZEPPELIN_SPARK_MAXRESULT", "zeppelin.spark.maxResult", "1000"), - "Max number of SparkSQL result to display.") - .add("args", "", "spark commandline args").build()); - + "spark", + "spark", + SparkInterpreter.class.getName(), + new InterpreterPropertyBuilder() + .add("spark.app.name", + getSystemDefault("SPARK_APP_NAME", "spark.app.name", "Zeppelin"), + "The name of spark application.") + .add("master", + getSystemDefault("MASTER", "spark.master", "local[*]"), + "Spark master uri. ex) spark://masterhost:7077") + .add("spark.executor.memory", + getSystemDefault(null, "spark.executor.memory", "512m"), + "Executor memory per worker instance. ex) 512m, 32g") + .add("spark.cores.max", + getSystemDefault(null, "spark.cores.max", ""), + "Total number of cores to use. Empty value uses all available core.") + .add("zeppelin.spark.useHiveContext", + getSystemDefault("ZEPPELIN_SPARK_USEHIVECONTEXT", + "zeppelin.spark.useHiveContext", "true"), + "Use HiveContext instead of SQLContext if it is true.") + .add("zeppelin.spark.maxResult", + getSystemDefault("ZEPPELIN_SPARK_MAXRESULT", "zeppelin.spark.maxResult", "1000"), + "Max number of SparkSQL result to display.") + .add("args", "", "spark commandline args") + .add("zeppelin.spark.printREPLOutput", "true", + "Print REPL output") + .build() + ); } private ZeppelinContext z; @@ -383,6 +386,10 @@ public class SparkInterpreter extends Interpreter { return defaultValue; } + public boolean printREPLOutput() { + return java.lang.Boolean.parseBoolean(getProperty("zeppelin.spark.printREPLOutput")); + } + @Override public void open() { URL[] urls = getClassloaderUrls(); @@ -483,7 +490,11 @@ public class SparkInterpreter extends Interpreter { synchronized (sharedInterpreterLock) { /* create scala repl */ - this.interpreter = new SparkILoop(null, new PrintWriter(out)); + if (printREPLOutput()) { + this.interpreter = new SparkILoop(null, new PrintWriter(out)); + } else { + this.interpreter = new SparkILoop(null, new PrintWriter(Console.out(), false)); + } interpreter.settings_$eq(settings);