spark.html

ahyoungryu Mon, 19 Sep 2016 19:21:51 -0700

Author: ahyoungryu
Date: Tue Sep 20 02:21:35 2016
New Revision: 1761515

URL: http://svn.apache.org/viewvc?rev=1761515&view=rev
Log: (empty)


Modified:
    zeppelin/site/docs/0.7.0-SNAPSHOT/interpreter/spark.html

Modified: zeppelin/site/docs/0.7.0-SNAPSHOT/interpreter/spark.html
URL: 
http://svn.apache.org/viewvc/zeppelin/site/docs/0.7.0-SNAPSHOT/interpreter/spark.html?rev=1761515&r1=1761514&r2=1761515&view=diff
==============================================================================
--- zeppelin/site/docs/0.7.0-SNAPSHOT/interpreter/spark.html (original)
+++ zeppelin/site/docs/0.7.0-SNAPSHOT/interpreter/spark.html Tue Sep 20 
02:21:35 2016
@@ -4,7 +4,7 @@
   <head>
     <meta charset="utf-8">
     <title>Apache Zeppelin 0.7.0-SNAPSHOT Documentation: Apache Spark 
Interpreter for Apache Zeppelin</title>
-    <meta name="description" content="Apache Spark is a fast and 
general-purpose cluster computing system. It provides high-level APIs in Java, 
Scala, Python and R, and an optimized engine that supports general execution 
graphs.">
+    <meta name="description" content="Apache Spark is a fast and 
general-purpose cluster computing system. It provides high-level APIs in Java, 
Scala, Python and R, and an optimized engine that supports general execution 
engine.">
     <meta name="author" content="The Apache Software Foundation">
 
     <!-- Enable responsive viewport -->
@@ -206,9 +206,8 @@ limitations under the License.
 <h2>Overview</h2>
 
 <p><a href="http://spark.apache.org">Apache Spark</a> is a fast and 
general-purpose cluster computing system.
-It provides high-level APIs in Java, Scala, Python and R, and an optimized 
engine that supports general execution graphs
-Apache Spark is supported in Zeppelin with
-Spark Interpreter group, which consists of five interpreters.</p>
+It provides high-level APIs in Java, Scala, Python and R, and an optimized 
engine that supports general execution graphs.
+Apache Spark is supported in Zeppelin with the Spark interpreter group, which 
consists of the five interpreters below.</p>
 
 <table class="table-configuration">
   <tr>
@@ -219,25 +218,25 @@ Spark Interpreter group, which consists
   <tr>
     <td>%spark</td>
     <td>SparkInterpreter</td>
-    <td>Creates a SparkContext and provides a scala environment</td>
+    <td>Creates a SparkContext and provides a Scala environment</td>
   </tr>
   <tr>
-    <td>%pyspark</td>
+    <td>%spark.pyspark</td>
     <td>PySparkInterpreter</td>
-    <td>Provides a python environment</td>
+    <td>Provides a Python environment</td>
   </tr>
   <tr>
-    <td>%r</td>
+    <td>%spark.r</td>
     <td>SparkRInterpreter</td>
     <td>Provides an R environment with SparkR support</td>
   </tr>
   <tr>
-    <td>%sql</td>
+    <td>%spark.sql</td>
     <td>SparkSQLInterpreter</td>
     <td>Provides a SQL environment</td>
   </tr>
   <tr>
-    <td>%dep</td>
+    <td>%spark.dep</td>
     <td>DepInterpreter</td>
     <td>Dependency loader</td>
   </tr>
@@ -322,22 +321,22 @@ You can also set other Spark properties
 
 <h3>1. Export SPARK_HOME</h3>
 
-<p>In <strong>conf/zeppelin-env.sh</strong>, export <code>SPARK_HOME</code> 
environment variable with your Spark installation path.</p>
+<p>In <code>conf/zeppelin-env.sh</code>, export <code>SPARK_HOME</code> 
environment variable with your Spark installation path.</p>
 
-<p>for example</p>
+<p>For example,</p>
 <div class="highlight"><pre><code class="bash language-bash" 
data-lang="bash"><span class="nb">export </span><span 
class="nv">SPARK_HOME</span><span class="o">=</span>/usr/lib/spark
 </code></pre></div>
-<p>You can optionally export HADOOP_CONF_DIR and SPARK_SUBMIT_OPTIONS</p>
+<p>You can optionally export <code>HADOOP_CONF_DIR</code> and 
<code>SPARK_SUBMIT_OPTIONS</code></p>
 <div class="highlight"><pre><code class="bash language-bash" 
data-lang="bash"><span class="nb">export </span><span 
class="nv">HADOOP_CONF_DIR</span><span class="o">=</span>/usr/lib/hadoop
 <span class="nb">export </span><span 
class="nv">SPARK_SUBMIT_OPTIONS</span><span class="o">=</span><span 
class="s2">&quot;--packages com.databricks:spark-csv_2.10:1.2.0&quot;</span>
 </code></pre></div>
-<p>For Windows, ensure you have <code>winutils.exe</code> in 
<code>%HADOOP_HOME%\bin</code>. For more details please see <a 
href="https://wiki.apache.org/hadoop/WindowsProblems">Problems running Hadoop 
on Windows</a></p>
+<p>For Windows, ensure you have <code>winutils.exe</code> in 
<code>%HADOOP_HOME%\bin</code>. Please see <a 
href="https://wiki.apache.org/hadoop/WindowsProblems">Problems running Hadoop 
on Windows</a> for the details.</p>
 
 <h3>2. Set master in Interpreter menu</h3>
 
 <p>After starting Zeppelin, go to <strong>Interpreter</strong> menu and edit 
<strong>master</strong> property in your Spark interpreter setting. The value 
may vary depending on your Spark cluster deployment type.</p>
 
-<p>for example,</p>
+<p>For example,</p>
 
 <ul>
 <li><strong>local[*]</strong> in local mode</li>
@@ -346,25 +345,27 @@ You can also set other Spark properties
 <li><strong>mesos://host:5050</strong> in Mesos cluster</li>
 </ul>
 
-<p>That&#39;s it. Zeppelin will work with any version of Spark and any 
deployment type without rebuilding Zeppelin in this way. (Zeppelin 
0.5.6-incubating release works up to Spark 1.6.1 )</p>
+<p>That&#39;s it. Zeppelin will work with any version of Spark and any 
deployment type without rebuilding Zeppelin in this way. 
+For further information about Spark &amp; Zeppelin version compatibility, 
please refer to the &quot;Available Interpreters&quot; section of the <a 
href="https://zeppelin.apache.org/download.html">Zeppelin download page</a>.</p>
 
 <blockquote>
 <p>Note that without exporting <code>SPARK_HOME</code>, it&#39;s running in 
local mode with included version of Spark. The included version may vary 
depending on the build profile.</p>
 </blockquote>
 
-<h2>SparkContext, SQLContext, ZeppelinContext</h2>
+<h2>SparkContext, SQLContext, SparkSession, ZeppelinContext</h2>
 
-<p>SparkContext, SQLContext, ZeppelinContext are automatically created and 
exposed as variable names &#39;sc&#39;, &#39;sqlContext&#39; and &#39;z&#39;, 
respectively, both in scala and python environments.</p>
+<p>SparkContext, SQLContext and ZeppelinContext are automatically created and 
exposed as variable names <code>sc</code>, <code>sqlContext</code> and 
<code>z</code>, respectively, in Scala, Python and R environments.
+Starting from 0.6.1, SparkSession is available as variable <code>spark</code> 
when you are using Spark 2.x.</p>
 
 <blockquote>
-<p>Note that scala / python environment shares the same SparkContext, 
SQLContext, ZeppelinContext instance.</p>
+<p>Note that the Scala, Python and R environments share the same SparkContext, 
SQLContext and ZeppelinContext instances.</p>
 </blockquote>
 
 <p><a name="dependencyloading"> </a></p>
 
 <h2>Dependency Management</h2>
 
-<p>There are two ways to load external library in spark interpreter. First is 
using Interpreter setting menu and second is loading Spark properties.</p>
+<p>There are two ways to load external libraries in the Spark interpreter. The 
first is using the interpreter setting menu and the second is loading Spark properties.</p>
 
 <h3>1. Setting Dependencies via Interpreter Setting</h3>
 
@@ -372,73 +373,66 @@ You can also set other Spark properties
 
 <h3>2. Loading Spark Properties</h3>
 
-<p>Once <code>SPARK_HOME</code> is set in <code>conf/zeppelin-env.sh</code>, 
Zeppelin uses <code>spark-submit</code> as spark interpreter runner. 
<code>spark-submit</code> supports two ways to load configurations. The first 
is command line options such as --master and Zeppelin can pass these options to 
<code>spark-submit</code> by exporting <code>SPARK_SUBMIT_OPTIONS</code> in 
conf/zeppelin-env.sh. Second is reading configuration options from 
<code>SPARK_HOME/conf/spark-defaults.conf</code>. Spark properites that user 
can set to distribute libraries are:</p>
+<p>Once <code>SPARK_HOME</code> is set in <code>conf/zeppelin-env.sh</code>, 
Zeppelin uses <code>spark-submit</code> as spark interpreter runner. 
<code>spark-submit</code> supports two ways to load configurations. 
+The first is command line options such as --master and Zeppelin can pass these 
options to <code>spark-submit</code> by exporting 
<code>SPARK_SUBMIT_OPTIONS</code> in <code>conf/zeppelin-env.sh</code>. Second 
is reading configuration options from 
<code>SPARK_HOME/conf/spark-defaults.conf</code>. Spark properties that user 
can set to distribute libraries are:</p>
 
 <table class="table-configuration">
   <tr>
     <th>spark-defaults.conf</th>
     <th>SPARK_SUBMIT_OPTIONS</th>
-    <th>Applicable Interpreter</th>
     <th>Description</th>
   </tr>
   <tr>
     <td>spark.jars</td>
     <td>--jars</td>
-    <td>%spark</td>
     <td>Comma-separated list of local jars to include on the driver and 
executor classpaths.</td>
   </tr>
   <tr>
     <td>spark.jars.packages</td>
     <td>--packages</td>
-    <td>%spark</td>
-    <td>Comma-separated list of maven coordinates of jars to include on the 
driver and executor classpaths. Will search the local maven repo, then maven 
central and any additional remote repositories given by --repositories. The 
format for the coordinates should be groupId:artifactId:version.</td>
+    <td>Comma-separated list of maven coordinates of jars to include on the 
driver and executor classpaths. Will search the local maven repo, then maven 
central and any additional remote repositories given by --repositories. The 
format for the coordinates should be 
<code>groupId:artifactId:version</code>.</td>
   </tr>
   <tr>
     <td>spark.files</td>
     <td>--files</td>
-    <td>%pyspark</td>
     <td>Comma-separated list of files to be placed in the working directory of 
each executor.</td>
   </tr>
 </table>
 
-<blockquote>
-<p>Note that adding jar to pyspark is only availabe via <code>%dep</code> 
interpreter at the moment.</p>
-</blockquote>
-
 <p>Here are few examples:</p>
 
 <ul>
-<li><p>SPARK_SUBMIT_OPTIONS in conf/zeppelin-env.sh</p>
-
-<p>export SPARK<em>SUBMIT</em>OPTIONS=&quot;--packages 
com.databricks:spark-csv_2.10:1.2.0 --jars /path/mylib1.jar,/path/mylib2.jar 
--files /path/mylib1.py,/path/mylib2.zip,/path/mylib3.egg&quot;</p></li>
-<li><p>SPARK_HOME/conf/spark-defaults.conf</p>
-
-<p>spark.jars        /path/mylib1.jar,/path/mylib2.jar
+<li><p><code>SPARK_SUBMIT_OPTIONS</code> in 
<code>conf/zeppelin-env.sh</code></p>
+<div class="highlight"><pre><code class="bash language-bash" 
data-lang="bash"><span class="nb">export </span><span 
class="nv">SPARK_SUBMIT_OPTIONS</span><span class="o">=</span><span 
class="s2">&quot;--packages com.databricks:spark-csv_2.10:1.2.0 --jars 
/path/mylib1.jar,/path/mylib2.jar --files 
/path/mylib1.py,/path/mylib2.zip,/path/mylib3.egg&quot;</span>
+</code></pre></div></li>
+<li><p><code>SPARK_HOME/conf/spark-defaults.conf</code></p>
+<div class="highlight"><pre><code class="text language-text" 
data-lang="text">spark.jars        /path/mylib1.jar,/path/mylib2.jar
 spark.jars.packages   com.databricks:spark-csv_2.10:1.2.0
-spark.files       /path/mylib1.py,/path/mylib2.egg,/path/mylib3.zip</p></li>
+spark.files       /path/mylib1.py,/path/mylib2.egg,/path/mylib3.zip
+</code></pre></div></li>
 </ul>
 
-<h3>3. Dynamic Dependency Loading via %dep interpreter</h3>
+<h3>3. Dynamic Dependency Loading via %spark.dep interpreter</h3>
 
 <blockquote>
-<p>Note: <code>%dep</code> interpreter is deprecated since v0.6.0.
-<code>%dep</code> interpreter load libraries to <code>%spark</code> and 
<code>%pyspark</code> but not to  <code>%spark.sql</code> interpreter so we 
recommend you to use first option instead.</p>
+<p>Note: <code>%spark.dep</code> interpreter is deprecated since v0.6.0.
+<code>%spark.dep</code> interpreter loads libraries to <code>%spark</code> and 
<code>%spark.pyspark</code> but not to  <code>%spark.sql</code> interpreter. So 
we recommend you to use the first option instead.</p>
 </blockquote>
 
-<p>When your code requires external library, instead of doing 
download/copy/restart Zeppelin, you can easily do following jobs using 
<code>%dep</code> interpreter.</p>
+<p>When your code requires external library, instead of doing 
download/copy/restart Zeppelin, you can easily do following jobs using 
<code>%spark.dep</code> interpreter.</p>
 
 <ul>
-<li>Load libraries recursively from Maven repository</li>
+<li>Load libraries recursively from maven repository</li>
 <li>Load libraries from local filesystem</li>
 <li>Add additional maven repository</li>
 <li>Automatically add libraries to SparkCluster (You can turn off)</li>
 </ul>
 
-<p>Dep interpreter leverages scala environment. So you can write any Scala 
code here.
-Note that <code>%dep</code> interpreter should be used before 
<code>%spark</code>, <code>%pyspark</code>, <code>%sql</code>.</p>
+<p>Dep interpreter leverages Scala environment. So you can write any Scala 
code here.
+Note that <code>%spark.dep</code> interpreter should be used before 
<code>%spark</code>, <code>%spark.pyspark</code>, <code>%spark.sql</code>.</p>
 
 <p>Here&#39;s usages.</p>
-<div class="highlight"><pre><code class="scala language-scala" 
data-lang="scala"><span class="o">%</span><span class="n">dep</span>
+<div class="highlight"><pre><code class="scala language-scala" 
data-lang="scala"><span class="o">%</span><span class="n">spark</span><span 
class="o">.</span><span class="n">dep</span>
 <span class="n">z</span><span class="o">.</span><span 
class="n">reset</span><span class="o">()</span> <span class="c1">// clean up 
previously added artifact and repository</span>
 
 <span class="c1">// add maven repository</span>
@@ -472,12 +466,12 @@ Note that <code>%dep</code> interpreter
 </code></pre></div>
 <h2>ZeppelinContext</h2>
 
-<p>Zeppelin automatically injects ZeppelinContext as variable &#39;z&#39; in 
your scala/python environment. ZeppelinContext provides some additional 
functions and utility.</p>
+<p>Zeppelin automatically injects <code>ZeppelinContext</code> as variable 
<code>z</code> in your Scala/Python environment. <code>ZeppelinContext</code> 
provides some additional functions and utilities.</p>
 
 <h3>Object Exchange</h3>
 
-<p>ZeppelinContext extends map and it&#39;s shared between scala, python 
environment.
-So you can put some object from scala and read it from python, vise versa.</p>
+<p><code>ZeppelinContext</code> extends map and it&#39;s shared between the Scala 
and Python environments.
+So you can put an object from Scala and read it from Python, and vice versa.</p>
 
 <div class="codetabs">
   <div data-lang="scala" markdown="1">
@@ -495,7 +489,7 @@ So you can put some object from scala an
 
 
 <div class="highlight"><pre><code class="python"><span class="c"># Get object 
from python</span>
-<span class="o">%</span><span class="n">pyspark</span>
+<span class="o">%</span><span class="n">spark</span><span 
class="o">.</span><span class="n">pyspark</span>
 <span class="n">myObject</span> <span class="o">=</span> <span 
class="n">z</span><span class="o">.</span><span class="n">get</span><span 
class="p">(</span><span class="s">&quot;objName&quot;</span><span 
class="p">)</span>
 </code></pre></div>
 
@@ -505,8 +499,8 @@ So you can put some object from scala an
 
 <h3>Form Creation</h3>
 
-<p>ZeppelinContext provides functions for creating forms.
-In scala and python environments, you can create forms programmatically.
+<p><code>ZeppelinContext</code> provides functions for creating forms.
+In Scala and Python environments, you can create forms programmatically.
 <div class="codetabs">
   <div data-lang="scala" markdown="1"></p>
 
@@ -531,7 +525,7 @@ In scala and python environments, you ca
   <div data-lang="python" markdown="1">
 
 
-<div class="highlight"><pre><code class="python"><span class="o">%</span><span 
class="n">pyspark</span>
+<div class="highlight"><pre><code class="python"><span class="o">%</span><span 
class="n">spark</span><span class="o">.</span><span class="n">pyspark</span>
 <span class="c"># Create text input form</span>
 <span class="n">z</span><span class="o">.</span><span 
class="n">input</span><span class="p">(</span><span 
class="s">&quot;formName&quot;</span><span class="p">)</span>
 
@@ -552,14 +546,14 @@ In scala and python environments, you ca
 </div>
 
 <p>In the SQL environment, you can create a form with a simple template.</p>
-<div class="highlight"><pre><code class="text language-text" 
data-lang="text">%sql
-select * from ${table=defaultTableName} where text like &#39;%${search}%&#39;
+<div class="highlight"><pre><code class="sql language-sql" 
data-lang="sql"><span class="o">%</span><span class="n">spark</span><span 
class="p">.</span><span class="k">sql</span>
+<span class="k">select</span> <span class="o">*</span> <span 
class="k">from</span> <span class="err">${</span><span 
class="k">table</span><span class="o">=</span><span 
class="n">defaultTableName</span><span class="err">}</span> <span 
class="k">where</span> <span class="nb">text</span> <span class="k">like</span> 
<span class="s1">&#39;%${search}%&#39;</span>
 </code></pre></div>
 <p>To learn more about dynamic forms, check out <a 
href="../manual/dynamicform.html">Dynamic Form</a>.</p>
 
 <h2>Interpreter setting option</h2>
 
-<p>Interpreter setting can choose one of &#39;shared&#39;, &#39;scoped&#39;, 
&#39;isolated&#39; option. Spark interpreter creates separate scala compiler 
per each notebook but share a single SparkContext in &#39;scoped&#39; mode 
(experimental). It creates separate SparkContext per each notebook in 
&#39;isolated&#39; mode.</p>
+<p>You can choose one of the <code>shared</code>, <code>scoped</code> and 
<code>isolated</code> options when you configure the Spark interpreter. The Spark 
interpreter creates a separate Scala compiler for each notebook but shares a 
single SparkContext in <code>scoped</code> mode (experimental). It creates a 
separate SparkContext for each notebook in <code>isolated</code> mode.</p>
 
 <h2>Setting up Zeppelin with Kerberos</h2>
 
@@ -572,14 +566,14 @@ select * from ${table=defaultTableName}
 <ol>
 <li><p>On the server that Zeppelin is installed, install Kerberos client 
modules and configuration, krb5.conf.
 This is to make the server communicate with KDC.</p></li>
-<li><p>Set SPARK_HOME in <code>[ZEPPELIN\_HOME]/conf/zeppelin-env.sh</code> to 
use spark-submit
-(Additionally, you might have to set <code>export 
HADOOP\_CONF\_DIR=/etc/hadoop/conf</code>)</p></li>
-<li><p>Add the two properties below to spark configuration 
(<code>[SPARK_HOME]/conf/spark-defaults.conf</code>):</p>
+<li><p>Set <code>SPARK_HOME</code> in 
<code>[ZEPPELIN_HOME]/conf/zeppelin-env.sh</code> to use spark-submit
+(Additionally, you might have to set <code>export 
HADOOP_CONF_DIR=/etc/hadoop/conf</code>)</p></li>
+<li><p>Add the two properties below to Spark configuration 
(<code>[SPARK_HOME]/conf/spark-defaults.conf</code>):</p>
 <div class="highlight"><pre><code class="text language-text" 
data-lang="text">spark.yarn.principal
 spark.yarn.keytab
 </code></pre></div>
 <blockquote>
-<p><strong>NOTE:</strong> If you do not have access to the above 
spark-defaults.conf file, optionally, you may add the lines to the Spark 
Interpreter through the Interpreter tab in the Zeppelin UI.</p>
+<p><strong>NOTE:</strong> If you do not have permission to access the 
above spark-defaults.conf file, you can optionally add the above lines to the 
Spark Interpreter setting through the Interpreter tab in the Zeppelin UI.</p>
 </blockquote></li>
 <li><p>That&#39;s it. Play with Zeppelin!</p></li>
 </ol>
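
The SPARK_SUBMIT_OPTIONS value documented in this change consists of up to three flags (--packages, --jars, --files), each followed by a single comma-separated list. A minimal Python sketch of composing such a value is shown here. The helper name build_submit_options is hypothetical and is not part of Zeppelin or Spark; the paths and the Maven coordinate are the placeholder values from the diff above.

```python
def build_submit_options(packages=(), jars=(), files=()):
    """Compose a SPARK_SUBMIT_OPTIONS value (hypothetical helper, for illustration only)."""
    parts = []
    for flag, values in (("--packages", packages), ("--jars", jars), ("--files", files)):
        if values:
            # Each flag takes exactly one argument: a comma-separated list.
            parts.append(f"{flag} {','.join(values)}")
    return " ".join(parts)


options = build_submit_options(
    packages=["com.databricks:spark-csv_2.10:1.2.0"],
    jars=["/path/mylib1.jar", "/path/mylib2.jar"],
    files=["/path/mylib1.py", "/path/mylib2.zip", "/path/mylib3.egg"],
)

# Print the line as it would appear in conf/zeppelin-env.sh.
print(f'export SPARK_SUBMIT_OPTIONS="{options}"')
```

Flags with no values are left out, so the same helper can produce a value containing only --packages, as in the "Export SPARK_HOME" example.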

svn commit: r1761515 - /zeppelin/site/docs/0.7.0-SNAPSHOT/interpreter/spark.html
