This is an automated email from the ASF dual-hosted git repository.

zjffdu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/zeppelin.git


The following commit(s) were added to refs/heads/master by this push:
     new f3bdd4a  [ZEPPELIN-5241] Typos in spark tutorial
f3bdd4a is described below

commit f3bdd4a1fa0cf19bc1015955d8ade4bc79a8e16f
Author: OmriK <[email protected]>
AuthorDate: Sun Feb 7 18:12:06 2021 +0200

    [ZEPPELIN-5241] Typos in spark tutorial
    
    ### What is this PR for?
    Fixing some typos in the tutorial notebooks
    
    ### What type of PR is it?
    Documentation
    
    ### Todos
    * [x] - Task
    
    ### What is the Jira issue?
    [ZEPPELIN-5241](https://issues.apache.org/jira/browse/ZEPPELIN-5241)
    
    ### How should this be tested?
    * Standard CI tests
    
    ### Screenshots (if appropriate)
    
    ### Questions:
    * Do the license files need an update? - no
    * Are there breaking changes for older versions? - no
    * Does this need documentation? - no
    
    Author: OmriK <[email protected]>
    
    Closes #4048 from omrisk/typos_in_spark_tutorial and squashes the following 
commits:
    
    d85861463 [OmriK] Checked part 1
---
 .... Spark Interpreter Introduction_2F8KN6TKK.zpln | 26 +++++++++++-----------
 .../3. Spark SQL (PySpark)_2EWM84JXA.zpln          | 10 ++++-----
 .../3. Spark SQL (Scala)_2EYUV26VR.zpln            | 14 ++++++------
 .../Spark Tutorial/4. Spark MlLib_2EZFM3GJA.zpln   |  2 +-
 4 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/notebook/Spark Tutorial/1. Spark Interpreter 
Introduction_2F8KN6TKK.zpln b/notebook/Spark Tutorial/1. Spark Interpreter 
Introduction_2F8KN6TKK.zpln
index d085d9f..0f3cb7a 100644
--- a/notebook/Spark Tutorial/1. Spark Interpreter Introduction_2F8KN6TKK.zpln  
+++ b/notebook/Spark Tutorial/1. Spark Interpreter Introduction_2F8KN6TKK.zpln  
@@ -2,7 +2,7 @@
   "paragraphs": [
     {
       "title": "",
-      "text": "%md\n\n# Introduction\n\nThis tutorial is for how to use Spark 
Interpreter in Zeppelin.\n\n1. Specify `SPARK_HOME` in interpreter setting. If 
you don\u0027t specify `SPARK_HOME`, Zeppelin will use the embedded spark which 
can only run in local mode. And some advanced features may not work in this 
embedded spark.\n2. Specify `spark.master` for spark execution mode.\n    * 
`local[*]`  - Driver and Executor would both run in the same host of zeppelin 
server. It is only for te [...]
+      "text": "%md\n\n# Introduction\n\nThis tutorial is for how to use Spark 
Interpreter in Zeppelin.\n\n1. Specify `SPARK_HOME` in interpreter setting. If 
you don\u0027t specify `SPARK_HOME`, Zeppelin will use the embedded spark which 
can only run in local mode. And some advanced features may not work in this 
embedded spark.\n2. Specify `spark.master` for spark execution mode.\n    * 
`local[*]`  - Driver and Executor would both run in the same host of zeppelin 
server. It is only for te [...]
       "user": "anonymous",
       "dateUpdated": "2020-05-04 13:44:39.482",
       "config": {
@@ -29,7 +29,7 @@
         "msg": [
           {
             "type": "HTML",
-            "data": "\u003cdiv 
class\u003d\"markdown-body\"\u003e\n\u003ch1\u003eIntroduction\u003c/h1\u003e\n\u003cp\u003eThis
 tutorial is for how to use Spark Interpreter in 
Zeppelin.\u003c/p\u003e\n\u003col\u003e\n\u003cli\u003eSpecify 
\u003ccode\u003eSPARK_HOME\u003c/code\u003e in interpreter setting. If you 
don\u0026rsquo;t specify \u003ccode\u003eSPARK_HOME\u003c/code\u003e, Zeppelin 
will use the embedded spark which can only run in local mode. And some advanced 
features may not wo [...]
+            "data": "\u003cdiv 
class\u003d\"markdown-body\"\u003e\n\u003ch1\u003eIntroduction\u003c/h1\u003e\n\u003cp\u003eThis
 tutorial is for how to use Spark Interpreter in 
Zeppelin.\u003c/p\u003e\n\u003col\u003e\n\u003cli\u003eSpecify 
\u003ccode\u003eSPARK_HOME\u003c/code\u003e in interpreter setting. If you 
don\u0026rsquo;t specify \u003ccode\u003eSPARK_HOME\u003c/code\u003e, Zeppelin 
will use the embedded spark which can only run in local mode. And some advanced 
features may not wo [...]
           }
         ]
       },
@@ -44,7 +44,7 @@
     },
     {
       "title": "Use Generic Inline Configuration instead of Interpreter 
Setting",
-      "text": "%md\n\nCustomize your spark interpreter is indispensible for 
Zeppelin Notebook. E.g. You want to add third party jars, change the execution 
mode, change the number of exceutor or its memory and etc. You can check this 
link for all the available [spark 
configuration](http://spark.apache.org/docs/latest/configuration.html)\nAlthough
 you can customize these in interpreter setting, it is recommended to do via 
the generic inline configuration. Because interpreter setting is sha [...]
+      "text": "%md\n\nCustomize your spark interpreter is indispensable for 
Zeppelin Notebook. E.g. You want to add third party jars, change the execution 
mode, change the number of executor or its memory and etc. You can check this 
link for all the available [spark 
configuration](http://spark.apache.org/docs/latest/configuration.html)\nAlthough
 you can customize these in interpreter setting, it is recommended to do via 
the generic inline configuration. Because interpreter setting is sha [...]
       "user": "anonymous",
       "dateUpdated": "2020-05-04 13:45:44.204",
       "config": {
@@ -72,7 +72,7 @@
         "msg": [
           {
             "type": "HTML",
-            "data": "\u003cdiv 
class\u003d\"markdown-body\"\u003e\n\u003cp\u003eCustomize your spark 
interpreter is indispensible for Zeppelin Notebook. E.g. You want to add third 
party jars, change the execution mode, change the number of exceutor or its 
memory and etc. You can check this link for all the available \u003ca 
href\u003d\"http://spark.apache.org/docs/latest/configuration.html\"\u003espark 
configuration\u003c/a\u003e\u003cbr /\u003e\nAlthough you can customize these 
in inter [...]
+            "data": "\u003cdiv 
class\u003d\"markdown-body\"\u003e\n\u003cp\u003eCustomize your spark 
interpreter is indispensable for Zeppelin Notebook. E.g. You want to add third 
party jars, change the execution mode, change the number of executor or its 
memory and etc. You can check this link for all the available \u003ca 
href\u003d\"http://spark.apache.org/docs/latest/configuration.html\"\u003espark 
configuration\u003c/a\u003e\u003cbr /\u003e\nAlthough you can customize these 
in inter [...]
           }
         ]
       },
@@ -87,7 +87,7 @@
     },
     {
       "title": "Generic Inline Configuration",
-      "text": "%spark.conf\n\nSPARK_HOME  \u003cPATH_TO_SPAKR_HOME\u003e\n\n# 
set driver memrory to 8g\nspark.driver.memory 8g\n\n# set executor number to be 
6\nspark.executor.instances  6\n\n# set executor memrory 
4g\nspark.executor.memory  4g\n\n# Any other spark properties can be set here. 
Here\u0027s avaliable spark configruation you can set. 
(http://spark.apache.org/docs/latest/configuration.html)\n",
+      "text": "%spark.conf\n\nSPARK_HOME  \u003cPATH_TO_SPARK_HOME\u003e\n\n# 
set driver memory to 8g\nspark.driver.memory 8g\n\n# set executor number to be 
6\nspark.executor.instances  6\n\n# set executor memory 
4g\nspark.executor.memory  4g\n\n# Any other spark properties can be set here. 
Here\u0027s avaliable spark configruation you can set. 
(http://spark.apache.org/docs/latest/configuration.html)\n",
       "user": "anonymous",
       "dateUpdated": "2020-04-30 10:56:30.840",
       "config": {
@@ -145,7 +145,7 @@
         "msg": [
           {
             "type": "HTML",
-            "data": "\u003cdiv 
class\u003d\"markdown-body\"\u003e\n\u003cp\u003eThere\u0026rsquo;re 2 ways to 
add third party 
libraries.\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\u003ccode\u003eGeneric 
Inline Configuration\u003c/code\u003e   It is the recommended way to add third 
party jars/packages. Use \u003ccode\u003espark.jars\u003c/code\u003e for adding 
local jar file and \u003ccode\u003espark.jars.packages\u003c/code\u003e for 
adding packages\u003c/li\u003e\n\u003cli\u003e\u003 [...]
+            "data": "\u003cdiv 
class\u003d\"markdown-body\"\u003e\n\u003cp\u003eThere\u0026rsquo;re 2 ways to 
add third party 
libraries.\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\u003ccode\u003eGeneric 
Inline Configuration\u003c/code\u003e   It is the recommended way to add third 
party jars/packages. Use \u003ccode\u003espark.jars\u003c/code\u003e for adding 
local jar file and \u003ccode\u003espark.jars.packages\u003c/code\u003e for 
adding packages\u003c/li\u003e\n\u003cli\u003e\u003 [...]
           }
         ]
       },
@@ -160,7 +160,7 @@
     },
     {
       "title": "",
-      "text": "%spark.conf\n\n# Must set SPARK_HOME for this example, because 
it won\u0027t work for Zeppelin\u0027s embedded spark mode. The embedded spark 
mode doesn\u0027t \n# use spark-submit to launch spark interpreter, so 
spark.jars and spark.jars.packages won\u0027t take affect. \nSPARK_HOME 
\u003cPATH_TO_SPAKR_HOME\u003e\n\n# set execution mode\nmaster yarn-client\n\n# 
spark.jars can be used for adding any local jar files into spark interpreter\n# 
spark.jars  \u003cpath_to_local_ [...]
+      "text": "%spark.conf\n\n# Must set SPARK_HOME for this example, because 
it won\u0027t work for Zeppelin\u0027s embedded spark mode. The embedded spark 
mode doesn\u0027t \n# use spark-submit to launch spark interpreter, so 
spark.jars and spark.jars.packages won\u0027t take affect. \nSPARK_HOME 
\u003cPATH_TO_SPARK_HOME\u003e\n\n# set execution mode\nmaster yarn-client\n\n# 
spark.jars can be used for adding any local jar files into spark interpreter\n# 
spark.jars  \u003cpath_to_local_ [...]
       "user": "anonymous",
       "dateUpdated": "2020-04-30 11:01:36.681",
       "config": {
@@ -272,7 +272,7 @@
     },
     {
       "title": "Code Completion in Scala",
-      "text": "%md\n\nSpark interpreter provide code completion feature. As 
long as you type `tab`, code completion will start to work and provide you with 
a list of candiates. Here\u0027s one screenshot of how it works. \n\n**To be 
noticed**, code completion only works after spark interpreter is launched. So 
it will not work when you type code in the first paragraph as the spark 
interpreter is not launched yet. For me, usually I will run one simple code 
such as `sc.version` to launch sp [...]
+      "text": "%md\n\nSpark interpreter provide code completion feature. As 
long as you type `tab`, code completion will start to work and provide you with 
a list of candidates. Here\u0027s one screenshot of how it works. \n\n**To be 
noticed**, code completion only works after spark interpreter is launched. So 
it will not work when you type code in the first paragraph as the spark 
interpreter is not launched yet. For me, usually I will run one simple code 
such as `sc.version` to launch s [...]
       "user": "anonymous",
       "dateUpdated": "2020-04-30 11:03:03.127",
       "config": {
@@ -300,7 +300,7 @@
         "msg": [
           {
             "type": "HTML",
-            "data": "\u003cdiv 
class\u003d\"markdown-body\"\u003e\n\u003cp\u003eSpark interpreter provide code 
completion feature. As long as you type \u003ccode\u003etab\u003c/code\u003e, 
code completion will start to work and provide you with a list of candiates. 
Here\u0026rsquo;s one screenshot of how it 
works.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTo be 
noticed\u003c/strong\u003e, code completion only works after spark interpreter 
is launched. So it will not work when you typ [...]
+            "data": "\u003cdiv 
class\u003d\"markdown-body\"\u003e\n\u003cp\u003eSpark interpreter provide code 
completion feature. As long as you type \u003ccode\u003etab\u003c/code\u003e, 
code completion will start to work and provide you with a list of candidates. 
Here\u0026rsquo;s one screenshot of how it 
works.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTo be 
noticed\u003c/strong\u003e, code completion only works after spark interpreter 
is launched. So it will not work when you ty [...]
           }
         ]
       },
@@ -315,7 +315,7 @@
     },
     {
       "title": "PySpark",
-      "text": "%md\n\nFor using PySpark, you need to do some other pyspark 
configration besides the above spark configuration we mentioned before. The 
most important property you need to set is python path for both driver and 
executor. If you hit the following error, it means your python on driver is 
mismatched with that of executor. In this case you need to check the 2 
properties: `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON`. (You can use 
`spark.pyspark.python` and `spark.pyspark.driver [...]
+      "text": "%md\n\nFor using PySpark, you need to do some other pyspark 
configuration besides the above spark configuration we mentioned before. The 
most important property you need to set is python path for both driver and 
executor. If you hit the following error, it means your python on driver is 
mismatched with that of executor. In this case you need to check the 2 
properties: `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON`. (You can use 
`spark.pyspark.python` and `spark.pyspark.drive [...]
       "user": "anonymous",
       "dateUpdated": "2020-04-30 11:04:18.086",
       "config": {
@@ -343,7 +343,7 @@
         "msg": [
           {
             "type": "HTML",
-            "data": "\u003cdiv 
class\u003d\"markdown-body\"\u003e\n\u003cp\u003eFor using PySpark, you need to 
do some other pyspark configration besides the above spark configuration we 
mentioned before. The most important property you need to set is python path 
for both driver and executor. If you hit the following error, it means your 
python on driver is mismatched with that of executor. In this case you need to 
check the 2 properties: \u003ccode\u003ePYSPARK_PYTHON\u003c/code\u003e a [...]
+            "data": "\u003cdiv 
class\u003d\"markdown-body\"\u003e\n\u003cp\u003eFor using PySpark, you need to 
do some other pyspark configuration besides the above spark configuration we 
mentioned before. The most important property you need to set is python path 
for both driver and executor. If you hit the following error, it means your 
python on driver is mismatched with that of executor. In this case you need to 
check the 2 properties: \u003ccode\u003ePYSPARK_PYTHON\u003c/code\u003e  [...]
           }
         ]
       },
@@ -392,7 +392,7 @@
     },
     {
       "title": "Use IPython",
-      "text": "%md\n\nStarting from Zeppelin 0.8.0, `ipython` is integrated 
into Zeppelin. And `PySparkInterpreter`(`%spark.pyspark`) would use `ipython` 
if it is avalible. It is recommended to use `ipython` interpreter as it 
provides more powerful feature than the old PythonInterpreter. Spark create a 
new interpreter called `IPySparkInterpreter` (`%spark.ipyspark`) which use 
IPython underneath. You can use all the `ipython` features in this 
IPySparkInterpreter. There\u0027s one ipython  [...]
+      "text": "%md\n\nStarting from Zeppelin 0.8.0, `ipython` is integrated 
into Zeppelin. And `PySparkInterpreter`(`%spark.pyspark`) would use `ipython` 
if it is available. It is recommended to use `ipython` interpreter as it 
provides more powerful feature than the old PythonInterpreter. Spark create a 
new interpreter called `IPySparkInterpreter` (`%spark.ipyspark`) which use 
IPython underneath. You can use all the `ipython` features in this 
IPySparkInterpreter. There\u0027s one ipython [...]
       "user": "anonymous",
       "dateUpdated": "2020-04-30 11:10:07.426",
       "config": {
@@ -420,7 +420,7 @@
         "msg": [
           {
             "type": "HTML",
-            "data": "\u003cdiv 
class\u003d\"markdown-body\"\u003e\n\u003cp\u003eStarting from Zeppelin 0.8.0, 
\u003ccode\u003eipython\u003c/code\u003e is integrated into Zeppelin. And 
\u003ccode\u003ePySparkInterpreter\u003c/code\u003e(\u003ccode\u003e%spark.pyspark\u003c/code\u003e)
 would use \u003ccode\u003eipython\u003c/code\u003e if it is avalible. It is 
recommended to use \u003ccode\u003eipython\u003c/code\u003e interpreter as it 
provides more powerful feature than the old PythonInt [...]
+            "data": "\u003cdiv 
class\u003d\"markdown-body\"\u003e\n\u003cp\u003eStarting from Zeppelin 0.8.0, 
\u003ccode\u003eipython\u003c/code\u003e is integrated into Zeppelin. And 
\u003ccode\u003ePySparkInterpreter\u003c/code\u003e(\u003ccode\u003e%spark.pyspark\u003c/code\u003e)
 would use \u003ccode\u003eipython\u003c/code\u003e if it is available. It is 
recommended to use \u003ccode\u003eipython\u003c/code\u003e interpreter as it 
provides more powerful feature than the old PythonIn [...]
           }
         ]
       },
diff --git a/notebook/Spark Tutorial/3. Spark SQL (PySpark)_2EWM84JXA.zpln 
b/notebook/Spark Tutorial/3. Spark SQL (PySpark)_2EWM84JXA.zpln
index 53c5ca3..7802e98 100644
--- a/notebook/Spark Tutorial/3. Spark SQL (PySpark)_2EWM84JXA.zpln     
+++ b/notebook/Spark Tutorial/3. Spark SQL (PySpark)_2EWM84JXA.zpln     
@@ -2,7 +2,7 @@
   "paragraphs": [
     {
       "title": "Introduction",
-      "text": "%md\n\nThis is a tutorial for Spark SQL in PySpark (based on 
Spark 2.x).  First we need to clarifiy serveral concetps of Spark SQL\n\n* 
**SparkSession**   - This is the entry point of Spark SQL, you need use 
`SparkSession` to create DataFrame/Dataset, register UDF, query table and 
etc.\n* **DataFrame**      - There\u0027s no Dataset in PySpark, but only 
DataFrame. The DataFrame of PySpark is very similar with DataFrame concept of 
Pandas, but is distributed. \n",
+      "text": "%md\n\nThis is a tutorial for Spark SQL in PySpark (based on 
Spark 2.x).  First we need to clarify several concepts of Spark SQL\n\n* 
**SparkSession**   - This is the entry point of Spark SQL, you need use 
`SparkSession` to create DataFrame/Dataset, register UDF, query table and 
etc.\n* **DataFrame**      - There\u0027s no Dataset in PySpark, but only 
DataFrame. The DataFrame of PySpark is very similar with DataFrame concept of 
Pandas, but is distributed. \n",
       "user": "anonymous",
       "dateUpdated": "2020-03-11 11:16:37.393",
       "config": {
@@ -32,7 +32,7 @@
         "msg": [
           {
             "type": "HTML",
-            "data": "\u003cdiv 
class\u003d\"markdown-body\"\u003e\n\u003cp\u003eThis is a tutorial for Spark 
SQL in PySpark (based on Spark 2.x).  First we need to clarifiy serveral 
concetps of Spark 
SQL\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003eSparkSession\u003c/strong\u003e
   - This is the entry point of Spark SQL, you need use 
\u003ccode\u003eSparkSession\u003c/code\u003e to create DataFrame/Dataset, 
register UDF, query table and etc.\u003c/li\u003e\n\u003cli\u00 [...]
+            "data": "\u003cdiv 
class\u003d\"markdown-body\"\u003e\n\u003cp\u003eThis is a tutorial for Spark 
SQL in PySpark (based on Spark 2.x).  First we need to clarify several concepts 
of Spark 
SQL\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003eSparkSession\u003c/strong\u003e
   - This is the entry point of Spark SQL, you need use 
\u003ccode\u003eSparkSession\u003c/code\u003e to create DataFrame/Dataset, 
register UDF, query table and etc.\u003c/li\u003e\n\u003cli\u003e [...]
           }
         ]
       },
@@ -137,7 +137,7 @@
     },
     {
       "title": "Spark Configuration",
-      "text": "%spark.conf\n\n# It is strongly recommended to set SPARK_HOME 
explictly instead of using the embedded spark of Zeppelin. As the function of 
embedded spark of Zeppelin is limited and can only run in local mode.\n# 
SPARK_HOME \u003cyour_spark_dist_path\u003e\n\n# Uncomment the following line 
if you want to use yarn-cluster mode (It is recommended to use yarn-cluster 
mode from Zeppelin 0.8, as the driver will run on the remote host of yarn 
cluster which can mitigate memory pr [...]
+      "text": "%spark.conf\n\n# It is strongly recommended to set SPARK_HOME 
explicitly instead of using the embedded spark of Zeppelin. As the function of 
embedded spark of Zeppelin is limited and can only run in local mode.\n# 
SPARK_HOME \u003cyour_spark_dist_path\u003e\n\n# Uncomment the following line 
if you want to use yarn-cluster mode (It is recommended to use yarn-cluster 
mode from Zeppelin 0.8, as the driver will run on the remote host of yarn 
cluster which can mitigate memory p [...]
       "user": "anonymous",
       "dateUpdated": "2020-03-11 13:26:01.470",
       "config": {
@@ -713,7 +713,7 @@
     },
     {
       "title": "Visualize DataFrame/Dataset",
-      "text": "%md\n\nThere\u0027s 2 approaches to visuliaze DataFrame/Dataset 
in Zeppelin\n\n* Use SparkSQLInterpreter via `%spark.sql`\n* Use 
ZeppelinContext via `z.show`\n\n",
+      "text": "%md\n\nThere\u0027s 2 approaches to visualize DataFrame/Dataset 
in Zeppelin\n\n* Use SparkSQLInterpreter via `%spark.sql`\n* Use 
ZeppelinContext via `z.show`\n\n",
       "user": "anonymous",
       "dateUpdated": "2020-01-21 15:47:18.301",
       "config": {
@@ -743,7 +743,7 @@
         "msg": [
           {
             "type": "HTML",
-            "data": "\u003cdiv 
class\u003d\"markdown-body\"\u003e\n\u003cp\u003eThere\u0026rsquo;s 2 
approaches to visuliaze DataFrame/Dataset in 
Zeppelin\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003eUse SparkSQLInterpreter 
via 
\u003ccode\u003e%spark.sql\u003c/code\u003e\u003c/li\u003e\n\u003cli\u003eUse 
ZeppelinContext via 
\u003ccode\u003ez.show\u003c/code\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003c/div\u003e"
+            "data": "\u003cdiv 
class\u003d\"markdown-body\"\u003e\n\u003cp\u003eThere\u0026rsquo;s 2 
approaches to visualize DataFrame/Dataset in 
Zeppelin\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003eUse SparkSQLInterpreter 
via 
\u003ccode\u003e%spark.sql\u003c/code\u003e\u003c/li\u003e\n\u003cli\u003eUse 
ZeppelinContext via 
\u003ccode\u003ez.show\u003c/code\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003c/div\u003e"
           }
         ]
       },
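
[Sketch, not part of the commit] The two visualization approaches listed above, assuming a
DataFrame `df` created in an earlier paragraph:

    %spark.pyspark

    # Register a temp view so it can be queried from %spark.sql
    df.createOrReplaceTempView("people")

    # Or display it directly via ZeppelinContext
    z.show(df)

    %spark.sql

    select * from people
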
diff --git a/notebook/Spark Tutorial/3. Spark SQL (Scala)_2EYUV26VR.zpln 
b/notebook/Spark Tutorial/3. Spark SQL (Scala)_2EYUV26VR.zpln
index 7ad5809..da4002f 100644
--- a/notebook/Spark Tutorial/3. Spark SQL (Scala)_2EYUV26VR.zpln       
+++ b/notebook/Spark Tutorial/3. Spark SQL (Scala)_2EYUV26VR.zpln       
@@ -2,7 +2,7 @@
   "paragraphs": [
     {
       "title": "Introduction",
-      "text": "%md\n\nThis is a tutorial for Spark SQL in scala (based on 
Spark 2.x).  First we need to clarifiy serveral basic concepts of Spark 
SQL\n\n* **SparkSession**   - This is the entry point of Spark SQL, you need 
use `SparkSession` to create DataFrame/Dataset, register UDF, query table and 
etc.\n* **Dataset**        - Dataset is the core abstraction of Spark SQL. 
Underneath Dataset is RDD, but Dataset know more about your data, specifically 
its structure, so that Dataset could  [...]
+      "text": "%md\n\nThis is a tutorial for Spark SQL in scala (based on 
Spark 2.x).  First we need to clarify several basic concepts of Spark SQL\n\n* 
**SparkSession**   - This is the entry point of Spark SQL, you need use 
`SparkSession` to create DataFrame/Dataset, register UDF, query table and 
etc.\n* **Dataset**        - Dataset is the core abstraction of Spark SQL. 
Underneath Dataset is RDD, but Dataset know more about your data, specifically 
its structure, so that Dataset could do [...]
       "user": "anonymous",
       "dateUpdated": "2020-03-11 13:26:59.236",
       "config": {
@@ -32,7 +32,7 @@
         "msg": [
           {
             "type": "HTML",
-            "data": "\u003cdiv 
class\u003d\"markdown-body\"\u003e\n\u003cp\u003eThis is a tutorial for Spark 
SQL in scala (based on Spark 2.x).  First we need to clarifiy serveral basic 
concepts of Spark 
SQL\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003eSparkSession\u003c/strong\u003e
   - This is the entry point of Spark SQL, you need use 
\u003ccode\u003eSparkSession\u003c/code\u003e to create DataFrame/Dataset, 
register UDF, query table and etc.\u003c/li\u003e\n\u003cli [...]
+            "data": "\u003cdiv 
class\u003d\"markdown-body\"\u003e\n\u003cp\u003eThis is a tutorial for Spark 
SQL in scala (based on Spark 2.x).  First we need to clarify several basic 
concepts of Spark 
SQL\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003eSparkSession\u003c/strong\u003e
   - This is the entry point of Spark SQL, you need use 
\u003ccode\u003eSparkSession\u003c/code\u003e to create DataFrame/Dataset, 
register UDF, query table and etc.\u003c/li\u003e\n\u003cli\u [...]
           }
         ]
       },
@@ -140,7 +140,7 @@
     },
     {
       "title": "",
-      "text": "%spark.conf\n\n# It is strongly recommended to set SPARK_HOME 
explictly instead of using the embedded spark of Zeppelin. As the function of 
embedded spark of Zeppelin is limited and can only run in local mode.\n# 
SPARK_HOME \u003cyour_spark_dist_path\u003e\n\n# Uncomment the following line 
if you want to use yarn-cluster mode (It is recommended to use yarn-cluster 
mode after Zeppelin 0.8, as the driver will run on the remote host of yarn 
cluster which can mitigate memory p [...]
+      "text": "%spark.conf\n\n# It is strongly recommended to set SPARK_HOME 
explicitly instead of using the embedded spark of Zeppelin. As the function of 
embedded spark of Zeppelin is limited and can only run in local mode.\n# 
SPARK_HOME \u003cyour_spark_dist_path\u003e\n\n# Uncomment the following line 
if you want to use yarn-cluster mode (It is recommended to use yarn-cluster 
mode after Zeppelin 0.8, as the driver will run on the remote host of yarn 
cluster which can mitigate memory  [...]
       "user": "anonymous",
       "dateUpdated": "2020-03-11 13:28:13.784",
       "config": {
@@ -556,7 +556,7 @@
     },
     {
       "title": "Join on Single Field",
-      "text": "%spark\n\nval df1 \u003d spark.createDataFrame(Seq((1, 
\"andy\", 20, 1), (2, \"jeff\", 23, 2), (3, \"james\", 18, 3))).toDF(\"id\", 
\"name\", \"age\", \"c_id\")\ndf1.show()\n\nval df2 \u003d 
spark.createDataFrame(Seq((1, \"USA\"), (2, \"China\"))).toDF(\"c_id\", 
\"c_name\")\ndf2.show()\n\n// You can just specify the key name if join on the 
same key\nval df3 \u003d df1.join(df2, \"c_id\")\ndf3.show()\n\n// Or you can 
specify the join condition expclitly in case the key is d [...]
+      "text": "%spark\n\nval df1 \u003d spark.createDataFrame(Seq((1, 
\"andy\", 20, 1), (2, \"jeff\", 23, 2), (3, \"james\", 18, 3))).toDF(\"id\", 
\"name\", \"age\", \"c_id\")\ndf1.show()\n\nval df2 \u003d 
spark.createDataFrame(Seq((1, \"USA\"), (2, \"China\"))).toDF(\"c_id\", 
\"c_name\")\ndf2.show()\n\n// You can just specify the key name if join on the 
same key\nval df3 \u003d df1.join(df2, \"c_id\")\ndf3.show()\n\n// Or you can 
specify the join condition explicitly in case the key is  [...]
       "user": "anonymous",
       "dateUpdated": "2020-03-11 13:34:11.058",
       "config": {
@@ -600,7 +600,7 @@
     },
     {
       "title": "Join on Multiple Fields",
-      "text": "%spark\n\nval df1 \u003d spark.createDataFrame(Seq((\"andy\", 
20, 1, 1), (\"jeff\", 23, 1, 2), (\"james\", 12, 2, 2))).toDF(\"name\", 
\"age\", \"key_1\", \"key_2\")\ndf1.show()\n\nval df2 \u003d 
spark.createDataFrame(Seq((1, 1, \"USA\"), (2, 2, \"China\"))).toDF(\"key_1\", 
\"key_2\", \"country\")\ndf2.show()\n\n// Join on 2 fields: key_1, key_2\n\n// 
You can pass a list of field name if the join field names are the same in both 
tables\nval df3 \u003d df1.join(df2, Seq(\"ke [...]
+      "text": "%spark\n\nval df1 \u003d spark.createDataFrame(Seq((\"andy\", 
20, 1, 1), (\"jeff\", 23, 1, 2), (\"james\", 12, 2, 2))).toDF(\"name\", 
\"age\", \"key_1\", \"key_2\")\ndf1.show()\n\nval df2 \u003d 
spark.createDataFrame(Seq((1, 1, \"USA\"), (2, 2, \"China\"))).toDF(\"key_1\", 
\"key_2\", \"country\")\ndf2.show()\n\n// Join on 2 fields: key_1, key_2\n\n// 
You can pass a list of field name if the join field names are the same in both 
tables\nval df3 \u003d df1.join(df2, Seq(\"ke [...]
       "user": "anonymous",
       "dateUpdated": "2020-03-11 13:34:12.577",
       "config": {
@@ -688,7 +688,7 @@
     },
     {
       "title": "Visualize DataFrame/Dataset",
-      "text": "%md\n\nThere\u0027s 2 approaches to visuliaze DataFrame/Dataset 
in Zeppelin\n\n* Use SparkSQLInterpreter via `%spark.sql`\n* Use 
ZeppelinContext via `z.show`\n\n",
+      "text": "%md\n\nThere\u0027s 2 approaches to visualize DataFrame/Dataset 
in Zeppelin\n\n* Use SparkSQLInterpreter via `%spark.sql`\n* Use 
ZeppelinContext via `z.show`\n\n",
       "user": "anonymous",
       "dateUpdated": "2020-01-21 15:55:08.071",
       "config": {
@@ -716,7 +716,7 @@
         "msg": [
           {
             "type": "HTML",
-            "data": "\u003cdiv 
class\u003d\"markdown-body\"\u003e\n\u003cp\u003eThere\u0026rsquo;s 2 
approaches to visuliaze DataFrame/Dataset in 
Zeppelin\u003c/p\u003e\n\u003cul\u003e\n  \u003cli\u003eUse SparkSQLInterpreter 
via \u003ccode\u003e%spark.sql\u003c/code\u003e\u003c/li\u003e\n  
\u003cli\u003eUse ZeppelinContext via 
\u003ccode\u003ez.show\u003c/code\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e"
+            "data": "\u003cdiv 
class\u003d\"markdown-body\"\u003e\n\u003cp\u003eThere\u0026rsquo;s 2 
approaches to visualize DataFrame/Dataset in 
Zeppelin\u003c/p\u003e\n\u003cul\u003e\n  \u003cli\u003eUse SparkSQLInterpreter 
via \u003ccode\u003e%spark.sql\u003c/code\u003e\u003c/li\u003e\n  
\u003cli\u003eUse ZeppelinContext via 
\u003ccode\u003ez.show\u003c/code\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e"
           }
         ]
       },
diff --git a/notebook/Spark Tutorial/4. Spark MlLib_2EZFM3GJA.zpln 
b/notebook/Spark Tutorial/4. Spark MlLib_2EZFM3GJA.zpln
index fa5998b..53540ed 100644
--- a/notebook/Spark Tutorial/4. Spark MlLib_2EZFM3GJA.zpln     
+++ b/notebook/Spark Tutorial/4. Spark MlLib_2EZFM3GJA.zpln     
@@ -2,7 +2,7 @@
   "paragraphs": [
     {
       "title": "Introduction",
-      "text": "%md\n\nThis is a tutorial of how to use Spark MLlib in 
Zeppelin, we have 2 examples in this note:\n\n* Linear regression, we generate 
some random data and use a linear regression to fit this data. We use bokeh 
here to visualize the data and the fitted model.  Besides training, we also 
visualize the loss value over iteration.\n* Logstic regression, we use the 
offical `sample_binary_classification_data` of spark as the training data. 
Besides training, we also visualize the l [...]
+      "text": "%md\n\nThis is a tutorial of how to use Spark MLlib in 
Zeppelin, we have 2 examples in this note:\n\n* Linear regression, we generate 
some random data and use a linear regression to fit this data. We use bokeh 
here to visualize the data and the fitted model.  Besides training, we also 
visualize the loss value over iteration.\n* Logstic regression, we use the 
official `sample_binary_classification_data` of spark as the training data. 
Besides training, we also visualize the  [...]
       "user": "anonymous",
       "dateUpdated": "2020-03-11 14:08:34.165",
       "config": {
