This is an automated email from the ASF dual-hosted git repository.
zjffdu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/zeppelin.git
The following commit(s) were added to refs/heads/master by this push:
new f3bdd4a [ZEPPELIN-5241] Typos in spark tutorial
f3bdd4a is described below
commit f3bdd4a1fa0cf19bc1015955d8ade4bc79a8e16f
Author: OmriK <[email protected]>
AuthorDate: Sun Feb 7 18:12:06 2021 +0200
[ZEPPELIN-5241] Typos in spark tutorial
### What is this PR for?
Fixes some typos in the tutorial notebooks
### What type of PR is it?
Documentation
### Todos
* [x] - Task
### What is the Jira issue?
[ZEPPELIN-5241](https://issues.apache.org/jira/browse/ZEPPELIN-5241)
### How should this be tested?
* Standard CI tests
### Screenshots (if appropriate)
### Questions:
* Do the license files need updating? - no
* Are there breaking changes for older versions? - no
* Does this need documentation? - no
Author: OmriK <[email protected]>
Closes #4048 from omrisk/typos_in_spark_tutorial and squashes the following commits:
d85861463 [OmriK] Checked part 1
---
.... Spark Interpreter Introduction_2F8KN6TKK.zpln | 26 +++++++++++-----------
.../3. Spark SQL (PySpark)_2EWM84JXA.zpln | 10 ++++-----
.../3. Spark SQL (Scala)_2EYUV26VR.zpln | 14 ++++++------
.../Spark Tutorial/4. Spark MlLib_2EZFM3GJA.zpln | 2 +-
4 files changed, 26 insertions(+), 26 deletions(-)
diff --git a/notebook/Spark Tutorial/1. Spark Interpreter
Introduction_2F8KN6TKK.zpln b/notebook/Spark Tutorial/1. Spark Interpreter
Introduction_2F8KN6TKK.zpln
index d085d9f..0f3cb7a 100644
--- a/notebook/Spark Tutorial/1. Spark Interpreter Introduction_2F8KN6TKK.zpln
+++ b/notebook/Spark Tutorial/1. Spark Interpreter Introduction_2F8KN6TKK.zpln
@@ -2,7 +2,7 @@
"paragraphs": [
{
"title": "",
- "text": "%md\n\n# Introduction\n\nThis tutorial is for how to use Spark
Interpreter in Zeppelin.\n\n1. Specify `SPARK_HOME` in interpreter setting. If
you don\u0027t specify `SPARK_HOME`, Zeppelin will use the embedded spark which
can only run in local mode. And some advanced features may not work in this
embedded spark.\n2. Specify `spark.master` for spark execution mode.\n *
`local[*]` - Driver and Executor would both run in the same host of zeppelin
server. It is only for te [...]
+ "text": "%md\n\n# Introduction\n\nThis tutorial is for how to use Spark
Interpreter in Zeppelin.\n\n1. Specify `SPARK_HOME` in interpreter setting. If
you don\u0027t specify `SPARK_HOME`, Zeppelin will use the embedded spark which
can only run in local mode. And some advanced features may not work in this
embedded spark.\n2. Specify `spark.master` for spark execution mode.\n *
`local[*]` - Driver and Executor would both run in the same host of zeppelin
server. It is only for te [...]
"user": "anonymous",
"dateUpdated": "2020-05-04 13:44:39.482",
"config": {
@@ -29,7 +29,7 @@
"msg": [
{
"type": "HTML",
- "data": "\u003cdiv
class\u003d\"markdown-body\"\u003e\n\u003ch1\u003eIntroduction\u003c/h1\u003e\n\u003cp\u003eThis
tutorial is for how to use Spark Interpreter in
Zeppelin.\u003c/p\u003e\n\u003col\u003e\n\u003cli\u003eSpecify
\u003ccode\u003eSPARK_HOME\u003c/code\u003e in interpreter setting. If you
don\u0026rsquo;t specify \u003ccode\u003eSPARK_HOME\u003c/code\u003e, Zeppelin
will use the embedded spark which can only run in local mode. And some advanced
features may not wo [...]
+ "data": "\u003cdiv
class\u003d\"markdown-body\"\u003e\n\u003ch1\u003eIntroduction\u003c/h1\u003e\n\u003cp\u003eThis
tutorial is for how to use Spark Interpreter in
Zeppelin.\u003c/p\u003e\n\u003col\u003e\n\u003cli\u003eSpecify
\u003ccode\u003eSPARK_HOME\u003c/code\u003e in interpreter setting. If you
don\u0026rsquo;t specify \u003ccode\u003eSPARK_HOME\u003c/code\u003e, Zeppelin
will use the embedded spark which can only run in local mode. And some advanced
features may not wo [...]
}
]
},
@@ -44,7 +44,7 @@
},
{
"title": "Use Generic Inline Configuration instead of Interpreter
Setting",
- "text": "%md\n\nCustomize your spark interpreter is indispensible for
Zeppelin Notebook. E.g. You want to add third party jars, change the execution
mode, change the number of exceutor or its memory and etc. You can check this
link for all the available [spark
configuration](http://spark.apache.org/docs/latest/configuration.html)\nAlthough
you can customize these in interpreter setting, it is recommended to do via
the generic inline configuration. Because interpreter setting is sha [...]
+ "text": "%md\n\nCustomize your spark interpreter is indispensable for
Zeppelin Notebook. E.g. You want to add third party jars, change the execution
mode, change the number of executor or its memory and etc. You can check this
link for all the available [spark
configuration](http://spark.apache.org/docs/latest/configuration.html)\nAlthough
you can customize these in interpreter setting, it is recommended to do via
the generic inline configuration. Because interpreter setting is sha [...]
"user": "anonymous",
"dateUpdated": "2020-05-04 13:45:44.204",
"config": {
@@ -72,7 +72,7 @@
"msg": [
{
"type": "HTML",
- "data": "\u003cdiv
class\u003d\"markdown-body\"\u003e\n\u003cp\u003eCustomize your spark
interpreter is indispensible for Zeppelin Notebook. E.g. You want to add third
party jars, change the execution mode, change the number of exceutor or its
memory and etc. You can check this link for all the available \u003ca
href\u003d\"http://spark.apache.org/docs/latest/configuration.html\"\u003espark
configuration\u003c/a\u003e\u003cbr /\u003e\nAlthough you can customize these
in inter [...]
+ "data": "\u003cdiv
class\u003d\"markdown-body\"\u003e\n\u003cp\u003eCustomize your spark
interpreter is indispensable for Zeppelin Notebook. E.g. You want to add third
party jars, change the execution mode, change the number of executor or its
memory and etc. You can check this link for all the available \u003ca
href\u003d\"http://spark.apache.org/docs/latest/configuration.html\"\u003espark
configuration\u003c/a\u003e\u003cbr /\u003e\nAlthough you can customize these
in inter [...]
}
]
},
@@ -87,7 +87,7 @@
},
{
"title": "Generic Inline Configuration",
- "text": "%spark.conf\n\nSPARK_HOME \u003cPATH_TO_SPAKR_HOME\u003e\n\n#
set driver memrory to 8g\nspark.driver.memory 8g\n\n# set executor number to be
6\nspark.executor.instances 6\n\n# set executor memrory
4g\nspark.executor.memory 4g\n\n# Any other spark properties can be set here.
Here\u0027s avaliable spark configruation you can set.
(http://spark.apache.org/docs/latest/configuration.html)\n",
+ "text": "%spark.conf\n\nSPARK_HOME \u003cPATH_TO_SPARK_HOME\u003e\n\n#
set driver memory to 8g\nspark.driver.memory 8g\n\n# set executor number to be
6\nspark.executor.instances 6\n\n# set executor memory
4g\nspark.executor.memory 4g\n\n# Any other spark properties can be set here.
Here\u0027s avaliable spark configruation you can set.
(http://spark.apache.org/docs/latest/configuration.html)\n",
"user": "anonymous",
"dateUpdated": "2020-04-30 10:56:30.840",
"config": {
@@ -145,7 +145,7 @@
"msg": [
{
"type": "HTML",
- "data": "\u003cdiv
class\u003d\"markdown-body\"\u003e\n\u003cp\u003eThere\u0026rsquo;re 2 ways to
add third party
libraries.\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\u003ccode\u003eGeneric
Inline Configuration\u003c/code\u003e It is the recommended way to add third
party jars/packages. Use \u003ccode\u003espark.jars\u003c/code\u003e for adding
local jar file and \u003ccode\u003espark.jars.packages\u003c/code\u003e for
adding packages\u003c/li\u003e\n\u003cli\u003e\u003 [...]
+ "data": "\u003cdiv
class\u003d\"markdown-body\"\u003e\n\u003cp\u003eThere\u0026rsquo;re 2 ways to
add third party
libraries.\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\u003ccode\u003eGeneric
Inline Configuration\u003c/code\u003e It is the recommended way to add third
party jars/packages. Use \u003ccode\u003espark.jars\u003c/code\u003e for adding
local jar file and \u003ccode\u003espark.jars.packages\u003c/code\u003e for
adding packages\u003c/li\u003e\n\u003cli\u003e\u003 [...]
}
]
},
@@ -160,7 +160,7 @@
},
{
"title": "",
- "text": "%spark.conf\n\n# Must set SPARK_HOME for this example, because
it won\u0027t work for Zeppelin\u0027s embedded spark mode. The embedded spark
mode doesn\u0027t \n# use spark-submit to launch spark interpreter, so
spark.jars and spark.jars.packages won\u0027t take affect. \nSPARK_HOME
\u003cPATH_TO_SPAKR_HOME\u003e\n\n# set execution mode\nmaster yarn-client\n\n#
spark.jars can be used for adding any local jar files into spark interpreter\n#
spark.jars \u003cpath_to_local_ [...]
+ "text": "%spark.conf\n\n# Must set SPARK_HOME for this example, because
it won\u0027t work for Zeppelin\u0027s embedded spark mode. The embedded spark
mode doesn\u0027t \n# use spark-submit to launch spark interpreter, so
spark.jars and spark.jars.packages won\u0027t take affect. \nSPARK_HOME
\u003cPATH_TO_SPARK_HOME\u003e\n\n# set execution mode\nmaster yarn-client\n\n#
spark.jars can be used for adding any local jar files into spark interpreter\n#
spark.jars \u003cpath_to_local_ [...]
"user": "anonymous",
"dateUpdated": "2020-04-30 11:01:36.681",
"config": {
@@ -272,7 +272,7 @@
},
{
"title": "Code Completion in Scala",
- "text": "%md\n\nSpark interpreter provide code completion feature. As
long as you type `tab`, code completion will start to work and provide you with
a list of candiates. Here\u0027s one screenshot of how it works. \n\n**To be
noticed**, code completion only works after spark interpreter is launched. So
it will not work when you type code in the first paragraph as the spark
interpreter is not launched yet. For me, usually I will run one simple code
such as `sc.version` to launch sp [...]
+ "text": "%md\n\nSpark interpreter provide code completion feature. As
long as you type `tab`, code completion will start to work and provide you with
a list of candidates. Here\u0027s one screenshot of how it works. \n\n**To be
noticed**, code completion only works after spark interpreter is launched. So
it will not work when you type code in the first paragraph as the spark
interpreter is not launched yet. For me, usually I will run one simple code
such as `sc.version` to launch s [...]
"user": "anonymous",
"dateUpdated": "2020-04-30 11:03:03.127",
"config": {
@@ -300,7 +300,7 @@
"msg": [
{
"type": "HTML",
- "data": "\u003cdiv
class\u003d\"markdown-body\"\u003e\n\u003cp\u003eSpark interpreter provide code
completion feature. As long as you type \u003ccode\u003etab\u003c/code\u003e,
code completion will start to work and provide you with a list of candiates.
Here\u0026rsquo;s one screenshot of how it
works.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTo be
noticed\u003c/strong\u003e, code completion only works after spark interpreter
is launched. So it will not work when you typ [...]
+ "data": "\u003cdiv
class\u003d\"markdown-body\"\u003e\n\u003cp\u003eSpark interpreter provide code
completion feature. As long as you type \u003ccode\u003etab\u003c/code\u003e,
code completion will start to work and provide you with a list of candidates.
Here\u0026rsquo;s one screenshot of how it
works.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTo be
noticed\u003c/strong\u003e, code completion only works after spark interpreter
is launched. So it will not work when you ty [...]
}
]
},
@@ -315,7 +315,7 @@
},
{
"title": "PySpark",
- "text": "%md\n\nFor using PySpark, you need to do some other pyspark
configration besides the above spark configuration we mentioned before. The
most important property you need to set is python path for both driver and
executor. If you hit the following error, it means your python on driver is
mismatched with that of executor. In this case you need to check the 2
properties: `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON`. (You can use
`spark.pyspark.python` and `spark.pyspark.driver [...]
+ "text": "%md\n\nFor using PySpark, you need to do some other pyspark
configuration besides the above spark configuration we mentioned before. The
most important property you need to set is python path for both driver and
executor. If you hit the following error, it means your python on driver is
mismatched with that of executor. In this case you need to check the 2
properties: `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON`. (You can use
`spark.pyspark.python` and `spark.pyspark.drive [...]
"user": "anonymous",
"dateUpdated": "2020-04-30 11:04:18.086",
"config": {
@@ -343,7 +343,7 @@
"msg": [
{
"type": "HTML",
- "data": "\u003cdiv
class\u003d\"markdown-body\"\u003e\n\u003cp\u003eFor using PySpark, you need to
do some other pyspark configration besides the above spark configuration we
mentioned before. The most important property you need to set is python path
for both driver and executor. If you hit the following error, it means your
python on driver is mismatched with that of executor. In this case you need to
check the 2 properties: \u003ccode\u003ePYSPARK_PYTHON\u003c/code\u003e a [...]
+ "data": "\u003cdiv
class\u003d\"markdown-body\"\u003e\n\u003cp\u003eFor using PySpark, you need to
do some other pyspark configuration besides the above spark configuration we
mentioned before. The most important property you need to set is python path
for both driver and executor. If you hit the following error, it means your
python on driver is mismatched with that of executor. In this case you need to
check the 2 properties: \u003ccode\u003ePYSPARK_PYTHON\u003c/code\u003e [...]
}
]
},
@@ -392,7 +392,7 @@
},
{
"title": "Use IPython",
- "text": "%md\n\nStarting from Zeppelin 0.8.0, `ipython` is integrated
into Zeppelin. And `PySparkInterpreter`(`%spark.pyspark`) would use `ipython`
if it is avalible. It is recommended to use `ipython` interpreter as it
provides more powerful feature than the old PythonInterpreter. Spark create a
new interpreter called `IPySparkInterpreter` (`%spark.ipyspark`) which use
IPython underneath. You can use all the `ipython` features in this
IPySparkInterpreter. There\u0027s one ipython [...]
+ "text": "%md\n\nStarting from Zeppelin 0.8.0, `ipython` is integrated
into Zeppelin. And `PySparkInterpreter`(`%spark.pyspark`) would use `ipython`
if it is available. It is recommended to use `ipython` interpreter as it
provides more powerful feature than the old PythonInterpreter. Spark create a
new interpreter called `IPySparkInterpreter` (`%spark.ipyspark`) which use
IPython underneath. You can use all the `ipython` features in this
IPySparkInterpreter. There\u0027s one ipython [...]
"user": "anonymous",
"dateUpdated": "2020-04-30 11:10:07.426",
"config": {
@@ -420,7 +420,7 @@
"msg": [
{
"type": "HTML",
- "data": "\u003cdiv
class\u003d\"markdown-body\"\u003e\n\u003cp\u003eStarting from Zeppelin 0.8.0,
\u003ccode\u003eipython\u003c/code\u003e is integrated into Zeppelin. And
\u003ccode\u003ePySparkInterpreter\u003c/code\u003e(\u003ccode\u003e%spark.pyspark\u003c/code\u003e)
would use \u003ccode\u003eipython\u003c/code\u003e if it is avalible. It is
recommended to use \u003ccode\u003eipython\u003c/code\u003e interpreter as it
provides more powerful feature than the old PythonInt [...]
+ "data": "\u003cdiv
class\u003d\"markdown-body\"\u003e\n\u003cp\u003eStarting from Zeppelin 0.8.0,
\u003ccode\u003eipython\u003c/code\u003e is integrated into Zeppelin. And
\u003ccode\u003ePySparkInterpreter\u003c/code\u003e(\u003ccode\u003e%spark.pyspark\u003c/code\u003e)
would use \u003ccode\u003eipython\u003c/code\u003e if it is available. It is
recommended to use \u003ccode\u003eipython\u003c/code\u003e interpreter as it
provides more powerful feature than the old PythonIn [...]
}
]
},
diff --git a/notebook/Spark Tutorial/3. Spark SQL (PySpark)_2EWM84JXA.zpln
b/notebook/Spark Tutorial/3. Spark SQL (PySpark)_2EWM84JXA.zpln
index 53c5ca3..7802e98 100644
--- a/notebook/Spark Tutorial/3. Spark SQL (PySpark)_2EWM84JXA.zpln
+++ b/notebook/Spark Tutorial/3. Spark SQL (PySpark)_2EWM84JXA.zpln
@@ -2,7 +2,7 @@
"paragraphs": [
{
"title": "Introduction",
- "text": "%md\n\nThis is a tutorial for Spark SQL in PySpark (based on
Spark 2.x). First we need to clarifiy serveral concetps of Spark SQL\n\n*
**SparkSession** - This is the entry point of Spark SQL, you need use
`SparkSession` to create DataFrame/Dataset, register UDF, query table and
etc.\n* **DataFrame** - There\u0027s no Dataset in PySpark, but only
DataFrame. The DataFrame of PySpark is very similar with DataFrame concept of
Pandas, but is distributed. \n",
+ "text": "%md\n\nThis is a tutorial for Spark SQL in PySpark (based on
Spark 2.x). First we need to clarify several concepts of Spark SQL\n\n*
**SparkSession** - This is the entry point of Spark SQL, you need use
`SparkSession` to create DataFrame/Dataset, register UDF, query table and
etc.\n* **DataFrame** - There\u0027s no Dataset in PySpark, but only
DataFrame. The DataFrame of PySpark is very similar with DataFrame concept of
Pandas, but is distributed. \n",
"user": "anonymous",
"dateUpdated": "2020-03-11 11:16:37.393",
"config": {
@@ -32,7 +32,7 @@
"msg": [
{
"type": "HTML",
- "data": "\u003cdiv
class\u003d\"markdown-body\"\u003e\n\u003cp\u003eThis is a tutorial for Spark
SQL in PySpark (based on Spark 2.x). First we need to clarifiy serveral
concetps of Spark
SQL\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003eSparkSession\u003c/strong\u003e
- This is the entry point of Spark SQL, you need use
\u003ccode\u003eSparkSession\u003c/code\u003e to create DataFrame/Dataset,
register UDF, query table and etc.\u003c/li\u003e\n\u003cli\u00 [...]
+ "data": "\u003cdiv
class\u003d\"markdown-body\"\u003e\n\u003cp\u003eThis is a tutorial for Spark
SQL in PySpark (based on Spark 2.x). First we need to clarify several concepts
of Spark
SQL\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003eSparkSession\u003c/strong\u003e
- This is the entry point of Spark SQL, you need use
\u003ccode\u003eSparkSession\u003c/code\u003e to create DataFrame/Dataset,
register UDF, query table and etc.\u003c/li\u003e\n\u003cli\u003e [...]
}
]
},
@@ -137,7 +137,7 @@
},
{
"title": "Spark Configuration",
- "text": "%spark.conf\n\n# It is strongly recommended to set SPARK_HOME
explictly instead of using the embedded spark of Zeppelin. As the function of
embedded spark of Zeppelin is limited and can only run in local mode.\n#
SPARK_HOME \u003cyour_spark_dist_path\u003e\n\n# Uncomment the following line
if you want to use yarn-cluster mode (It is recommended to use yarn-cluster
mode from Zeppelin 0.8, as the driver will run on the remote host of yarn
cluster which can mitigate memory pr [...]
+ "text": "%spark.conf\n\n# It is strongly recommended to set SPARK_HOME
explicitly instead of using the embedded spark of Zeppelin. As the function of
embedded spark of Zeppelin is limited and can only run in local mode.\n#
SPARK_HOME \u003cyour_spark_dist_path\u003e\n\n# Uncomment the following line
if you want to use yarn-cluster mode (It is recommended to use yarn-cluster
mode from Zeppelin 0.8, as the driver will run on the remote host of yarn
cluster which can mitigate memory p [...]
"user": "anonymous",
"dateUpdated": "2020-03-11 13:26:01.470",
"config": {
@@ -713,7 +713,7 @@
},
{
"title": "Visualize DataFrame/Dataset",
- "text": "%md\n\nThere\u0027s 2 approaches to visuliaze DataFrame/Dataset
in Zeppelin\n\n* Use SparkSQLInterpreter via `%spark.sql`\n* Use
ZeppelinContext via `z.show`\n\n",
+ "text": "%md\n\nThere\u0027s 2 approaches to visualize DataFrame/Dataset
in Zeppelin\n\n* Use SparkSQLInterpreter via `%spark.sql`\n* Use
ZeppelinContext via `z.show`\n\n",
"user": "anonymous",
"dateUpdated": "2020-01-21 15:47:18.301",
"config": {
@@ -743,7 +743,7 @@
"msg": [
{
"type": "HTML",
- "data": "\u003cdiv
class\u003d\"markdown-body\"\u003e\n\u003cp\u003eThere\u0026rsquo;s 2
approaches to visuliaze DataFrame/Dataset in
Zeppelin\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003eUse SparkSQLInterpreter
via
\u003ccode\u003e%spark.sql\u003c/code\u003e\u003c/li\u003e\n\u003cli\u003eUse
ZeppelinContext via
\u003ccode\u003ez.show\u003c/code\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003c/div\u003e"
+ "data": "\u003cdiv
class\u003d\"markdown-body\"\u003e\n\u003cp\u003eThere\u0026rsquo;s 2
approaches to visualize DataFrame/Dataset in
Zeppelin\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003eUse SparkSQLInterpreter
via
\u003ccode\u003e%spark.sql\u003c/code\u003e\u003c/li\u003e\n\u003cli\u003eUse
ZeppelinContext via
\u003ccode\u003ez.show\u003c/code\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003c/div\u003e"
}
]
},
diff --git a/notebook/Spark Tutorial/3. Spark SQL (Scala)_2EYUV26VR.zpln
b/notebook/Spark Tutorial/3. Spark SQL (Scala)_2EYUV26VR.zpln
index 7ad5809..da4002f 100644
--- a/notebook/Spark Tutorial/3. Spark SQL (Scala)_2EYUV26VR.zpln
+++ b/notebook/Spark Tutorial/3. Spark SQL (Scala)_2EYUV26VR.zpln
@@ -2,7 +2,7 @@
"paragraphs": [
{
"title": "Introduction",
- "text": "%md\n\nThis is a tutorial for Spark SQL in scala (based on
Spark 2.x). First we need to clarifiy serveral basic concepts of Spark
SQL\n\n* **SparkSession** - This is the entry point of Spark SQL, you need
use `SparkSession` to create DataFrame/Dataset, register UDF, query table and
etc.\n* **Dataset** - Dataset is the core abstraction of Spark SQL.
Underneath Dataset is RDD, but Dataset know more about your data, specifically
its structure, so that Dataset could [...]
+ "text": "%md\n\nThis is a tutorial for Spark SQL in scala (based on
Spark 2.x). First we need to clarify several basic concepts of Spark SQL\n\n*
**SparkSession** - This is the entry point of Spark SQL, you need use
`SparkSession` to create DataFrame/Dataset, register UDF, query table and
etc.\n* **Dataset** - Dataset is the core abstraction of Spark SQL.
Underneath Dataset is RDD, but Dataset know more about your data, specifically
its structure, so that Dataset could do [...]
"user": "anonymous",
"dateUpdated": "2020-03-11 13:26:59.236",
"config": {
@@ -32,7 +32,7 @@
"msg": [
{
"type": "HTML",
- "data": "\u003cdiv
class\u003d\"markdown-body\"\u003e\n\u003cp\u003eThis is a tutorial for Spark
SQL in scala (based on Spark 2.x). First we need to clarifiy serveral basic
concepts of Spark
SQL\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003eSparkSession\u003c/strong\u003e
- This is the entry point of Spark SQL, you need use
\u003ccode\u003eSparkSession\u003c/code\u003e to create DataFrame/Dataset,
register UDF, query table and etc.\u003c/li\u003e\n\u003cli [...]
+ "data": "\u003cdiv
class\u003d\"markdown-body\"\u003e\n\u003cp\u003eThis is a tutorial for Spark
SQL in scala (based on Spark 2.x). First we need to clarify several basic
concepts of Spark
SQL\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003eSparkSession\u003c/strong\u003e
- This is the entry point of Spark SQL, you need use
\u003ccode\u003eSparkSession\u003c/code\u003e to create DataFrame/Dataset,
register UDF, query table and etc.\u003c/li\u003e\n\u003cli\u [...]
}
]
},
@@ -140,7 +140,7 @@
},
{
"title": "",
- "text": "%spark.conf\n\n# It is strongly recommended to set SPARK_HOME
explictly instead of using the embedded spark of Zeppelin. As the function of
embedded spark of Zeppelin is limited and can only run in local mode.\n#
SPARK_HOME \u003cyour_spark_dist_path\u003e\n\n# Uncomment the following line
if you want to use yarn-cluster mode (It is recommended to use yarn-cluster
mode after Zeppelin 0.8, as the driver will run on the remote host of yarn
cluster which can mitigate memory p [...]
+ "text": "%spark.conf\n\n# It is strongly recommended to set SPARK_HOME
explicitly instead of using the embedded spark of Zeppelin. As the function of
embedded spark of Zeppelin is limited and can only run in local mode.\n#
SPARK_HOME \u003cyour_spark_dist_path\u003e\n\n# Uncomment the following line
if you want to use yarn-cluster mode (It is recommended to use yarn-cluster
mode after Zeppelin 0.8, as the driver will run on the remote host of yarn
cluster which can mitigate memory [...]
"user": "anonymous",
"dateUpdated": "2020-03-11 13:28:13.784",
"config": {
@@ -556,7 +556,7 @@
},
{
"title": "Join on Single Field",
- "text": "%spark\n\nval df1 \u003d spark.createDataFrame(Seq((1,
\"andy\", 20, 1), (2, \"jeff\", 23, 2), (3, \"james\", 18, 3))).toDF(\"id\",
\"name\", \"age\", \"c_id\")\ndf1.show()\n\nval df2 \u003d
spark.createDataFrame(Seq((1, \"USA\"), (2, \"China\"))).toDF(\"c_id\",
\"c_name\")\ndf2.show()\n\n// You can just specify the key name if join on the
same key\nval df3 \u003d df1.join(df2, \"c_id\")\ndf3.show()\n\n// Or you can
specify the join condition expclitly in case the key is d [...]
+ "text": "%spark\n\nval df1 \u003d spark.createDataFrame(Seq((1,
\"andy\", 20, 1), (2, \"jeff\", 23, 2), (3, \"james\", 18, 3))).toDF(\"id\",
\"name\", \"age\", \"c_id\")\ndf1.show()\n\nval df2 \u003d
spark.createDataFrame(Seq((1, \"USA\"), (2, \"China\"))).toDF(\"c_id\",
\"c_name\")\ndf2.show()\n\n// You can just specify the key name if join on the
same key\nval df3 \u003d df1.join(df2, \"c_id\")\ndf3.show()\n\n// Or you can
specify the join condition explicitly in case the key is [...]
"user": "anonymous",
"dateUpdated": "2020-03-11 13:34:11.058",
"config": {
@@ -600,7 +600,7 @@
},
{
"title": "Join on Multiple Fields",
- "text": "%spark\n\nval df1 \u003d spark.createDataFrame(Seq((\"andy\",
20, 1, 1), (\"jeff\", 23, 1, 2), (\"james\", 12, 2, 2))).toDF(\"name\",
\"age\", \"key_1\", \"key_2\")\ndf1.show()\n\nval df2 \u003d
spark.createDataFrame(Seq((1, 1, \"USA\"), (2, 2, \"China\"))).toDF(\"key_1\",
\"key_2\", \"country\")\ndf2.show()\n\n// Join on 2 fields: key_1, key_2\n\n//
You can pass a list of field name if the join field names are the same in both
tables\nval df3 \u003d df1.join(df2, Seq(\"ke [...]
+ "text": "%spark\n\nval df1 \u003d spark.createDataFrame(Seq((\"andy\",
20, 1, 1), (\"jeff\", 23, 1, 2), (\"james\", 12, 2, 2))).toDF(\"name\",
\"age\", \"key_1\", \"key_2\")\ndf1.show()\n\nval df2 \u003d
spark.createDataFrame(Seq((1, 1, \"USA\"), (2, 2, \"China\"))).toDF(\"key_1\",
\"key_2\", \"country\")\ndf2.show()\n\n// Join on 2 fields: key_1, key_2\n\n//
You can pass a list of field name if the join field names are the same in both
tables\nval df3 \u003d df1.join(df2, Seq(\"ke [...]
"user": "anonymous",
"dateUpdated": "2020-03-11 13:34:12.577",
"config": {
@@ -688,7 +688,7 @@
},
{
"title": "Visualize DataFrame/Dataset",
- "text": "%md\n\nThere\u0027s 2 approaches to visuliaze DataFrame/Dataset
in Zeppelin\n\n* Use SparkSQLInterpreter via `%spark.sql`\n* Use
ZeppelinContext via `z.show`\n\n",
+ "text": "%md\n\nThere\u0027s 2 approaches to visualize DataFrame/Dataset
in Zeppelin\n\n* Use SparkSQLInterpreter via `%spark.sql`\n* Use
ZeppelinContext via `z.show`\n\n",
"user": "anonymous",
"dateUpdated": "2020-01-21 15:55:08.071",
"config": {
@@ -716,7 +716,7 @@
"msg": [
{
"type": "HTML",
- "data": "\u003cdiv
class\u003d\"markdown-body\"\u003e\n\u003cp\u003eThere\u0026rsquo;s 2
approaches to visuliaze DataFrame/Dataset in
Zeppelin\u003c/p\u003e\n\u003cul\u003e\n \u003cli\u003eUse SparkSQLInterpreter
via \u003ccode\u003e%spark.sql\u003c/code\u003e\u003c/li\u003e\n
\u003cli\u003eUse ZeppelinContext via
\u003ccode\u003ez.show\u003c/code\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e"
+ "data": "\u003cdiv
class\u003d\"markdown-body\"\u003e\n\u003cp\u003eThere\u0026rsquo;s 2
approaches to visualize DataFrame/Dataset in
Zeppelin\u003c/p\u003e\n\u003cul\u003e\n \u003cli\u003eUse SparkSQLInterpreter
via \u003ccode\u003e%spark.sql\u003c/code\u003e\u003c/li\u003e\n
\u003cli\u003eUse ZeppelinContext via
\u003ccode\u003ez.show\u003c/code\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e"
}
]
},
diff --git a/notebook/Spark Tutorial/4. Spark MlLib_2EZFM3GJA.zpln
b/notebook/Spark Tutorial/4. Spark MlLib_2EZFM3GJA.zpln
index fa5998b..53540ed 100644
--- a/notebook/Spark Tutorial/4. Spark MlLib_2EZFM3GJA.zpln
+++ b/notebook/Spark Tutorial/4. Spark MlLib_2EZFM3GJA.zpln
@@ -2,7 +2,7 @@
"paragraphs": [
{
"title": "Introduction",
- "text": "%md\n\nThis is a tutorial of how to use Spark MLlib in
Zeppelin, we have 2 examples in this note:\n\n* Linear regression, we generate
some random data and use a linear regression to fit this data. We use bokeh
here to visualize the data and the fitted model. Besides training, we also
visualize the loss value over iteration.\n* Logstic regression, we use the
offical `sample_binary_classification_data` of spark as the training data.
Besides training, we also visualize the l [...]
+ "text": "%md\n\nThis is a tutorial of how to use Spark MLlib in
Zeppelin, we have 2 examples in this note:\n\n* Linear regression, we generate
some random data and use a linear regression to fit this data. We use bokeh
here to visualize the data and the fitted model. Besides training, we also
visualize the loss value over iteration.\n* Logstic regression, we use the
official `sample_binary_classification_data` of spark as the training data.
Besides training, we also visualize the [...]
"user": "anonymous",
"dateUpdated": "2020-03-11 14:08:34.165",
"config": {