GitHub user AhyoungRyu reopened a pull request:
https://github.com/apache/zeppelin/pull/1339
[ZEPPELIN-1332] Remove spark-dependencies & suggest new way
### What is this PR for?
Currently, Zeppelin's embedded Spark lives under `interpreter/spark/`.
For those who **build Zeppelin from source**, this Spark is downloaded when
they build with the [build
profiles](https://github.com/apache/zeppelin#spark-interpreter). These
build profiles are useful for customizing the embedded Spark, but most
Spark users point Zeppelin at their own Spark installation rather than the
embedded one. In practice, only Spark & Zeppelin beginners rely on the
embedded Spark, and for them the many build profiles are overly complicated.
In the case of the **Zeppelin binary package**, the embedded Spark is included
by default under `interpreter/spark/`, which is a large part of why the
Zeppelin package size is so huge.
#### New suggestions
This PR changes how the embedded Spark binary is downloaded, as follows
(see the sketch after this list):
1. Run `./bin/zeppelin-daemon.sh get-spark` or `./bin/zeppelin.sh get-spark`
2. The command creates `ZEPPELIN_HOME/local-spark/`, downloads
`spark-2.0.1-bin-hadoop2.7.tgz`, and untars it there
3. This local Spark can then be used without any extra configuration (e.g.
no need to set `SPARK_HOME`)
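For context, here is a minimal sketch of what such a `get-spark` helper could look like. The actual implementation in this PR is `bin/download-spark.sh`; the mirror URL, `set -e` handling, and variable names below are illustrative assumptions, not taken from the patch.
```bash
#!/bin/bash
# Illustrative sketch only -- not the exact bin/download-spark.sh from this PR.
# Fetches a Spark binary distribution into ZEPPELIN_HOME/local-spark and untars it.

set -e

ZEPPELIN_HOME="$(cd "$(dirname "$0")/.." && pwd)"   # assumes the script lives in bin/
SPARK_ARCHIVE="spark-2.0.1-bin-hadoop2.7"           # assumed default version
LOCAL_SPARK_DIR="${ZEPPELIN_HOME}/local-spark"
# Example mirror URL; the real script may pick a different mirror.
MIRROR_URL="https://archive.apache.org/dist/spark/spark-2.0.1/${SPARK_ARCHIVE}.tgz"

if [ -d "${LOCAL_SPARK_DIR}/${SPARK_ARCHIVE}" ]; then
  echo "${SPARK_ARCHIVE} already exists under local-spark."
  exit 0
fi

mkdir -p "${LOCAL_SPARK_DIR}"
echo "Download ${SPARK_ARCHIVE}.tgz from mirror ..."
curl -fL "${MIRROR_URL}" -o "${LOCAL_SPARK_DIR}/${SPARK_ARCHIVE}.tgz"
tar -xzf "${LOCAL_SPARK_DIR}/${SPARK_ARCHIVE}.tgz" -C "${LOCAL_SPARK_DIR}"
rm -f "${LOCAL_SPARK_DIR}/${SPARK_ARCHIVE}.tgz"
echo "${SPARK_ARCHIVE} is successfully downloaded and saved under ${LOCAL_SPARK_DIR}"
```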
### What type of PR is it?
Improvement
### Todos
- [x] - trap `ctrl+c` & `ctrl+z` key interruptions while downloading Spark
(see the sketch after this list)
- [x] - test on different OSes
- [x] - update the related documentation pages again after getting feedback
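On the interruption point, a minimal, self-contained sketch of how a partial download could be cleaned up on `ctrl+c`/`ctrl+z` (the archive path variable is an assumption for illustration; the handler in the PR may differ):
```bash
# Remove a half-downloaded archive if the user interrupts the transfer.
PARTIAL_ARCHIVE="local-spark/spark-2.0.1-bin-hadoop2.7.tgz"   # assumed path

cleanup_partial_download() {
  echo "Download interrupted; removing partial archive ..."
  rm -f "${PARTIAL_ARCHIVE}"
  exit 1
}

# SIGINT is sent by ctrl+c, SIGTSTP by ctrl+z.
trap cleanup_partial_download INT TSTP
```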
### What is the Jira issue?
[ZEPPELIN-1332](https://issues.apache.org/jira/browse/ZEPPELIN-1332)
### How should this be tested?
1. `rm -r spark-dependencies`
2. Apply this patch and build with `mvn clean package -DskipTests`
3. Try `bin/zeppelin-daemon.sh get-spark` or `bin/zeppelin.sh get-spark`, then verify as sketched below
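A quick sanity check after step 3 (assuming the default Spark version shown in the screenshots below):
```bash
ls local-spark/
# Expected: spark-2.0.1-bin-hadoop2.7
bin/zeppelin-daemon.sh start   # the Spark interpreter should then work without setting SPARK_HOME
```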
### Screenshots (if appropriate)
- `./bin/zeppelin-daemon.sh get-spark`
```
$ ./bin/zeppelin-daemon.sh get-spark
Download spark-2.0.1-bin-hadoop2.7.tgz from mirror ...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  178M  100  178M    0     0  10.4M      0  0:00:17  0:00:17 --:--:-- 10.2M
spark-2.0.1-bin-hadoop2.7 is successfully downloaded and saved under /Users/ahyoungryu/Dev/zeppelin-development/zeppelin/local-spark
```
- if `ZEPPELIN_HOME/local-spark/spark-2.0.1-bin-hadoop2.7` already exists
```
$ ./bin/zeppelin-daemon.sh get-spark
spark-2.0.1-bin-hadoop2.7 already exists under local-spark.
```
### Questions:
- Do the license files need to be updated? No
- Are there breaking changes for older versions? No
- Does this need documentation? Yes, some related documents need to be updated
(e.g. README.md, spark.md, and possibly install.md)
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/AhyoungRyu/zeppelin ZEPPELIN-1332
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/zeppelin/pull/1339.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1339
----
commit cf91a45420ea3047522998238beba274db9a5fca
Author: AhyoungRyu <[email protected]>
Date: 2016-08-16T15:08:19Z
Fix typo comment in interpreter.sh
commit 6c08f5207cb2286b6072b3dcd5cc882b4dbca39b
Author: AhyoungRyu <[email protected]>
Date: 2016-08-17T01:52:06Z
Remove spark-dependencies
commit a36702f8b35d7ee0d269190fe42ac8a2ff5d5b6e
Author: AhyoungRyu <[email protected]>
Date: 2016-08-17T07:14:35Z
Add spark-2.*-bin-hadoop* to .gitignore
commit 31b04f58491502d5b4ea7c1800f7606013a8ae74
Author: AhyoungRyu <[email protected]>
Date: 2016-08-17T15:22:25Z
Add download-spark.sh file
commit fd87a09d83000c94ced1b04b4254de9b35e4ccc5
Author: AhyoungRyu <[email protected]>
Date: 2016-08-17T15:28:51Z
Remove useless comment line in common.sh
commit e0fc280de061f7ee06603d5bc9ab41b5219a749d
Author: AhyoungRyu <[email protected]>
Date: 2016-08-18T03:32:11Z
Remove zeppelin-spark-dependencies from r/pom.xml
commit bf06931b988aee4d9dfc3c173cac18a740666e36
Author: AhyoungRyu <[email protected]>
Date: 2016-08-21T05:38:55Z
Change SPARK_HOME with proper message
commit dceb74fff19eac2071eed0d661c0571eceeada54
Author: AhyoungRyu <[email protected]>
Date: 2016-09-06T08:55:20Z
Check interpreter/spark/ instead of SPARK_HOME
commit e2a078ab87deba8cdf4a99f6c3642e3d4b41f3d8
Author: AhyoungRyu <[email protected]>
Date: 2016-09-06T08:55:40Z
Refactor download-spark.sh
commit 3c792d07c6d6b55896ae5b0e3e2b0d08f70fafb1
Author: AhyoungRyu <[email protected]>
Date: 2016-09-07T07:48:15Z
Revert: remove spark-dependencies
commit 1071566f442b9cf01c7145fd9dcbb48eb343f81a
Author: AhyoungRyu <[email protected]>
Date: 2016-09-07T13:23:11Z
Remove useless ZEPPELIN_HOME
commit 0c7e1b73299634f9fc5c579c54d4cff49449f910
Author: AhyoungRyu <[email protected]>
Date: 2016-09-08T05:51:40Z
Change dir of Spark bin to 'local-spark'
commit 787cec50ce796ddae9a1302e1ce376b2f3e5c5be
Author: AhyoungRyu <[email protected]>
Date: 2016-09-08T06:07:20Z
Set timeout for travis test
commit b5fc541a96d17db513dcd7d5c1ec5671e85733f0
Author: AhyoungRyu <[email protected]>
Date: 2016-09-08T06:16:54Z
Add license header to download-spark.cmd
commit c4d39f1df4dfeed1ad8544fe75621dc1aac693da
Author: AhyoungRyu <[email protected]>
Date: 2016-09-08T11:48:43Z
Fix wrong check condition in common.sh
commit 5c631477133253506490744abb54a0582a066f6c
Author: AhyoungRyu <[email protected]>
Date: 2016-09-08T13:14:29Z
Add travis condition to download-spark.sh
commit e91e7f83da0c44d91f53ae94a9d2f7f8117b86ae
Author: AhyoungRyu <[email protected]>
Date: 2016-09-12T05:42:29Z
Remove bin/download-spark.cmd again
commit f40fd2f13071647e812ac54278b1fbff87b808e7
Author: AhyoungRyu <[email protected]>
Date: 2016-09-12T16:25:31Z
Remove spark-dependency profiles & reorganize some titles in README.md
commit 31ebd191203139ee6f6bd794375c64c4f66cd28a
Author: AhyoungRyu <[email protected]>
Date: 2016-09-12T18:30:41Z
Update spark.md to add a guide for local-spark mode
commit 803f21cbfff07deaa7cd5d5be1e423b4db4802c7
Author: AhyoungRyu <[email protected]>
Date: 2016-09-12T18:49:49Z
Remove '-Ppyspark' build options
commit d5882554562e0244cb063186630b9e952fdf1c1c
Author: AhyoungRyu <[email protected]>
Date: 2016-09-13T08:09:18Z
Remove useless creating .bak file process
commit b7a91453255cced389bc639e27dc8b2232afd19f
Author: AhyoungRyu <[email protected]>
Date: 2016-09-13T11:21:10Z
Update install.md & spark.md
commit 63f29e91c3bd22df44aae91929468ea6a9516474
Author: AhyoungRyu <[email protected]>
Date: 2016-09-14T09:35:37Z
Resolve 'sed' command issue between OSX & Linux
commit 6e329a7b832cf9e526b72aa5e3eb32ab697ebfd7
Author: AhyoungRyu <[email protected]>
Date: 2016-09-14T11:20:31Z
Trap ctrl+c during downloading Spark
commit 1205f2d67f8116353f30128b367cebe2d35fd344
Author: AhyoungRyu <[email protected]>
Date: 2016-09-14T11:26:56Z
Remove useless condition
commit ff069af7023132ea4478ad38ea1164176b753ab9
Author: AhyoungRyu <[email protected]>
Date: 2016-09-20T17:05:16Z
Make local spark mode with zero-configuration as @moon suggested
commit c818cf766a60ef432d5310a664aeded7d9a58ab3
Author: AhyoungRyu <[email protected]>
Date: 2016-09-22T06:47:05Z
Put 'autodetect HADOOP_CONF_HOME by heuristic' back code blocks
commit b2dca36e25b03ac56a9ee221c1dff2d1ed105c95
Author: AhyoungRyu <[email protected]>
Date: 2016-09-22T14:20:31Z
Modify SparkRInterpreter.java to enable SparkR without SPARK_HOME
commit 310d607564e156b85a687e37c2c1d14d00ad1348
Author: AhyoungRyu <[email protected]>
Date: 2016-09-22T17:01:40Z
Remove duplicated variable declaration
commit 1ee4325aea1f761765018e15396860e9ca2bc538
Author: AhyoungRyu <[email protected]>
Date: 2016-09-22T17:02:01Z
Update related docs again
----