GitHub user AhyoungRyu reopened a pull request:
https://github.com/apache/zeppelin/pull/1339
[ZEPPELIN-1332] Remove spark-dependencies & suggest new way
### What is this PR for?
Currently, Zeppelin's embedded Spark is located under `interpreter/spark/`.
For those who **build Zeppelin from source**, this Spark is downloaded when
they build the source with [build
profiles](https://github.com/apache/zeppelin#spark-interpreter). I think these
build profiles are useful for customizing the embedded Spark, but many
Spark users run their own Spark rather than Zeppelin's embedded one. Nowadays
only Spark and Zeppelin beginners use the embedded Spark, and for them there
are too many build profiles (I think it's too complicated).
In the case of the **Zeppelin binary package**, the embedded Spark is included
by default under `interpreter/spark/`. That's why the Zeppelin package is so
large.
#### New suggestions
This PR changes how the embedded Spark binary is downloaded, as described
below.
1. Run `./bin/zeppelin-daemon.sh get-spark` or `./bin/zeppelin.sh get-spark`.
2. This creates `ZEPPELIN_HOME/local-spark/`, downloads
`spark-2.0.1-bin-hadoop2.7.tgz` into it, and untars the archive.
3. We can then use this local Spark without any configuration, as before (e.g.
no need to set `SPARK_HOME`).
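The three steps above can be sketched roughly as follows. This is only an illustration and not the actual `download-spark.sh` from this PR: the function names `fetch_archive` and `get_spark`, the variable names, and the mirror URL argument are all hypothetical.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; names are hypothetical, not taken from the PR.

SPARK_VERSION="2.0.1"
HADOOP_VERSION="2.7"
SPARK_ARCHIVE="spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}"

# fetch_archive URL DEST: the download step, kept separate so it can be replaced.
fetch_archive() {
  curl -fL -o "$2" "$1"
}

# get_spark ZEPPELIN_HOME MIRROR_URL
# Downloads and unpacks Spark under ZEPPELIN_HOME/local-spark unless it is
# already there. MIRROR_URL is an assumed base URL supplied by the caller.
get_spark() {
  local zeppelin_home="$1"
  local mirror="$2"
  local target_dir="${zeppelin_home}/local-spark"

  if [ -d "${target_dir}/${SPARK_ARCHIVE}" ]; then
    echo "${SPARK_ARCHIVE} already exists under local-spark."
    return 0
  fi

  mkdir -p "${target_dir}"
  echo "Download ${SPARK_ARCHIVE}.tgz from mirror ..."
  fetch_archive "${mirror}/${SPARK_ARCHIVE}.tgz" \
    "${target_dir}/${SPARK_ARCHIVE}.tgz" || return 1
  tar -xzf "${target_dir}/${SPARK_ARCHIVE}.tgz" -C "${target_dir}" || return 1
  rm -f "${target_dir}/${SPARK_ARCHIVE}.tgz"
  echo "${SPARK_ARCHIVE} is successfully downloaded and saved under ${target_dir}"
}
```

Checking for the unpacked directory first keeps a repeated `get-spark` call cheap, which matches the "already exists" message shown in the screenshots section.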
### What type of PR is it?
Improvement
### Todos
- [x] - trap `ctrl+c` & `ctrl+z` key interruptions while downloading Spark
- [x] - test on different OSes
- [x] - update related documentation pages again after getting feedback
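For the first todo item, one possible way to handle an interrupted download is to remove the partial archive from a `trap` handler. The sketch below is an assumption about the general approach, not the code in this PR. `cleanup_partial`, `download_with_cleanup`, and `PARTIAL_FILE` are hypothetical names, and only `INT` and `TERM` are handled here.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; names are hypothetical, not taken from the PR.

PARTIAL_FILE=""

# Remove a half-downloaded file, if one is registered and present.
cleanup_partial() {
  if [ -n "${PARTIAL_FILE}" ] && [ -f "${PARTIAL_FILE}" ]; then
    rm -f "${PARTIAL_FILE}"
    echo "Download interrupted; removed ${PARTIAL_FILE}"
  fi
}

# download_with_cleanup DEST CMD [ARGS...]
# Runs CMD while a trap is installed that deletes DEST on SIGINT or SIGTERM.
download_with_cleanup() {
  PARTIAL_FILE="$1"
  shift
  trap 'cleanup_partial; exit 130' INT TERM
  "$@"
  local status=$?
  # Restore default signal handling once the command has finished.
  trap - INT TERM
  PARTIAL_FILE=""
  return "${status}"
}
```

A call might look like `download_with_cleanup "$dest" curl -fL -o "$dest" "$url"`, where `$dest` and `$url` are supplied by the caller.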
### What is the Jira issue?
[ZEPPELIN-1332](https://issues.apache.org/jira/browse/ZEPPELIN-1332)
### How should this be tested?
1. `rm -r spark-dependencies`
2. Apply this patch and build with `mvn clean package -DskipTests`
3. Try `bin/zeppelin-daemon.sh get-spark` or `bin/zeppelin.sh get-spark`
4. You should be able to run `sc.version` without setting an external `SPARK_HOME`
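Step 4 relies on Zeppelin finding the downloaded Spark when `SPARK_HOME` is unset. A minimal sketch of such a fallback is shown here, assuming a hypothetical helper named `resolve_spark_home`. The real lookup in this PR lives in Zeppelin's own scripts and may differ.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; resolve_spark_home is a hypothetical helper.

# resolve_spark_home ZEPPELIN_HOME
# Prints an externally configured SPARK_HOME if set, otherwise the Spark
# directory under ZEPPELIN_HOME/local-spark. Returns 1 if neither is available.
resolve_spark_home() {
  local zeppelin_home="$1"
  local local_spark="${zeppelin_home}/local-spark/spark-2.0.1-bin-hadoop2.7"

  if [ -n "${SPARK_HOME:-}" ]; then
    echo "${SPARK_HOME}"
  elif [ -d "${local_spark}" ]; then
    echo "${local_spark}"
  else
    return 1
  fi
}
```

Giving an explicit `SPARK_HOME` priority keeps existing setups that point at their own Spark unaffected.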
### Screenshots (if appropriate)
- `./bin/zeppelin-daemon.sh get-spark`
```
$ ./bin/zeppelin-daemon.sh get-spark
Download spark-2.0.1-bin-hadoop2.7.tgz from mirror ...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  178M  100  178M    0     0  10.4M      0  0:00:17  0:00:17 --:--:-- 10.2M
spark-2.0.1-bin-hadoop2.7 is successfully downloaded and saved under /Users/ahyoungryu/Dev/zeppelin-development/zeppelin/local-spark
```
- if `ZEPPELIN_HOME/local-spark/spark-2.0.1-bin-hadoop2.7` already exists
```
$ ./bin/zeppelin-daemon.sh get-spark
spark-2.0.1-bin-hadoop2.7 already exists under local-spark.
```
### Questions:
- Do the license files need an update? no
- Are there breaking changes for older versions? no
- Does this need documentation? Yes, some related documents need updating
(e.g. README.md, spark.md and install.md)
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/AhyoungRyu/zeppelin ZEPPELIN-1332
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/zeppelin/pull/1339.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1339
----
commit d377cc6f28dd6cae43364f61135ed8abcba3b269
Author: AhyoungRyu <[email protected]>
Date: 2016-08-16T15:08:19Z
Fix typo comment in interpreter.sh
commit 4f3edfd87e84e65789e0e937b5330c16442fcfbe
Author: AhyoungRyu <[email protected]>
Date: 2016-08-17T01:52:06Z
Remove spark-dependencies
commit 99ef019521ca1fd0fc41958b20da8642773825d5
Author: AhyoungRyu <[email protected]>
Date: 2016-08-17T07:14:35Z
Add spark-2.*-bin-hadoop* to .gitignore
commit 4e8d5ff067c5428a5254e45b4de533c56393f7b4
Author: AhyoungRyu <[email protected]>
Date: 2016-08-17T15:22:25Z
Add download-spark.sh file
commit 6784015b8da439894dd09bbc3e54477a0f3cba84
Author: AhyoungRyu <[email protected]>
Date: 2016-08-17T15:28:51Z
Remove useless comment line in common.sh
commit c866f0b231432b14c092a365d270e81a2222f54a
Author: AhyoungRyu <[email protected]>
Date: 2016-08-18T03:32:11Z
Remove zeppelin-spark-dependencies from r/pom.xml
commit 3fe19bff1bdbdccba63e3163bd7aabfe23a35777
Author: AhyoungRyu <[email protected]>
Date: 2016-08-21T05:38:55Z
Change SPARK_HOME with proper message
commit 99545233c0e84f48fbf98da25ad131eeba6dd293
Author: AhyoungRyu <[email protected]>
Date: 2016-09-06T08:55:20Z
Check interpreter/spark/ instead of SPARK_HOME
commit e6973b3887e9c0d50a1168f26e6f0337f9f78986
Author: AhyoungRyu <[email protected]>
Date: 2016-09-06T08:55:40Z
Refactor download-spark.sh
commit 552185ac03f1b5edc9fabb4d381d471c59078903
Author: AhyoungRyu <[email protected]>
Date: 2016-09-07T07:48:15Z
Revert: remove spark-dependencies
commit ffe64d9b264ab3db67d28a045e34c9c4d471058a
Author: AhyoungRyu <[email protected]>
Date: 2016-09-07T13:23:11Z
Remove useless ZEPPELIN_HOME
commit 5ed33112d64dc3063a29d515d4987e193a909dd0
Author: AhyoungRyu <[email protected]>
Date: 2016-09-08T05:51:40Z
Change dir of Spark bin to 'local-spark'
commit 1419f0b8d76a8e15ac7646e3827dd536246038d1
Author: AhyoungRyu <[email protected]>
Date: 2016-09-08T06:07:20Z
Set timeout for travis test
commit a813d922ba29b5c392a908c3199050884266b969
Author: AhyoungRyu <[email protected]>
Date: 2016-09-08T06:16:54Z
Add license header to download-spark.cmd
commit 368c15aefd650a59c6fb0fdd040efe1bbb2618cc
Author: AhyoungRyu <[email protected]>
Date: 2016-09-08T11:48:43Z
Fix wrong check condition in common.sh
commit e58075d046f65ae173fecc31c0b648b87f445af4
Author: AhyoungRyu <[email protected]>
Date: 2016-09-08T13:14:29Z
Add travis condition to download-spark.sh
commit 89be91b049a646b1a0fc7dcfeb5e8bfde68bdab4
Author: AhyoungRyu <[email protected]>
Date: 2016-09-12T05:42:29Z
Remove bin/download-spark.cmd again
commit b22364ddba120842933e96eca1e082680cd5407a
Author: AhyoungRyu <[email protected]>
Date: 2016-09-12T16:25:31Z
Remove spark-dependency profiles & reorganize some titles in README.md
commit 24dc95faa39586be323365f21a2beb1f683becf8
Author: AhyoungRyu <[email protected]>
Date: 2016-09-12T18:30:41Z
Update spark.md to add a guide for local-spark mode
commit 2537fa14d5e13c34be9eeab932bf5dc853bda5d4
Author: AhyoungRyu <[email protected]>
Date: 2016-09-12T18:49:49Z
Remove '-Ppyspark' build options
commit ca534e596c36ced04f832b0a7ab7e78e951929e1
Author: AhyoungRyu <[email protected]>
Date: 2016-09-13T08:09:18Z
Remove useless creating .bak file process
commit edd525d0f6eac0a956bc64f58e77ac3afc423f58
Author: AhyoungRyu <[email protected]>
Date: 2016-09-13T11:21:10Z
Update install.md & spark.md
commit a9b110a809463ac1795e76a30b9cd2df6c40292d
Author: AhyoungRyu <[email protected]>
Date: 2016-09-14T09:35:37Z
Resolve 'sed' command issue between OSX & Linux
commit f383d3afb8f9e2c1e240f69d8d970c469d0a9ced
Author: AhyoungRyu <[email protected]>
Date: 2016-09-14T11:20:31Z
Trap ctrl+c during downloading Spark
commit 527ef5b6518d3477d9731422cad190a59df11d1e
Author: AhyoungRyu <[email protected]>
Date: 2016-09-14T11:26:56Z
Remove useless condition
commit 555372a655b788b3b0fdd85d430b6f063ce13834
Author: AhyoungRyu <[email protected]>
Date: 2016-09-20T17:05:16Z
Make local spark mode with zero-configuration as @moon suggested
commit de87cb2adf5ad510a712e4f696ae127c7a414077
Author: AhyoungRyu <[email protected]>
Date: 2016-09-22T14:20:31Z
Modify SparkRInterpreter.java to enable SparkR without SPARK_HOME
commit 1dd51d8e1dcb8d65e22a1cc67a5d089c5d7c196b
Author: AhyoungRyu <[email protected]>
Date: 2016-09-22T17:01:40Z
Remove duplicated variable declaration
commit f068bef554507e7125865f77816986d5b085a7b3
Author: AhyoungRyu <[email protected]>
Date: 2016-09-22T17:02:01Z
Update related docs again
commit 437f2063a39d2a7a583bb647cb885e51a0990098
Author: AhyoungRyu <[email protected]>
Date: 2016-09-23T05:37:57Z
Fix typo in SparkRInterpreter.java
----