[
https://issues.apache.org/jira/browse/SPARK-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15451785#comment-15451785
]
Sameh El-Ansary commented on SPARK-17313:
-----------------------------------------
1- WHY? (The driver is not doing much work)
True, but in many development and education settings a cloud-deployed cluster
is shared among students or developers who work on large shared data sets with
a large number of executors. Running the driver on the development machine
(even for exploratory purposes) can be problematic, either because the
workstation runs short of memory or because many YARN executors must talk to a
single driver over a remote connection.
2- HOW? (Interacting with a shell on the local machine when the driver is in
the cluster)
Short answer: Through Livy or similar
https://github.com/cloudera/livy
More details here:
Dummy REPL+RestClient ---> REST ---> RestServer+Driver
(workstation)                        (cluster node)
Since YARN allocates nodes dynamically on the cluster, and the workstation is
usually not open to all cluster nodes, one needs a proxy server, which would
typically run on the YARN application master. Thus the architecture would be:
Dummy REPL+RestClient ---> REST ---> Proxy Server ---> REST ---> RestServer+Driver
(workstation)                        (App Master node)           (cluster node)
Livy provides the RestClient, Proxy Server and RestServer+Driver. Adding a Livy
RestClient to spark-shell would make it capable of operating in yarn-cluster
mode.
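To make the workstation side concrete, here is a minimal sketch of what the
"Dummy REPL+RestClient" could look like. It is written in Python only for
brevity and is not the proposed spark-shell change. The endpoint paths
(/sessions, /sessions/{id}/statements) and the state names follow the Livy REST
API as I understand it, so please check them against the Livy version you run.
The class name DummyRepl, the helper http_json and the host livy.example are
hypothetical names used for illustration.

```python
import json
import time
import urllib.request


def http_json(method, url, payload=None):
    """Send one JSON request and decode the JSON reply (standard library only)."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


class DummyRepl:
    """Workstation side: forwards code over REST; the driver stays in the cluster.

    Assumption: the server speaks a Livy-style REST API. The transport is
    injectable so the class can be exercised without a running server.
    """

    def __init__(self, base_url, transport=http_json, poll_seconds=1.0):
        self.base_url = base_url.rstrip("/")
        self.transport = transport
        self.poll_seconds = poll_seconds
        self.session_id = None

    def start(self, kind="spark"):
        # Ask the server to launch a remote driver for an interactive session.
        reply = self.transport("POST", self.base_url + "/sessions", {"kind": kind})
        self.session_id = reply["id"]
        return self.session_id

    def wait_until_idle(self):
        # Poll the session until the remote driver is ready to accept code.
        url = f"{self.base_url}/sessions/{self.session_id}"
        while True:
            state = self.transport("GET", url)["state"]
            if state == "idle":
                return
            if state in ("error", "dead", "killed", "shutting_down"):
                raise RuntimeError(f"session ended in state {state!r}")
            time.sleep(self.poll_seconds)

    def run(self, code):
        # Submit one snippet, then poll until the driver has finished it.
        url = f"{self.base_url}/sessions/{self.session_id}/statements"
        stmt = self.transport("POST", url, {"code": code})
        while stmt["state"] in ("waiting", "running"):
            time.sleep(self.poll_seconds)
            stmt = self.transport("GET", f"{url}/{stmt['id']}")
        return stmt.get("output")

    def close(self):
        # Tear down the remote session and its driver.
        self.transport("DELETE", f"{self.base_url}/sessions/{self.session_id}")
```

Against a reachable server one would call something like
DummyRepl("http://livy.example:8998").start(), then wait_until_idle(), then
run("sc.parallelize(1 to 10).sum()"). The workstation only holds an HTTP
client, so its memory footprint does not depend on what the driver does.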
3- Zeppelin or spark-shell
Using Zeppelin or spark-shell for exploratory work is a matter of taste IMHO.
The simplicity of the shell and the power of Zeppelin are each needed at
different times, depending on the case at hand.
4- Yarn-cluster on Zeppelin
True, yarn-cluster was not supported in Zeppelin for a long time, but it has
recently been added using Livy
(https://github.com/apache/zeppelin/pull/827), a change to which the submitter
of this issue contributed a bit.
Similarly, it would be good to support that from the spark-shell.
> Support spark-shell on cluster mode
> -----------------------------------
>
> Key: SPARK-17313
> URL: https://issues.apache.org/jira/browse/SPARK-17313
> Project: Spark
> Issue Type: New Feature
> Reporter: Mahmoud Elgamal
>
> The main issue with the current spark shell is that the driver runs on the
> user machine. If the driver's resource requirements exceed the user machine's
> capacity, the spark shell becomes useless. If we add a cluster mode (YARN or
> Mesos) for the spark shell via some sort of proxy, where the user machine only
> hosts a REST client to the driver running in the cluster, the shell will be
> more powerful.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)