[ 
https://issues.apache.org/jira/browse/SPARK-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15451785#comment-15451785
 ] 

Sameh El-Ansary commented on SPARK-17313:
-----------------------------------------

1- WHY? (The driver is not doing much work)
True, but in many development and teaching situations a cloud-deployed cluster 
is shared among students/developers who work on large shared data sets with a 
large number of executors. Running the driver on the development machine (even 
for exploratory purposes) can be problematic, either because of memory limits 
on the workstation or because many YARN executors talk to a single driver over 
a remote connection.

2- HOW? (Interacting with a shell on the local machine when the driver is in 
the cluster)
Short answer: through Livy or something similar:
https://github.com/cloudera/livy

More details here:
Dummy REPL+RestClient ---> REST ---> RestServer+Driver
    (workstation)                     (cluster node)

Since YARN allocates nodes dynamically on the cluster, and usually the 
workstation machine is not open to all cluster nodes, one needs a proxy server 
that would typically run on the YARN application master node. Thus the 
architecture would be:

Dummy REPL+RestClient ---> REST ---> Proxy Server ---> REST ---> RestServer+Driver
    (workstation)                  (App Master Node)              (cluster node)

Livy provides the RestClient, Proxy Server and RestServer+Driver. Adding a Livy 
RestClient to spark-shell would make it capable of operating in yarn-cluster 
mode.
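
As a rough illustration of what the workstation-side RestClient would send, 
here is a minimal Python sketch. It assumes Livy's REST endpoints as I 
understand them from its documentation (POST /sessions, POST 
/sessions/{id}/statements, GET /sessions/{id}/statements/{id}). The host name 
livy.example, the port and all helper names are hypothetical, and this is not 
how spark-shell or Livy's own client is implemented.

```python
import json
import urllib.request


def livy_request(base_url, path, payload=None):
    """Build (but do not send) a JSON request for a Livy-style REST server.

    A payload makes it a POST; no payload makes it a GET.
    """
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="GET" if payload is None else "POST",
    )


def create_session(base_url, kind="spark"):
    # Assumption: POST /sessions with {"kind": ...} starts a remote
    # interpreter whose driver lives on the cluster side.
    return livy_request(base_url, "/sessions", {"kind": kind})


def submit_statement(base_url, session_id, code):
    # Assumption: each line typed into the dummy REPL is posted as "code".
    path = f"/sessions/{session_id}/statements"
    return livy_request(base_url, path, {"code": code})


def get_statement(base_url, session_id, statement_id):
    # Assumption: the client polls this URL until the statement has output.
    path = f"/sessions/{session_id}/statements/{statement_id}"
    return livy_request(base_url, path)


def send(request):
    """Send a prepared request and decode the JSON reply (needs a live server)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

A dummy REPL on the workstation would call 
send(create_session("http://livy.example:8998")) once, then for every input 
line call send(submit_statement(...)) and poll send(get_statement(...)) until 
the reply carries output (in Livy, as far as I know, when its "state" field 
becomes "available").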

3- Zeppelin or spark-shell
Using Zeppelin or spark-shell for exploratory work is a matter of taste, IMHO. 
The simplicity of the shell and the power of Zeppelin are each needed at 
different times, depending on the case at hand.

4- Yarn-cluster on Zeppelin
True, yarn-cluster was not supported in Zeppelin for a long time, but it 
has recently been added using Livy 
(https://github.com/apache/zeppelin/pull/827), to which the submitter of this 
issue contributed a bit. 
Similarly, it would be good to support that from spark-shell.

> Support spark-shell on cluster mode
> -----------------------------------
>
>                 Key: SPARK-17313
>                 URL: https://issues.apache.org/jira/browse/SPARK-17313
>             Project: Spark
>          Issue Type: New Feature
>            Reporter: Mahmoud Elgamal
>
> The main issue with the current spark shell is that the driver is running on 
> the user machine. If the driver resource requirement is beyond user machine 
> capacity, then spark shell will be useless. If we are to add the cluster 
> mode (Yarn or Mesos) for spark shell via some sort of proxy where user 
> machine only hosts a rest client to the running driver at the cluster, the 
> shell will be more powerful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)