Hello Jhon,

Conceptually this makes sense, since Zeppelin creates a Spark application as the execution runtime underneath its frontend process.
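To make that concrete: in a typical install, Zeppelin's Spark interpreter is launched through spark-submit, driven by what is configured in conf/zeppelin-env.sh. A minimal sketch, assuming a standard layout (the paths and the driver-memory value below are placeholder assumptions, not a tested config):

  # conf/zeppelin-env.sh -- minimal sketch; paths and values are assumptions
  export SPARK_HOME=/opt/spark                      # assumed Spark install location
  export MASTER=spark://<master-name>:7077          # standalone master, as in your setup
  # extra flags forwarded to the spark-submit call that starts the interpreter
  export SPARK_SUBMIT_OPTIONS="--driver-memory 4g"

The interpreter process that spark-submit starts is where the driver lives, which is why its placement matters here.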
Having said this, depending on how Zeppelin is implemented, the driver might need to be co-located with the Zeppelin process on the same host (remember, the Zeppelin notebook process needs to "talk" to the Spark driver process, which might be done via a child process). I can certainly see how a co-located design would be simpler for the Zeppelin contributors to implement; they may have considered the functionality you describe but deferred it to a later release.

So this is not a definitive answer (I don't know the actual answer), but I would not expect this kind of setup to be supported yet. (I tried to make it work and could not get the Spark kernel to start, so I just reverted to client deploy mode instead of cluster, since that option was acceptable to me.) I would be curious to see if that is possible, though, and how it would be configured.

I hope this helps (a bit).

Best,
Raphael

From: Jhon Anderson Cardenas Diaz <jhonderson2...@gmail.com>
Reply-To: "us...@zeppelin.apache.org" <us...@zeppelin.apache.org>
Date: Tuesday, March 13, 2018 at 4:24 PM
To: "dev@zeppelin.apache.org" <dev@zeppelin.apache.org>, "us...@zeppelin.apache.org" <us...@zeppelin.apache.org>
Subject: Zeppelin - Spark Driver location

Hi Zeppelin users!

I am working with Zeppelin pointing to a Spark standalone cluster. I am trying to figure out a way to make Zeppelin run the Spark driver outside of the client process that submits the application.

According to the documentation (http://spark.apache.org/docs/2.1.1/spark-standalone.html):

  For standalone clusters, Spark currently supports two deploy modes. In client mode, the driver is launched in the same process as the client that submits the application. In cluster mode, however, the driver is launched from one of the Worker processes inside the cluster, and the client process exits as soon as it fulfills its responsibility of submitting the application without waiting for the application to finish.

The problem is that, even when I set the properties for the Spark standalone cluster and set the deploy mode to cluster, the driver still runs on the Zeppelin machine (according to the Spark UI Executors page). These are the properties I am setting for the Spark interpreter:

  master: spark://<master-name>:7077
  spark.submit.deployMode: cluster
  spark.executor.memory: 16g

(For comparison, I have sketched the equivalent plain spark-submit invocations in a P.S. at the end of this message.)

Any ideas would be appreciated. Thank you.

Details:
Spark version: 2.1.1
Zeppelin version: 0.8.0 (merged at September 2017 version)
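P.S. For comparison, the equivalent plain spark-submit invocations would be something like the following (the class name and jar path are placeholders, not my actual application):

  # client mode: the driver runs inside the submitting process on this machine
  ./bin/spark-submit \
    --master spark://<master-name>:7077 \
    --deploy-mode client \
    --executor-memory 16g \
    --class com.example.MyApp \
    /path/to/my-app.jar

  # cluster mode: the driver is launched on one of the standalone Workers,
  # which is the behavior I am trying to get from inside Zeppelin
  ./bin/spark-submit \
    --master spark://<master-name>:7077 \
    --deploy-mode cluster \
    --executor-memory 16g \
    --class com.example.MyApp \
    /path/to/my-app.jar

Note that in cluster mode the jar path must be visible from the worker nodes.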