On 10 Feb 2016, at 14:18, Manoj Awasthi <awasthi.ma...@gmail.com> wrote:
> My pardon for writing that "there is no AM". I realize it! :-) :-)

There is the unmanaged AM option, which was originally written for debugging but has been used in various apps. Spark doesn't do it; you'd host the unmanaged AM in the client app, alongside the driver. It'd talk the AMRM protocol to the YARN resource manager for containers and the like. I don't think you'd gain much; you'd still be vulnerable to failure of the client or loss of connectivity. All it'd be doing is adding a third deployment mode to test and maintain.

> On Wed, Feb 10, 2016 at 7:14 PM, Steve Loughran <ste...@hortonworks.com> wrote:
>
>> On 10 Feb 2016, at 13:20, Manoj Awasthi <awasthi.ma...@gmail.com> wrote:
>>
>>> On Wed, Feb 10, 2016 at 5:20 PM, Steve Loughran <ste...@hortonworks.com> wrote:
>>>
>>>> On 10 Feb 2016, at 04:42, praveen S <mylogi...@gmail.com> wrote:
>>>>
>>>>> Hi, I have 2 questions when running Spark jobs on YARN in client mode:
>>>>>
>>>>> 1) Where is the AM (application master) created?
>>>>
>>>> In the cluster.
>>>>
>>>>> A) Is it created on the client where the job was submitted, i.e. driver and AM on the same client?
>>>>
>>>> No.
>>>>
>>>>> Or B) does YARN decide where the AM should be created?
>>>>
>>>> Yes.
>>>>
>>>>> 2) Driver and AM run in different processes: is my assumption correct?
>>>>
>>>> Yes. The driver runs on your local system, which had better be close to the cluster and stay up for the duration of the work.
>>>
>>> This is not correct. In yarn-cluster mode the driver is what runs inside the application master, and the node on which the application master gets allocated is decided by YARN.
>>
>> I agree.
>>
>>> In yarn-client mode, there is no application master and the driver runs in the context of the same unix process as spark-submit.
>>
>> I must beg to differ. There is an AM. All YARN apps need an AM, as it is the only way you can get containers to run your work. And, except in the special case of an "Unmanaged AM", the AM runs in the cluster.
>>
>> In Spark, an AM is always set up by the YARN client and submitted to the cluster:
>> https://github.com/apache/spark/blob/4f60651cbec1b4c9cc2e6d832ace77e89a233f3a/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L116
>>
>> The big difference is where that driver lives. In --cluster it's in the AM; in --client it's in the client. Here's the AM making its decision:
>> https://github.com/apache/spark/blob/4f60651cbec1b4c9cc2e6d832ace77e89a233f3a/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L187
>>
>> -Steve (why yes, I have spent too much time staring at AM logs)
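[Editor's note] The placement rules discussed in this thread can be summed up in a small sketch. This is a toy model, not Spark code; all names in it are illustrative assumptions, and the "unmanaged" entry models the hypothetical third mode Steve describes, which Spark does not implement.

```scala
// Hedged sketch: a toy model of where the driver and the AM run in the
// deployment modes discussed in the thread. Illustrative names only.
object DeployModePlacement {
  final case class Placement(driver: String, am: String)

  def placement(mode: String): Placement = mode match {
    // yarn-cluster: YARN picks a node for the AM, and the driver runs inside it
    case "yarn-cluster" => Placement("inside the AM container", "cluster node chosen by YARN")
    // yarn-client: the driver stays in the spark-submit JVM, but there is
    // still an AM, and YARN still decides where that AM runs
    case "yarn-client"  => Placement("client JVM (spark-submit)", "cluster node chosen by YARN")
    // hypothetical unmanaged-AM mode from the thread; Spark doesn't do it
    case "unmanaged"    => Placement("client JVM", "client JVM, alongside the driver")
    case other          => throw new IllegalArgumentException(s"unknown mode: $other")
  }

  def main(args: Array[String]): Unit =
    Seq("yarn-cluster", "yarn-client", "unmanaged").foreach { m =>
      val p = placement(m)
      println(s"$m -> driver: ${p.driver}; AM: ${p.am}")
    }
}
```

Note that in both real modes the AM's node is chosen by YARN; only the unmanaged variant would co-locate the AM with the client, which is exactly why it gains nothing against client failure.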
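[Editor's note] The Client.scala link above is where the YARN client picks the main class to run as the AM. A minimal sketch of that decision, assuming the entry-point class names used in the linked Spark source (the surrounding code here is simplified for illustration):

```scala
// Hedged sketch of the deploy-mode decision referenced in Client.scala:
// an AM is always submitted, but a different entry point is chosen per mode.
object AmEntryPoint {
  def amClass(isClusterMode: Boolean): String =
    if (isClusterMode)
      "org.apache.spark.deploy.yarn.ApplicationMaster" // hosts the driver in-cluster
    else
      "org.apache.spark.deploy.yarn.ExecutorLauncher"  // thin AM; the driver stays in the client

  def main(args: Array[String]): Unit = {
    println(s"--deploy-mode cluster -> ${amClass(isClusterMode = true)}")
    println(s"--deploy-mode client  -> ${amClass(isClusterMode = false)}")
  }
}
```

Either way, something is submitted to YARN as an AM, which is the crux of the disagreement resolved in this thread.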