Yes, you are right. The job failed and it was being re-attempted.

Thank you,


From: Daniel Siegmann <daniel.siegm...@teamaol.com>
Date: Monday, 21 March 2016 21:33
To: Ted Yu <yuzhih...@gmail.com>
Cc: Roberto Pagliari <roberto.pagli...@asos.com>, "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: cluster randomly re-starting jobs

I've never used Ambari and I don't know if this is your problem, but I have seen 
similar behavior. In my case, my application failed and Hadoop kicked off a 
second attempt. I didn't realize this, but when I refreshed the Spark UI, 
suddenly everything seemed reset! This is because the application ID is part of 
the URL but the attempt ID is not, so when the context for the second attempt 
starts, it appears at the same URL as the context for the first attempt.

To verify if this is the problem, you could look at the application in the 
Hadoop console (or whatever the equivalent is on Ambari) and see if there are 
multiple attempts. You can also see it in the Spark history server (under 
incomplete applications, if the second attempt is still running).
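
If it helps, here is a minimal sketch of how to check from the command line, 
assuming a YARN cluster with the standard YARN CLI available; <applicationId> 
is a placeholder for your application's ID:

  # List every attempt YARN has made for this application; more than one
  # entry means the application master was restarted
  yarn applicationattempt -list <applicationId>

  # Optionally cap re-attempts when submitting the Spark job
  spark-submit --conf spark.yarn.maxAppAttempts=1 <other submit arguments>

(spark.yarn.maxAppAttempts is the Spark-on-YARN setting; the cluster-wide 
ceiling is yarn.resourcemanager.am.max-attempts.)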

~Daniel Siegmann

On Mon, Mar 21, 2016 at 9:58 AM, Ted Yu <yuzhih...@gmail.com> wrote:
Can you provide a bit more information?

Release of Spark and YARN

Have you checked the Spark UI / YARN job logs to see if there is some clue?
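
A minimal sketch of pulling those YARN logs, assuming log aggregation is 
enabled; <applicationId> is a placeholder for the ID shown in the 
ResourceManager UI:

  # Dump the aggregated container logs for the application
  yarn logs -applicationId <applicationId>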

Cheers

On Mon, Mar 21, 2016 at 6:21 AM, Roberto Pagliari <roberto.pagli...@asos.com> wrote:
I noticed that sometimes the Spark cluster seems to restart the job completely.

In the Ambari UI (where I can check jobs/stages), everything that was done up to 
a certain point is removed, and the job is restarted.

Does anyone know what the issue could be?

Thank you,


