Paul Jones created SPARK-20658:
----------------------------------

             Summary: spark.yarn.am.attemptFailuresValidityInterval doesn't seem to have an effect
                 Key: SPARK-20658
                 URL: https://issues.apache.org/jira/browse/SPARK-20658
             Project: Spark
          Issue Type: Bug
          Components: YARN
    Affects Versions: 2.1.0
            Reporter: Paul Jones
            Priority: Minor


I'm running a job in YARN cluster mode with 
`spark.yarn.am.attemptFailuresValidityInterval=1h` set in both 
spark-defaults.conf and my spark-submit command. (The setting shows up in the 
Environment tab of the Spark history server, so it appears to be specified 
correctly.)

However, I just had a job die with four AM failures, even though three of the 
four failures were more than an hour apart, so I'm confused as to what could be 
going on. I haven't figured out the cause of the individual failures. Is it 
possible that certain types of failures are always counted? E.g., do jobs that 
are killed due to memory issues always count?
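
To make the expectation concrete, here is a minimal Python sketch of the 
behavior I assumed the validity interval would give: only AM failures that fall 
inside the most recent interval count toward the attempt limit. This is a model 
of my expectation, not YARN's or Spark's actual implementation. The function 
names, the timestamps, and the limit of four attempts are hypothetical and 
chosen only to mirror the situation described above.

```python
from datetime import datetime, timedelta


def failures_in_window(failure_times, now, validity_interval):
    """Count the failures that occurred within `validity_interval` before `now`."""
    cutoff = now - validity_interval
    return sum(1 for t in failure_times if t > cutoff)


def should_give_up(failure_times, now, validity_interval, max_attempts):
    """Return True if the failures inside the window reach the attempt limit."""
    return failures_in_window(failure_times, now, validity_interval) >= max_attempts


# Hypothetical timeline: four AM failures at 0h, 2h, 4h and 4.5h.
start = datetime(2017, 5, 1, 0, 0)
failures = [start + timedelta(hours=h) for h in (0, 2, 4, 4.5)]
now = failures[-1]

# Assumed settings: a 1h validity interval and a limit of four attempts.
interval = timedelta(hours=1)
in_window = failures_in_window(failures, now, interval)
gives_up = should_give_up(failures, now, interval, max_attempts=4)

print(in_window, gives_up)
```

Under this model only the last two failures fall inside the one-hour window, so 
the application would not be expected to give up. If every failure were counted 
regardless of age, the limit of four would be reached, which matches what I 
observed.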



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

