Paul Jones created SPARK-20658:
----------------------------------
Summary: spark.yarn.am.attemptFailuresValidityInterval doesn't seem to have an effect
Key: SPARK-20658
URL: https://issues.apache.org/jira/browse/SPARK-20658
Project: Spark
Issue Type: Bug
Components: YARN
Affects Versions: 2.1.0
Reporter: Paul Jones
Priority: Minor
I'm running a job in YARN cluster mode with
`spark.yarn.am.attemptFailuresValidityInterval=1h` set in both
spark-defaults.conf and my spark-submit command. (The property shows up in the
Environment tab of the Spark History Server, so it appears to be set
correctly.)
However, I just had a job die after four AM failures (three of the four
failures were more than an hour apart), so I'm confused about what could be
going on. I haven't yet figured out the cause of the individual failures. Is it
possible that certain types of failures are always counted? For example, do AM
attempts that are killed due to memory issues always count?
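For reference, the behavior I expected from the validity interval can be sketched as a simplified Python model. This is not Spark's or YARN's actual code. It assumes a failure only counts toward the limit while it is within the validity interval of the latest failure, and the failure times and MAX_ATTEMPTS value are hypothetical. It also ignores any failure types YARN might exclude from the count, which is part of what I'm asking about.

```python
from datetime import timedelta


def failures_in_window(failure_offsets, now, validity_interval):
    # Count failures that fall within `validity_interval` before `now`.
    # A validity_interval of None models "no interval": every failure counts.
    if validity_interval is None:
        return sum(1 for t in failure_offsets if t <= now)
    return sum(1 for t in failure_offsets if now - validity_interval <= t <= now)


def app_should_fail(failure_offsets, validity_interval, max_attempts):
    # Replay AM failures in time order; after each one, fail the app if the
    # number of failures inside the window has reached the attempt limit.
    seen = []
    for t in sorted(failure_offsets):
        seen.append(t)
        if failures_in_window(seen, t, validity_interval) >= max_attempts:
            return True
    return False


# Hypothetical failure times (offsets from job start): three failures more
# than an hour apart, plus a fourth 20 minutes after the third.
failures = [
    timedelta(hours=0),
    timedelta(hours=2),
    timedelta(hours=4),
    timedelta(hours=4, minutes=20),
]
MAX_ATTEMPTS = 4  # hypothetical attempt limit

print(app_should_fail(failures, timedelta(hours=1), MAX_ATTEMPTS))  # False
print(app_should_fail(failures, None, MAX_ATTEMPTS))                # True
```

Under this model, a 1h interval never sees more than two failures at once, so the app should survive; only when every failure is counted (no interval) does it hit the limit, which matches what I observed.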
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)