[
https://issues.apache.org/jira/browse/SPARK-20658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16001803#comment-16001803
]
Paul Jones commented on SPARK-20658:
------------------------------------
The jars are versioned 2.7.3.
Finally finished grepping through the logs. I didn't find that error message.
The closest I found was:
{noformat}
applications/hadoop-yarn/yarn-yarn-resourcemanager-ip-10-0-15-75.log.2017-04-28-03.gz:2017-04-28 03:37:33,051 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl (IPC Server handler 34 on 8032): The attemptFailuresValidityInterval for the application: application_1493122281436_0016 is 3600000.
{noformat}
> spark.yarn.am.attemptFailuresValidityInterval doesn't seem to have an effect
> ----------------------------------------------------------------------------
>
> Key: SPARK-20658
> URL: https://issues.apache.org/jira/browse/SPARK-20658
> Project: Spark
> Issue Type: Bug
> Components: YARN
> Affects Versions: 2.1.0
> Reporter: Paul Jones
> Priority: Minor
>
> I'm running a job in YARN cluster mode using
> `spark.yarn.am.attemptFailuresValidityInterval=1h` specified in both
> spark-defaults.conf and in my spark-submit command. (This flag shows up in the
> Environment tab of the Spark history server, so it seems to be specified
> correctly.)
> However, I just had a job die with four AM failures (three of the four
> failures were over an hour apart). So, I'm confused as to what could be going
> on. I haven't figured out the cause of the individual failures, so is it
> possible that certain types of failures are always counted? For example, do
> jobs that are killed due to memory issues always count?
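
To make the expected behavior in the description concrete, here is a minimal Python sketch of a sliding-window failure count. It is only an illustration of the semantics the reporter expected, not YARN's or Spark's actual implementation. The function names, the failure timestamps, and the `max_attempts=4` limit are all hypothetical; only the 3600000 ms interval comes from the log line above.

```python
HOUR_MS = 3_600_000  # 1h in milliseconds, the interval value seen in the RM log


def failures_in_window(failure_times_ms, now_ms, validity_interval_ms):
    """Count AM failures that fall inside the validity window ending at now_ms.

    Assumption: a non-positive interval means the window is disabled,
    so every failure up to now_ms is counted.
    """
    if validity_interval_ms <= 0:
        return sum(1 for t in failure_times_ms if t <= now_ms)
    window_start = now_ms - validity_interval_ms
    return sum(1 for t in failure_times_ms if window_start < t <= now_ms)


def should_fail_application(failure_times_ms, now_ms, validity_interval_ms, max_attempts):
    """Return True if the failures counted in the window reach max_attempts."""
    counted = failures_in_window(failure_times_ms, now_ms, validity_interval_ms)
    return counted >= max_attempts


# Hypothetical timeline: four failures, three of them more than an hour apart.
failures = [0, 2 * HOUR_MS, 4 * HOUR_MS, 4 * HOUR_MS + 600_000]
now = failures[-1]

counted_with_window = failures_in_window(failures, now, HOUR_MS)
counted_without_window = failures_in_window(failures, now, -1)
fails_with_window = should_fail_application(failures, now, HOUR_MS, max_attempts=4)
fails_without_window = should_fail_application(failures, now, -1, max_attempts=4)
```

Under these assumptions only the last two failures fall inside the one-hour window, so the job would not be expected to die if the interval were applied. With the window disabled, all four failures count and the limit is reached, which matches what the reporter observed.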