[ https://issues.apache.org/jira/browse/AIRFLOW-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16995085#comment-16995085 ]
Mike Prior commented on AIRFLOW-5456:
-------------------------------------
Hi Sebastian,
I see the same issue. To work around it, I made the change below to
'spark_submit_hook.py'. With this change, the task fails if the spark job
fails. I'm not sure what the additional Kubernetes check is supposed to do.
Mike
    # Check spark-submit return code. In Kubernetes mode, also check the value
    # of exit code in the log, as it may differ.
    ### if returncode or (self._is_kubernetes and self._spark_exit_code != 0):
    if returncode != 0:
        raise AirflowException(
            "Cannot execute: {}. Error code is: {}.".format(
                spark_submit_cmd, returncode
            )
        )
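
For comparison, the original condition and the workaround can be sketched as
standalone functions. This is only an illustration, not the hook's code: the
function names and plain arguments are hypothetical stand-ins for
'returncode', 'self._is_kubernetes' and 'self._spark_exit_code'. Assuming
'returncode' is an integer, the two checks differ only when spark-submit
itself returns 0 in Kubernetes mode while the exit code recorded from the log
is anything other than 0.

```python
def fails_original(returncode, is_kubernetes, spark_exit_code):
    """Original condition: fail on a nonzero spark-submit return code, or,
    in Kubernetes mode, on a log-derived exit code other than 0."""
    return bool(returncode or (is_kubernetes and spark_exit_code != 0))


def fails_workaround(returncode):
    """Workaround condition: look only at the spark-submit return code."""
    return returncode != 0


if __name__ == "__main__":
    # Hypothetical inputs: (returncode, is_kubernetes, spark_exit_code)
    cases = [(0, True, 0), (0, True, 1), (1, True, 0), (0, False, 1)]
    for returncode, is_k8s, exit_code in cases:
        print(
            (returncode, is_k8s, exit_code),
            fails_original(returncode, is_k8s, exit_code),
            fails_workaround(returncode),
        )
```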
> Mark spark submit operator task as 'failed' when kubernetes pod never ran
> -------------------------------------------------------------------------
>
> Key: AIRFLOW-5456
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5456
> Project: Apache Airflow
> Issue Type: Bug
> Components: operators
> Affects Versions: 1.10.0, 1.10.1, 1.10.2, 1.10.3, 1.10.4, 1.10.5
> Reporter: Sebastian Arzt
> Priority: Minor
> Labels: failure-handling, operator, spark, spark-submit
>
> Currently, a spark submit operator task does not fail if the corresponding
> pod never entered phase 'Running'.
> Background: we observed spark submit operator tasks marked as "success"
> although the spark job never ran on Kubernetes.
> Logs (truncated):
> {code:java}
> [2019-09-11 09:21:02,732] {logging_mixin.py:95} INFO - [2019-09-11
> 09:21:02,732] {spark_submit_hook.py:427} INFO - 2019-09-11 09:21:02 INFO
> LoggingPodStatusWatcherImpl:54 - State changed, new state:
> [2019-09-11 09:21:02,732] {logging_mixin.py:95} INFO - [2019-09-11
> 09:21:02,732] {spark_submit_hook.py:410} INFO - Identified spark driver pod:
> pod-name
> [2019-09-11 09:21:02,733] {logging_mixin.py:95} INFO - [2019-09-11
> 09:21:02,732] {spark_submit_hook.py:427} INFO - pod name: pod-name
> [2019-09-11 09:21:02,733] {logging_mixin.py:95} INFO - [2019-09-11
> 09:21:02,733] {spark_submit_hook.py:427} INFO - namespace: default
> [2019-09-11 09:21:02,733] {logging_mixin.py:95} INFO - [2019-09-11
> 09:21:02,733] {spark_submit_hook.py:427} INFO - pod uid:
> 797f3157-d475-11e9-9758-1209ef52ae5e
> [2019-09-11 09:21:02,733] {logging_mixin.py:95} INFO - [2019-09-11
> 09:21:02,733] {spark_submit_hook.py:427} INFO - creation time:
> 2019-09-11T09:21:02Z
> [2019-09-11 09:21:02,733] {logging_mixin.py:95} INFO - [2019-09-11
> 09:21:02,733] {spark_submit_hook.py:427} INFO - service account name: account
> [2019-09-11 09:21:02,734] {logging_mixin.py:95} INFO - [2019-09-11
> 09:21:02,733] {spark_submit_hook.py:427} INFO - volumes: vol1, vol2
> [2019-09-11 09:21:02,734] {logging_mixin.py:95} INFO - [2019-09-11
> 09:21:02,734] {spark_submit_hook.py:427} INFO - node name: node name
> [2019-09-11 09:21:02,734] {logging_mixin.py:95} INFO - [2019-09-11
> 09:21:02,734] {spark_submit_hook.py:427} INFO - start time:
> 2019-09-11T09:21:02Z
> [2019-09-11 09:21:02,734] {logging_mixin.py:95} INFO - [2019-09-11
> 09:21:02,734] {spark_submit_hook.py:427} INFO - container images:
> some-image:tag
> [2019-09-11 09:21:02,734] {logging_mixin.py:95} INFO - [2019-09-11
> 09:21:02,734] {spark_submit_hook.py:427} INFO - phase: Pending
> [2019-09-11 09:27:56,813] {logging_mixin.py:95} INFO - [2019-09-11
> 09:27:56,813] {spark_submit_hook.py:427} INFO - 2019-09-11 09:27:56 INFO
> LoggingPodStatusWatcherImpl:54 - Container final statuses:
> [2019-09-11 09:27:56,813] {logging_mixin.py:95} INFO - [2019-09-11
> 09:27:56,813] {spark_submit_hook.py:427} INFO - Container name:
> spark-kubernetes-driver
> [2019-09-11 09:27:56,813] {logging_mixin.py:95} INFO - [2019-09-11
> 09:27:56,813] {spark_submit_hook.py:427} INFO - Container state: Terminated
> {code}
> Solution: Do not mark job as 'success' if phase 'Running' was never observed
> in the spark-submit logs.
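
The proposed solution could be approximated by scanning the spark-submit
output for a 'phase: Running' line. The sketch below is an assumption-laden
illustration, not the hook's implementation: the helper name
'saw_running_phase' and the regular expression are hypothetical, and the
'phase: <name>' line format is taken only from the truncated logs above.

```python
import re

# Assumed line format, based on the "phase: Pending" line in the logs above.
PHASE_RE = re.compile(r"\bphase:\s*(\w+)")


def saw_running_phase(log_lines):
    """Return True if any log line reports the pod in phase 'Running'."""
    for line in log_lines:
        match = PHASE_RE.search(line)
        if match and match.group(1) == "Running":
            return True
    return False


if __name__ == "__main__":
    # Two lines shaped like the spark-submit output quoted in the issue.
    sample = [
        "{spark_submit_hook.py:427} INFO - phase: Pending",
        "{spark_submit_hook.py:427} INFO - Container state: Terminated",
    ]
    if not saw_running_phase(sample):
        print("Driver pod never reached phase 'Running'; treat task as failed")
```

A caller would collect the lines while reading the spark-submit output and
raise an error at the end if the helper returns False.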