[
https://issues.apache.org/jira/browse/AIRFLOW-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kaxil Naik updated AIRFLOW-3035:
--------------------------------
Fix Version/s: (was: 2.0.0)
1.10.2
> gcp_dataproc_hook should treat CANCELLED job state consistently
> ---------------------------------------------------------------
>
> Key: AIRFLOW-3035
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3035
> Project: Apache Airflow
> Issue Type: Improvement
> Components: contrib
> Affects Versions: 1.10.0
> Reporter: Jeffrey Payne
> Assignee: Jeffrey Payne
> Priority: Minor
> Labels: dataproc
> Fix For: 1.10.2
>
>
> When a Dataproc job is cancelled, {{gcp_dataproc_hook.py}} treats the
> {{CANCELLED}} state in an inconsistent and non-intuitive manner:
> # Internally, {{_DataProcJob.wait_for_done()}} returns {{False}} for
> cancelled jobs, which causes {{raise_error()}} to be called, yet
> {{raise_error()}} only raises {{Exception}} when the job state is
> {{ERROR}}.
> # The end result, from the perspective of {{dataproc_operator.py}}, is
> that a cancelled job appears to have succeeded, so the success callback
> is invoked. This seems strange to me, as a cancelled job is rarely
> considered successful, in my experience.
> Simply changing {{raise_error()}} from:
> {code:python}
> if 'ERROR' == self.job['status']['state']:
> {code}
> to
> {code:python}
> if self.job['status']['state'] in ('ERROR', 'CANCELLED'):
> {code}
> would fix both of these issues.
> Another, perhaps better, option would be to have the dataproc job operators
> accept a list of {{error_states}} that could be passed into
> {{raise_error()}}, allowing the caller to determine which states should
> result in "failure" of the task. I would lean towards that option.
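> A minimal sketch of that second option, assuming a hypothetical
> {{error_states}} parameter on the job wrapper (the class and parameter
> names here are illustrative, not the actual hook API):

```python
# Illustrative sketch only: mimics the shape of _DataProcJob.raise_error()
# with a configurable set of failure states. Names are hypothetical.

DEFAULT_ERROR_STATES = ('ERROR', 'CANCELLED')


class DataProcJobStub:
    def __init__(self, job, error_states=DEFAULT_ERROR_STATES):
        self.job = job
        # The caller decides which terminal states count as failure.
        self.error_states = tuple(error_states)

    def raise_error(self, message=None):
        state = self.job['status']['state']
        if state in self.error_states:
            prefix = message + ' ' if message else ''
            raise Exception('{}Job state: {}'.format(prefix, state))


# With the defaults, a cancelled job now fails the task...
job = DataProcJobStub({'status': {'state': 'CANCELLED'}})
try:
    job.raise_error('Dataproc job failed.')
except Exception as exc:
    print(exc)

# ...but a caller can keep the old behaviour by restricting error_states.
lenient = DataProcJobStub({'status': {'state': 'CANCELLED'}},
                          error_states=('ERROR',))
lenient.raise_error()  # no exception raised
```

> This keeps the default behaviour safe (cancellation is a failure) while
> letting an operator that genuinely treats cancellation as benign opt
> out per task.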
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)