mdnawed2010 opened a new issue #8806: URL: https://github.com/apache/airflow/issues/8806
**Apache Airflow version**: 1.10.6 (this problem appears to affect all versions >= 1.10.4)

**Environment**: Google Cloud Composer (composer-1.10.2-airflow-1.10.6)

**Issue**: This concerns DataProcSparkOperator. As part of https://issues.apache.org/jira/browse/AIRFLOW-3211, a change was introduced to reattach to a previous instance of a Dataproc job, so that when a DAG is restarted or re-triggered it does not re-run a Dataproc task that a previous DAG run has already completed or still has running.

I am quoting the comment below from the original JIRA, since it explains the issue perfectly and nobody has responded to it there:

> The functionality added by this story actually broke the behavior of the dataproc hook and made a few 1.10.x releases unusable for dataproc users. The problem is that the hook only uses the task ID part of the dataproc job ID when looking for previous invocations of the job, so if dataproc history still has jobs corresponding to any of the previous dag runs, the dataproc hook doesn't execute the job. A proper way to implement this would be to associate dataproc jobs with particular dag runs by e.g. embedding a dag run id hash in the dataproc job id.
>
> In any case this functionality has to be optional. In our experience, users expect dataproc jobs to be re-executed when they re-execute the task, and this new behavior creates a lot of confusion.

Old issue link: https://issues.apache.org/jira/browse/AIRFLOW-3211
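To make the proposal from the JIRA comment concrete, here is a minimal sketch of matching by task ID versus matching by DAG run. It is not the actual hook code: the function names, the `<task_id>_<suffix>` job ID layout, the SHA-1 hash, and the `reattach` flag are all assumptions made for illustration.

```python
import hashlib
from typing import Iterable, Optional


def run_scoped_job_id(task_id: str, dag_id: str, run_id: str) -> str:
    """Hypothetical scheme: suffix the task ID with a short hash of the DAG run."""
    digest = hashlib.sha1(f"{dag_id}:{run_id}".encode("utf-8")).hexdigest()[:8]
    return f"{task_id}_{digest}"


def match_by_task_id(job_ids: Iterable[str], task_id: str) -> Optional[str]:
    """Behavior described in the issue: any job sharing the task ID part matches."""
    prefix = task_id + "_"
    return next((job_id for job_id in job_ids if job_id.startswith(prefix)), None)


def match_by_run(
    job_ids: Iterable[str],
    task_id: str,
    dag_id: str,
    run_id: str,
    reattach: bool = True,
) -> Optional[str]:
    """Proposed behavior: reattach only to a job created by this exact DAG run.

    Passing reattach=False (hypothetical opt-out) always returns None,
    so the caller would submit a fresh job.
    """
    if not reattach:
        return None
    wanted = run_scoped_job_id(task_id, dag_id, run_id)
    return wanted if wanted in set(job_ids) else None


# Job history left behind by an earlier DAG run (run IDs are made up).
history = [run_scoped_job_id("spark_task", "my_dag", "run_2020_05_01")]

# Task-ID-only matching finds the old job, so a new run would skip execution.
stale = match_by_task_id(history, "spark_task")

# Run-scoped matching finds nothing for a new run, so the job would execute.
fresh = match_by_run(history, "spark_task", "my_dag", "run_2020_05_02")
```

With run-scoped IDs, a restart within the same DAG run would still find and reattach to its own job, while a new DAG run or an explicit opt-out would submit the job again, which is the behavior users expect according to the comment above.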
