mdnawed2010 opened a new issue #8806:
URL: https://github.com/apache/airflow/issues/8806


   
   
   **Apache Airflow version**: 1.10.6 (this problem appears to affect all versions >= 1.10.4)
   
   **Environment**: Google Cloud Composer (composer-1.10.2-airflow-1.10.6)
   
   Issue: This concerns DataProcSparkOperator. 
https://issues.apache.org/jira/browse/AIRFLOW-3211 introduced a fix that 
reattaches to a previous instance of a Dataproc job, so that when a DAG is 
restarted or re-triggered it does not re-run a Dataproc task that a previous 
DAG run has already completed or is still running.
   
   I am quoting the comment below from the original JIRA issue, as it explains 
the problem perfectly and nobody has responded to it there:
   
   
   The functionality added by this story actually broke the behavior of the 
dataproc hook and made a few 1.10.x releases unusable for dataproc users. The 
problem is that the hook only uses the task ID part of the dataproc job ID when 
looking for previous invocations of the job, so if dataproc history still has 
jobs corresponding to any of the previous dag runs, the dataproc hook doesn't 
execute the job.
   A proper way to implement this would be to associate dataproc jobs with 
particular dag runs by e.g. embedding a dag run id hash in the dataproc job id.
   In any case this functionality has to be optional. In our experience, users 
expect dataproc jobs to be re-executed when they re-execute the task, and this 
new behavior creates a lot of confusion.
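
   To make the suggestion in the quoted comment concrete, here is a minimal 
sketch of tying job IDs to a particular DAG run. It is not the actual Airflow 
hook code: the function names, the `reattach` flag, and the 
`<task_id>_<run hash>_<random suffix>` job ID layout are all assumptions made 
for illustration only.

```python
import hashlib
import uuid
from typing import Iterable, Optional


def run_prefix(task_id: str, run_id: str) -> str:
    """Prefix shared by every job submitted for this task in this DAG run.

    Hypothetical layout: <task_id>_<first 8 hex chars of sha1(run_id)>_
    """
    run_hash = hashlib.sha1(run_id.encode("utf-8")).hexdigest()[:8]
    return f"{task_id}_{run_hash}_"


def build_job_id(task_id: str, run_id: str) -> str:
    """Build a unique job ID that embeds a hash of the DAG run ID."""
    return run_prefix(task_id, run_id) + uuid.uuid4().hex[:8]


def find_reattachable_job(
    existing_job_ids: Iterable[str],
    task_id: str,
    run_id: str,
    reattach: bool = True,
) -> Optional[str]:
    """Return a job from the same task AND the same DAG run, if any.

    Matching on the task ID alone would also pick up jobs left over from
    earlier DAG runs, which is the behavior described in the comment above.
    With reattach=False the lookup is skipped, so the job is always re-run.
    """
    if not reattach:
        return None
    prefix = run_prefix(task_id, run_id)
    for job_id in existing_job_ids:
        if job_id.startswith(prefix):
            return job_id
    return None
```

   With this scheme, a job left in the Dataproc history by an earlier DAG run 
has a different prefix, so a new run of the same task submits a fresh job 
instead of reattaching to the old one.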
   
   Old issue link: https://issues.apache.org/jira/browse/AIRFLOW-3211
   



