digger commented on issue #6371: [AIRFLOW-5691] Rewrite Dataproc operators to use python library
URL: https://github.com/apache/airflow/pull/6371#issuecomment-586502594
 
 
   @dossett, the functionality added in AIRFLOW-3211 actually broke the
behavior of the Dataproc hook and made a few 1.10.x releases unusable for
Dataproc users. The problem is that the hook only uses the task ID part of the
Dataproc job ID when looking for previous invocations of the job. So if the
Dataproc job history still contains jobs from any previous DAG run, the hook
doesn't execute the job. A proper way to implement this would be to associate
Dataproc jobs with particular DAG runs, e.g. by embedding a hash of the DAG run
ID in the Dataproc job ID. In any case, the functionality added in
AIRFLOW-3211 has to be optional. In our experience, users expect the Dataproc
job to be re-executed when they re-execute the task, and this new behavior
creates a lot of confusion.
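To make the proposed fix concrete, here is a minimal sketch of run-scoped job IDs with opt-in reuse. It is not the Airflow Dataproc hook's actual code. The function names (`run_scoped_prefix`, `new_job_id`, `find_reusable_job`) and the `reuse_existing` flag are hypothetical. The `existing_job_ids` list stands in for whatever the hook fetches from the Dataproc job history. The character set used to sanitize task IDs (letters, digits, `_`, `-`) is an assumption about what Dataproc accepts in job IDs.

```python
import hashlib
import re
import uuid
from typing import Iterable, Optional


def run_scoped_prefix(task_id: str, dag_id: str, run_id: str) -> str:
    """Hypothetical helper: a job-ID prefix unique to one (task, DAG run) pair."""
    # Hash the DAG run identity so the prefix stays short and uses safe characters.
    run_hash = hashlib.sha256(f"{dag_id}:{run_id}".encode("utf-8")).hexdigest()[:8]
    # Assumption: job IDs may only contain letters, digits, '_' and '-'.
    safe_task = re.sub(r"[^A-Za-z0-9_-]", "_", task_id)
    return f"{safe_task}_{run_hash}"


def new_job_id(prefix: str) -> str:
    """Hypothetical helper: append a random suffix so every submission is distinct."""
    return f"{prefix}_{uuid.uuid4().hex[:8]}"


def find_reusable_job(
    existing_job_ids: Iterable[str],
    prefix: str,
    reuse_existing: bool = False,
) -> Optional[str]:
    """Return a previous job ID to reuse, or None to submit a new job."""
    if not reuse_existing:
        # Default: always re-execute, which is what users expect on a task rerun.
        return None
    for job_id in existing_job_ids:
        # Match only jobs from this DAG run, not any earlier run of the same task.
        if job_id.startswith(prefix + "_"):
            return job_id
    return None
```

The contrast with the behavior described above: a job left over from yesterday's run of `load_data` starts with `load_data`, so a task-ID-only lookup matches it. Today's run has a different run hash in its prefix, so `find_reusable_job` returns `None` and a new job is submitted, even with `reuse_existing=True`.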

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

Reply via email to