digger commented on issue #6371: [AIRFLOW-5691] Rewrite Dataproc operators to use python library
URL: https://github.com/apache/airflow/pull/6371#issuecomment-586502594

@dossett, the functionality added in AIRFLOW-3211 actually broke the behavior of the Dataproc hook and made a few 1.10.x releases unusable for Dataproc users. The problem is that the hook uses only the task ID part of the Dataproc job ID when looking for previous invocations of the job, so if the Dataproc job history still contains jobs from any previous DAG run, the hook does not execute the job.

A proper way to implement this would be to associate Dataproc jobs with particular DAG runs, e.g. by embedding a DAG run ID hash in the Dataproc job ID. In any case, the functionality added in AIRFLOW-3211 has to be optional. In our experience, users expect the Dataproc job to be re-executed when they re-execute the task, and this new behavior creates a lot of confusion.
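
To make the run-scoped job ID idea concrete, here is a rough sketch of what I have in mind. It is not the hook's actual code: `build_dataproc_job_id` and `should_skip_submission` are hypothetical names, and I'm assuming Dataproc job IDs are limited to letters, digits, underscores and hyphens, with a maximum of 100 characters.

```python
import hashlib
import re

MAX_JOB_ID_LEN = 100  # assumed Dataproc job ID length limit


def build_dataproc_job_id(dag_id: str, task_id: str, run_id: str, hash_len: int = 8) -> str:
    """Hypothetical helper: derive a job ID that is stable within one DAG run."""
    # Hash the run ID so that characters such as ':' and '+' never reach the job ID.
    run_hash = hashlib.sha1(run_id.encode("utf-8")).hexdigest()[:hash_len]
    # Replace anything outside the assumed allowed character set.
    prefix = re.sub(r"[^A-Za-z0-9_-]", "_", f"{dag_id}_{task_id}")
    # Truncate the prefix, not the hash, so the run-specific part always survives.
    max_prefix_len = MAX_JOB_ID_LEN - len(run_hash) - 1
    return f"{prefix[:max_prefix_len]}_{run_hash}"


def should_skip_submission(job_id: str, existing_job_ids, reattach: bool = True) -> bool:
    """Hypothetical check: skip only on an exact match, and only if reattaching is enabled."""
    return reattach and job_id in set(existing_job_ids)
```

With this scheme a retry within the same DAG run produces the same ID and can reattach to the existing job, while a new DAG run produces a different ID and is always submitted. The `reattach` flag is how I would keep the AIRFLOW-3211 behavior optional.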
