mniehoff commented on issue #15588: URL: https://github.com/apache/airflow/issues/15588#issuecomment-842281159
I dug a bit deeper into it, and the problem seems to be within the Databricks operator (tbh: I was expecting it to be the operator rather than the scheduler). In contrast to e.g. BigQuery, Databricks job runs do not have an id that you can set from the outside; it is generated as soon as a new run is triggered. The BigQueryJobOperator creates the job id itself, so if the operator is restarted due to a scheduler restart, it checks whether a job with this id is already running and, if so, "attaches" to that running job. For Databricks this is not possible, as the run id is not configurable.

A few options I see to mitigate this:

1) Save the run id somewhere (not sure where) that survives the scheduler restart and can be picked up by the operator, so the operator can reattach to the run. The run id would be deleted once the run has finished. A rough sketch of the reattach logic follows below.
2) Always reattach if there is an existing run for a given job. This would work in my case, but not in general: Databricks allows concurrent runs per job, and there will definitely be cases where a run exists and a new run should still be triggered.
3) Currently the operator polls. One could give the operator an `async=True` flag that exits the operator as soon as the Databricks run has been started, and then use a sensor to poll the run status (the run id is already available via XCom). See the sensor sketch at the end.

Imho only 1) and 3) are feasible, but I am not sure where to store the run id so that it survives a scheduler restart; 3) is imho the cleanest solution. Let me know what you think. I definitely aim to contribute these changes back to the provider packages.
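For option 1, the reattach logic itself is fairly mechanical once a store is chosen; the open question is only where the id lives. A minimal sketch, built on the existing `DatabricksHook` API (`submit_run` / `get_run_state`): the `_load_run_id` / `_save_run_id` / `_delete_run_id` helpers are hypothetical placeholders for whatever persistent store ends up being used.

```python
import time

from airflow.exceptions import AirflowException
from airflow.providers.databricks.hooks.databricks import DatabricksHook
from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator


class ReattachingDatabricksSubmitRunOperator(DatabricksSubmitRunOperator):
    """Sketch: reattach to a still-running Databricks run after a scheduler restart."""

    def _load_run_id(self, context):
        # Hypothetical: read the run id from whatever store survives a
        # scheduler restart (the open question above). Returns None if absent.
        raise NotImplementedError

    def _save_run_id(self, context, run_id):
        raise NotImplementedError  # hypothetical: persist the run id

    def _delete_run_id(self, context):
        raise NotImplementedError  # hypothetical: drop the id once finished

    def execute(self, context):
        hook = DatabricksHook(databricks_conn_id=self.databricks_conn_id)

        run_id = self._load_run_id(context)
        if run_id is not None and not hook.get_run_state(run_id).is_terminal:
            self.log.info("Reattaching to running Databricks run %s", run_id)
        else:
            run_id = hook.submit_run(self.json)  # start a fresh run
            self._save_run_id(context, run_id)

        # Same polling loop the operator uses today, just starting from a
        # possibly pre-existing run instead of always submitting a new one.
        while True:
            state = hook.get_run_state(run_id)
            if state.is_terminal:
                break
            time.sleep(self.polling_period_seconds)

        self._delete_run_id(context)
        if not state.is_successful:
            raise AirflowException(f"Run {run_id} failed: {state.state_message}")
```

Note that XCom is not a good fit for the store here, since a task instance's XCom is cleared when it is re-run, which is exactly why the storage question is still open.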

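And to make option 3 concrete, here is what such a sensor could look like, again reusing the existing `DatabricksHook`. The `DatabricksRunSensor` name and its parameters are hypothetical; nothing like this exists in the provider package yet.

```python
from airflow.exceptions import AirflowException
from airflow.providers.databricks.hooks.databricks import DatabricksHook
from airflow.sensors.base import BaseSensorOperator


class DatabricksRunSensor(BaseSensorOperator):
    """Sketch: pokes the Databricks Runs API until the run reaches a terminal state."""

    template_fields = ("run_id",)

    def __init__(self, *, run_id, databricks_conn_id="databricks_default", **kwargs):
        super().__init__(**kwargs)
        self.run_id = run_id
        self.databricks_conn_id = databricks_conn_id

    def poke(self, context):
        hook = DatabricksHook(databricks_conn_id=self.databricks_conn_id)
        state = hook.get_run_state(self.run_id)
        if not state.is_terminal:
            return False  # not finished yet; the sensor will poke again
        if not state.is_successful:
            raise AirflowException(
                f"Databricks run {self.run_id} terminated with failure: {state.state_message}"
            )
        return True
```

In a DAG this would sit right after the (now fire-and-forget) submit task, pulling the run id the operator already pushes to XCom, e.g. `DatabricksRunSensor(task_id="wait_for_run", run_id="{{ ti.xcom_pull(task_ids='submit_run', key='run_id') }}")`. Since sensors are designed to be interrupted and resumed, a scheduler restart during the wait is harmless.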