TL;DR Is there any recommended way to lazily load input for Airflow operators?
I could not find a way to do this. I ran into this limitation with the Databricks operator, but other operators likely lack such functionality as well. Please keep reading for more details.

---

When instantiating a DatabricksSubmitRunOperator (https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/databricks_operator.py), users need to pass the description of the job that will later be executed on Databricks. The job description is only needed at execution time (when the hook is called). However, the json parameter must already contain the full job description when the operator is constructed. This is a problem if computing the job description requires expensive operations (e.g., querying a database): the expensive operation will be invoked every single time the DAG is reprocessed, which may happen quite frequently.

It would be good to have a mechanism equivalent to the python_callable parameter of the PythonOperator. That way, users could pass a function that generates the job description only when the operator is actually executed.

I discussed this with Andrew Chen (from Databricks), and he agrees it would be an interesting feature to add. Does this sound reasonable? Is this use case supported in some way that I am unaware of?

You can find the issue I created here: https://issues.apache.org/jira/projects/AIRFLOW/issues/AIRFLOW-2964
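To illustrate the pattern I have in mind, here is a minimal, self-contained sketch (no Airflow imports; the class name `LazyJsonOperator` and its parameters are hypothetical). A real implementation would subclass DatabricksSubmitRunOperator and submit the run from execute(); the point is only that the expensive computation is deferred from DAG-parse time to execution time:

```python
class LazyJsonOperator:
    """Hypothetical sketch: defer computing `json` until execute() runs."""

    def __init__(self, json=None, json_callable=None):
        # Accept either a ready-made job description or a zero-argument
        # callable that produces one, mirroring PythonOperator's
        # python_callable parameter.
        if (json is None) == (json_callable is None):
            raise ValueError("pass exactly one of json or json_callable")
        self._json = json
        self._json_callable = json_callable

    def execute(self, context=None):
        # The expensive computation (e.g. a database query) happens here,
        # not when the DAG file is parsed and the operator is constructed.
        if self._json is None:
            self._json = self._json_callable()
        return self._json  # a real operator would call the hook here instead


calls = []

def expensive_job_description():
    calls.append(1)  # stands in for an expensive database query
    return {"notebook_task": {"notebook_path": "/example"}}

op = LazyJsonOperator(json_callable=expensive_job_description)
assert calls == []   # nothing is computed at DAG-parse time
result = op.execute()
assert calls == [1]  # computed exactly once, at execution time
```

Today, the closest workaround I know of is templating or wrapping the operator in a PythonOperator, neither of which composes cleanly with the Databricks operator's json parameter.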
