GitHub user clement-micol created a discussion: Should 
`DatabricksSubmitRunOperator` fail at parse time if the JSON is not valid, 
instead of at execution time?

When adding a feature to one of my Airflow DAGs, I mistakenly passed a bad 
cluster configuration to the `new_cluster` keyword argument of 
`DatabricksSubmitRunOperator`. Our CI/CD pipeline checks that DAGs compile and 
parse, but it does not execute them. Because `DatabricksSubmitRunOperator` only 
calls `normalise_json_content` inside `execute()` (not in `__init__`), we only 
caught the error after the change was deployed, when it broke our DAGs at runtime.
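For context, the parse-only check in our CI is roughly the standard DagBag import test sketched below (the `dags/` path is just a placeholder); nothing in it ever calls `execute()`, so any validation deferred to runtime is invisible to it.

```python
# Sketch of a parse-only CI check: it imports every DAG file and fails on
# import/parse errors, but never runs any operator's execute() method.
from airflow.models import DagBag


def test_dags_parse_without_errors():
    dag_bag = DagBag(dag_folder="dags/", include_examples=False)
    assert not dag_bag.import_errors, f"DAG import errors: {dag_bag.import_errors}"
```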

Looking at the code (as of provider version 7.6.0), there seems to be an 
inconsistency between the two operators:

- **`DatabricksRunNowOperator.__init__`** calls 
`normalise_json_content(self.json)` at construction time ([lines 
822-823](https://github.com/apache/airflow/blob/main/airflow/providers/databricks/operators/databricks.py)),
 so invalid JSON fails immediately during DAG parsing.
- **`DatabricksSubmitRunOperator.__init__`** does **not** call 
`normalise_json_content`. It only runs the validation later in `execute()` 
([line 
567](https://github.com/apache/airflow/blob/main/airflow/providers/databricks/operators/databricks.py)),
 meaning invalid types silently pass DAG parsing and only surface at runtime.
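To make the asymmetry concrete, here is a minimal sketch; the `set` values are just stand-ins for a type that `normalise_json_content` cannot coerce and rejects with an `AirflowException`:

```python
from airflow.providers.databricks.operators.databricks import (
    DatabricksRunNowOperator,
    DatabricksSubmitRunOperator,
)

# Illustrative mistake: a set is not a type normalise_json_content accepts.
bad_cluster = {
    "spark_version": "13.3.x-scala2.12",
    "num_workers": 2,
    "custom_tags": {"team-data"},  # set, not a dict/str
}

# Constructs (and therefore parses) without complaint; the bad type only
# surfaces when execute() runs normalise_json_content over the assembled json.
DatabricksSubmitRunOperator(
    task_id="submit_run",
    new_cluster=bad_cluster,
    notebook_task={"notebook_path": "/Shared/example"},
)

# Raises AirflowException already while the DAG file is parsed, because
# __init__ calls normalise_json_content(self.json).
DatabricksRunNowOperator(
    task_id="run_now",
    job_id=42,
    notebook_params={"run_date": {"2024-01-01"}},  # same kind of bad type
)
```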

Is this difference intentional? It seems like `DatabricksSubmitRunOperator` 
could also call `normalise_json_content` in `__init__` (or at least validate 
the JSON types), which would allow CI/CD pipelines to catch configuration 
errors before deployment.

We've worked around this by adding our own parse-time validation, but it feels 
like this would be better handled at the operator level for consistency.
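For reference, that workaround is roughly the sketch below, assuming `normalise_json_content` is importable from `airflow.providers.databricks.utils.databricks` (where the provider defines it): we run it over the payload at module level, so the same type check the operator defers to `execute()` happens while the DAG file is parsed in CI.

```python
from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator
from airflow.providers.databricks.utils.databricks import normalise_json_content

new_cluster = {
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "i3.xlarge",
    "num_workers": 2,
}
notebook_task = {"notebook_path": "/Shared/example"}

# Parse-time validation: this runs while the DAG file is imported, so a bad
# type raises AirflowException in the CI parse step instead of at runtime.
# We only use it as a check and discard the normalised result; the operator
# re-normalises in execute() anyway.
normalise_json_content({"new_cluster": new_cluster, "notebook_task": notebook_task})

submit_run = DatabricksSubmitRunOperator(
    task_id="submit_run",
    new_cluster=new_cluster,
    notebook_task=notebook_task,
)
```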

GitHub link: https://github.com/apache/airflow/discussions/61888
