GitHub user clement-micol created a discussion: Should `DatabricksSubmitRunOperator` fail at parsing time if JSON is not valid instead of execution time?
When adding a feature to one of my Airflow DAGs, I mistakenly passed a bad cluster configuration to the `new_cluster` keyword argument of `DatabricksSubmitRunOperator`. Our CI/CD pipeline checks that DAGs compile and parse, but it does not execute them. Because `DatabricksSubmitRunOperator` only calls `normalise_json_content` inside `execute()` (not in `__init__`), we only caught the error once the change was deployed and broke our DAGs at runtime.

Looking at the code (as of provider version 7.6.0), there seems to be an inconsistency between the two operators:

- **`DatabricksRunNowOperator.__init__`** calls `normalise_json_content(self.json)` at construction time ([lines 822-823](https://github.com/apache/airflow/blob/main/airflow/providers/databricks/operators/databricks.py)), so invalid JSON fails immediately during DAG parsing.
- **`DatabricksSubmitRunOperator.__init__`** does **not** call `normalise_json_content`. It only runs the validation later in `execute()` ([line 567](https://github.com/apache/airflow/blob/main/airflow/providers/databricks/operators/databricks.py)), meaning invalid types silently pass DAG parsing and only surface at runtime.

Is this difference intentional? It seems like `DatabricksSubmitRunOperator` could also call `normalise_json_content` in `__init__` (or at least validate the JSON types), which would allow CI/CD pipelines to catch configuration errors before deployment. We've worked around this by adding our own parse-time validation (a sketch of that workaround is shown below), but it feels like this would be better handled at the operator level for consistency.

GitHub link: https://github.com/apache/airflow/discussions/61888
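For reference, here is a minimal sketch of the kind of parse-time workaround described above. It assumes `normalise_json_content` is importable from `airflow.providers.databricks.utils.databricks` and that the operator stores the merged payload on `self.json` during `__init__`, as it does in provider 7.6.0; the subclass name is hypothetical.

```python
from airflow.providers.databricks.operators.databricks import (
    DatabricksSubmitRunOperator,
)
from airflow.providers.databricks.utils.databricks import normalise_json_content


class ParseTimeValidatedSubmitRunOperator(DatabricksSubmitRunOperator):
    """Drop-in replacement that validates the merged `json` payload at DAG-parse time."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # The base operator builds the merged payload on `self.json` in __init__
        # but only normalises/validates it inside execute(). Running the same
        # check here makes bad types (e.g. an invalid `new_cluster` value) raise
        # an AirflowException while the DAG file is parsed, so a "DAGs compile"
        # CI step catches them before deployment. The return value is discarded;
        # execute() still performs the real normalisation.
        normalise_json_content(self.json)
```

Using this subclass in place of `DatabricksSubmitRunOperator` makes the failure mode match `DatabricksRunNowOperator`, but it only covers our own DAGs, which is why handling it in the provider itself would be preferable.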
