pauldouane opened a new issue, #55128: URL: https://github.com/apache/airflow/issues/55128
### Description

Add a feature to the DatabricksSqlOperator that allows direct export of query results to a Google Cloud Storage (GCS) bucket in Parquet and Avro formats.

### Use case/motivation

The current DatabricksSqlOperator allows executing SQL queries and saving the results to a file using the output_path and output_format parameters. However, it has a few limitations for common data engineering workflows:

- It only supports csv, json, and jsonl formats. Parquet and Avro are widely used for their performance and schema handling.
- The output is saved to the Databricks cluster's local filesystem (/tmp/), not directly to an object storage like GCS. This requires an additional step (a separate Airflow task, or in-Databricks logic) to move the file to its final destination, adding unnecessary complexity to the DAG.

This feature would streamline simple ETL/ELT pipelines by allowing a single Airflow task to:

1. Execute a SQL query on a Databricks warehouse.
2. Export the result as a Parquet or Avro file.
3. Save the file directly to a specified GCS path.

This would eliminate the need for an intermediate COPY command or a separate Databricks job (DatabricksSubmitRunOperator) for simple export scenarios, simplifying DAGs and improving readability.

Proposed Change:

- Introduce new output_format values: Add support for 'parquet' and 'avro' to the output_format parameter.
- Enhance output_path to support object storage URIs: The output_path parameter should accept object storage URIs (e.g., gs://my-bucket/path/to/data.parquet).
- Implement the export logic: The operator's internal logic would need to be updated to convert the SQL query results to the specified format and stream them directly to the GCS location using the appropriate Databricks Spark APIs (e.g., spark.sql(...).write.format("parquet").save(...)).

### Related issues

_No response_

### Are you willing to submit a PR?

- [ ] Yes I am willing to submit a PR!
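
As a rough illustration of the "Proposed Change" items on output_format and output_path, here is a minimal sketch of how the two parameters might be validated before any export happens. `OutputTarget` and `resolve_output_target` are hypothetical names invented for this example; they are not part of the Databricks provider, and the sketch does not perform the export itself.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit

# Formats the operator supports today, per the description above.
EXISTING_FORMATS = {"csv", "json", "jsonl"}
# Formats this issue proposes to add.
PROPOSED_FORMATS = {"parquet", "avro"}


@dataclass(frozen=True)
class OutputTarget:
    """Hypothetical description of where and how results would be written."""

    output_format: str
    scheme: str            # "file" for local paths, "gs" for GCS URIs
    bucket: Optional[str]  # None for local paths
    path: str              # local path, or object name inside the bucket


def resolve_output_target(output_path: str, output_format: str) -> OutputTarget:
    """Validate the format and split output_path into its parts (assumed helper)."""
    fmt = output_format.lower()
    if fmt not in EXISTING_FORMATS | PROPOSED_FORMATS:
        raise ValueError(f"Unsupported output_format: {output_format!r}")

    parts = urlsplit(output_path)
    if parts.scheme == "gs":
        object_name = parts.path.lstrip("/")
        if not parts.netloc or not object_name:
            raise ValueError(f"GCS URI needs a bucket and object name: {output_path!r}")
        return OutputTarget(fmt, "gs", parts.netloc, object_name)
    if parts.scheme in ("", "file"):
        # Plain local paths keep working as they do today.
        return OutputTarget(fmt, "file", None, parts.path)
    raise ValueError(f"Unsupported output_path scheme: {parts.scheme!r}")


target = resolve_output_target("gs://my-bucket/path/to/data.parquet", "parquet")
print(target)
```

Keeping local paths valid alongside gs:// URIs would leave existing DAGs that rely on csv, json, or jsonl output unchanged; whether the real operator should take this shape is an open design question.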
### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)