davidlghellin opened a new pull request, #61752: URL: https://github.com/apache/airflow/pull/61752
Add new Sail provider for Spark Connect-compatible engine

This PR adds a new `apache-airflow-providers-sail` provider package that integrates [Sail](https://lakesail.com/) ([GitHub](https://github.com/lakehq/sail)), a Rust-native, Spark Connect-compatible engine, with Apache Airflow. Sail lets existing PySpark code run without the JVM by using the Spark Connect protocol (`sc://`).

Ref: https://github.com/lakehq/sail/issues/300

### Components included:

- **SailHook**: Connection management supporting remote (`sc://`) and local embedded modes
- **SailPySparkOperator**: Executes PySpark code on Sail (remote or local server)
- **@task.sail_pyspark**: Task decorator for defining PySpark tasks as Python functions
- Unit tests, a system test with an example DAG, and documentation

### Dependencies:

- `pysail>=0.5.0` (Apache 2.0)
- `pyspark>=3.5.2` (Apache 2.0)
- `grpcio-status>=1.59.0` (Apache 2.0)

---

##### Was generative AI tooling used to co-author this PR?

- [X] Yes (please specify the tool below)

Generated-by: Claude Code (claude-opus-4-6) following [the guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions)
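For readers unfamiliar with the `sc://` scheme mentioned above, here is a minimal sketch of how a PySpark client might reach a Spark Connect endpoint such as a Sail server. It deliberately does not use `SailHook` or `SailPySparkOperator`, since their signatures are not shown in this description. The helper names `build_spark_connect_url` and `count_rows` are hypothetical, and the host and port in the usage note are assumptions. Only `SparkSession.builder.remote(...)` is the PySpark Spark Connect entry point.

```python
from urllib.parse import quote


def build_spark_connect_url(host: str, port: int, **params: str) -> str:
    """Build a Spark Connect connection string (hypothetical helper).

    The result has the form sc://host:port, optionally followed by
    /;key=value;key=value for connection parameters.
    """
    url = f"sc://{host}:{port}"
    if params:
        # Parameters follow the path as ;key=value pairs, in the order given.
        pairs = ";".join(
            f"{key}={quote(str(value), safe='')}" for key, value in params.items()
        )
        url = f"{url}/;{pairs}"
    return url


def count_rows(url: str, n: int = 5) -> int:
    """Open a Spark Connect session against `url` and count `n` generated rows."""
    # Imported lazily so the URL helper above works without pyspark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.remote(url).getOrCreate()
    try:
        return spark.range(n).count()
    finally:
        spark.stop()
```

With a Spark Connect-compatible server listening on an assumed `localhost:50051`, calling `count_rows(build_spark_connect_url("localhost", 50051))` would run the query over `sc://localhost:50051` without starting a local JVM.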
