BentsiLeviav opened a new pull request, #67080:
URL: https://github.com/apache/airflow/pull/67080

   ### Description
   Adds a new apache-airflow-providers-clickhouse provider that integrates 
Airflow with ClickHouse via the HTTP interface using the `clickhouse-connect` 
library.
   
   ### Scope of this implementation
   
     - `ClickHouseHook` - the core integration, extending `DbApiHook` so all 
standard `SQLExecuteQueryOperator` features work out of the box (templating, 
handler, split_statements, etc.)
     - Connection form UI with dedicated fields for TLS, timeouts, compression, 
session settings, and client kwargs
     - `bulk_insert_rows()` for more performant inserts using 
clickhouse-connect's native insert path
     - `get_uri()` for SQLAlchemy-compatible connection strings 
(`clickhousedb://` / `clickhousedbs://`)
     - Connection type docs, operator how-to guide, and integration logo
     - 95 unit tests
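
For reviewers unfamiliar with the two URI schemes, here is a minimal sketch of how such a connection string could be assembled. It is not the hook's actual `get_uri()` code: the function name `build_clickhouse_uri`, its parameters, and the assumption that the `clickhousedbs://` variant corresponds to a TLS connection are all illustrative.

```python
from urllib.parse import quote


def build_clickhouse_uri(host, port, user="", password="", database="", secure=False):
    """Hypothetical helper: build a SQLAlchemy-style ClickHouse URI."""
    # Assumption: the trailing "s" in the scheme marks a TLS (HTTPS) connection.
    scheme = "clickhousedbs" if secure else "clickhousedb"
    auth = ""
    if user:
        # Percent-encode credentials so characters like "@" or ":" stay unambiguous.
        auth = quote(user, safe="")
        if password:
            auth += ":" + quote(password, safe="")
        auth += "@"
    path = "/" + quote(database, safe="") if database else ""
    return f"{scheme}://{auth}{host}:{port}{path}"


print(build_clickhouse_uri("localhost", 8123, "default", "p@ss", "analytics"))
```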
   
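To illustrate the idea behind a bulk insert path, the sketch below sends rows to a client in fixed-size batches. It is an assumption-laden illustration, not the hook's `bulk_insert_rows()` implementation: `RecordingClient` is a stand-in for a real client, and `bulk_insert_in_batches` and `batch_size` are hypothetical names. Whether the real method batches at all is not stated in this PR description.

```python
from itertools import islice


class RecordingClient:
    """Stand-in client (hypothetical) that records insert calls instead of sending them."""

    def __init__(self):
        self.calls = []

    def insert(self, table, data, column_names=None):
        self.calls.append((table, list(data), column_names))


def bulk_insert_in_batches(client, table, rows, column_names, batch_size=10000):
    """Hypothetical helper: insert rows in batches, return the number of rows sent."""
    iterator = iter(rows)
    total = 0
    while True:
        # Pull at most batch_size rows without materialising the whole input.
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        client.insert(table, batch, column_names=column_names)
        total += len(batch)
    return total


client = RecordingClient()
sent = bulk_insert_in_batches(
    client, "events", [(1, "a"), (2, "b"), (3, "c")], ["id", "name"], batch_size=2
)
print(sent, len(client.calls))
```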
   ### Implementation decisions
   
     - `DB-API 2.0` adapter (`ClickHouseConnection`): clickhouse-connect 
doesn't expose a DB-API connection natively - we wrap its Client in a thin 
adapter so `DbApiHook.run()` works unmodified. `commit()` and `rollback()` are 
intentional no-ops since ClickHouse has no transactions.
     - Two-level settings merge: both `session_settings` and `client_kwargs` 
can be set at the connection level (via the extra JSON field) and overridden at 
the task level (via hook constructor arguments), with the constructor taking 
precedence on conflicts.
     - Hook-managed kwargs protection: keys that the hook owns (`host`, `port`, 
`username`, `password`, `database`, `secure`, `verify`, `client_name`, 
`settings`) are stripped from any user-supplied `client_kwargs` so hook-managed 
values always win.
     - Client name: every query is tagged with `apache-airflow/<version> 
apache-airflow-providers-clickhouse/<version>` in the HTTP User-Agent (visible 
in `system.query_log`), making queries traceable back to their Airflow source. 
Users can append a custom label via the `client_name` extra field.
     - No dedicated operators are added - `SQLExecuteQueryOperator` from 
`common.sql` covers all standard SQL use cases.
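
The adapter idea from the first bullet can be sketched roughly as follows. This is a simplified illustration, not the `ClickHouseConnection` class from this PR: `StubClient`, `SketchCursor`, and `SketchConnection` are hypothetical names, the stub returns canned rows, and query parameters are ignored for brevity.

```python
class StubClient:
    """Stand-in for a ClickHouse client (hypothetical); returns canned results."""

    def query(self, sql):
        class Result:
            column_names = ("n",)
            result_rows = [(1,), (2,)]

        return Result()

    def close(self):
        pass


class SketchCursor:
    """Minimal DB-API-style cursor over a client's query() call."""

    def __init__(self, client):
        self._client = client
        self._rows = []
        self.description = None
        self.rowcount = -1

    def execute(self, sql, parameters=None):
        # Parameters are ignored in this sketch.
        result = self._client.query(sql)
        self._rows = list(result.result_rows)
        # DB-API descriptions are 7-item sequences; only the name is filled in here.
        self.description = [(name, None, None, None, None, None, None) for name in result.column_names]
        self.rowcount = len(self._rows)

    def fetchone(self):
        return self._rows.pop(0) if self._rows else None

    def fetchall(self):
        rows, self._rows = self._rows, []
        return rows

    def close(self):
        self._rows = []


class SketchConnection:
    """Minimal DB-API-style connection wrapping a client."""

    def __init__(self, client):
        self._client = client

    def cursor(self):
        return SketchCursor(self._client)

    def commit(self):
        # Intentional no-op: there is no transaction to commit.
        pass

    def rollback(self):
        # Intentional no-op: there is no transaction to roll back.
        pass

    def close(self):
        self._client.close()
```

Because the connection object exposes `cursor()`, `commit()`, `rollback()`, and `close()`, generic DB-API calling code can drive it without knowing what sits underneath.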
   
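The two-level merge and the hook-managed key protection described above boil down to two small dictionary operations. The sketch below shows one way to express them; the helper names `merge_two_level` and `sanitize_client_kwargs` are hypothetical and the PR's actual code may be structured differently. The set of protected keys is taken from the list in this description.

```python
# Keys the hook owns, as listed in this description.
HOOK_MANAGED_KEYS = frozenset(
    {"host", "port", "username", "password", "database", "secure", "verify", "client_name", "settings"}
)


def merge_two_level(connection_level, task_level):
    """Hypothetical helper: task-level values override connection-level ones."""
    merged = dict(connection_level or {})
    merged.update(task_level or {})
    return merged


def sanitize_client_kwargs(client_kwargs):
    """Hypothetical helper: drop keys that the hook sets itself."""
    return {k: v for k, v in client_kwargs.items() if k not in HOOK_MANAGED_KEYS}


# Connection extra sets two session settings; the task overrides one of them.
settings = merge_two_level(
    {"max_execution_time": 60, "readonly": 1},
    {"max_execution_time": 300},
)
# A user-supplied "host" is removed; unrelated kwargs pass through.
kwargs = sanitize_client_kwargs({"host": "elsewhere", "query_limit": 0})
print(settings, kwargs)
```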
   
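As a small illustration of the client-name tagging, the sketch below composes the tag from two version strings and an optional label. The function name `build_client_name` is hypothetical, and appending the custom label as a space-separated suffix is an assumption; the description does not say exactly how the label is joined.

```python
def build_client_name(airflow_version, provider_version, custom_label=None):
    """Hypothetical helper: compose the client name tag from version strings."""
    base = f"apache-airflow/{airflow_version} apache-airflow-providers-clickhouse/{provider_version}"
    # Assumption: a custom label, if given, is appended after a single space.
    return f"{base} {custom_label}" if custom_label else base


print(build_client_name("3.0.0", "1.0.0", "nightly-etl"))
```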
   
   ### File structure (generated with Claude)
   
     | File(s) | Purpose |
     |---|---|
     | `provider.yaml` | Provider metadata: name, version, integrations, 
connection types, UI field behaviour, and `conn-fields` schema used to generate 
the connection form |
     | `pyproject.toml` | Package build config and dependencies 
(`clickhouse-connect >=0.7.0`, `common-sql >=1.32.0`) — auto-generated from the 
Breeze template |
     | `src/.../hooks/clickhouse.py` | Core implementation: `ClickHouseHook` 
(extends `DbApiHook`) and `ClickHouseConnection` (thin DB-API 2.0 adapter 
wrapping the `clickhouse-connect` client) |
     | `src/.../get_provider_info.py` | Auto-generated from `provider.yaml` by 
the Breeze release tooling — do not edit manually |
     | `src/airflow/__init__.py`, `src/airflow/providers/__init__.py` | 
Namespace package declarations required for the `airflow.providers` implicit 
namespace |
     | `src/.../clickhouse/__init__.py` | Version file (`__version__ = 
"1.0.0"`) with minimum Airflow version guard — auto-generated |
     | `docs/connections/clickhouse.rst` | Connection configuration reference: 
all fields, their types, defaults, and JSON/URI examples |
     | `docs/operators/clickhouse.rst` | How-to guide: using 
`SQLExecuteQueryOperator` and `ClickHouseHook` directly, including 
`session_settings` and `bulk_insert_rows` examples |
     | `docs/index.rst`, `docs/conf.py`, `docs/changelog.rst`, 
`docs/security.rst` | Standard provider docs scaffold — mostly auto-generated |
     | `docs/integration-logos/ClickHouse.png` | Official ClickHouse logo used 
by the Apache Airflow website |
     | `tests/unit/clickhouse/hooks/test_clickhouse.py` | 95 unit tests 
covering connection building, settings/kwargs merge logic, database override, 
URI generation, bulk insert, UI widgets, and autocommit semantics |
     | `tests/system/clickhouse/example_clickhouse.py` | System test / example 
DAG: create table → bulk insert → read rows → drop table |
     | `.github/boring-cyborg.yml` | Adds `provider:clickhouse` label rule for 
automatic PR labelling |
     | `scripts/ci/docker-compose/remove-sources.yml`, `tests-sources.yml` | 
Auto-updated by prek to mount the clickhouse provider sources/tests into the CI 
Docker environment |

