Vamsi-klu opened a new pull request, #68144:
URL: https://github.com/apache/airflow/pull/68144
Add support for passing validated Beeline JDBC URL parameters through
`HiveCliHook` and `HiveOperator`.
closes: #45049
## Why
Some Hive Beeline deployments require JDBC URL parameters such as
`transportMode`, trust store settings, or other driver-specific options.
Previously, Airflow's Hive CLI hook did not provide a safe,
Dag-author-controlled way to append these parameters to the generated Beeline
JDBC URL.
This change intentionally avoids accepting arbitrary JDBC parameters from
connection extras. Connection extras are managed through the Airflow connection
UI and can be shared or reused across many Dags, so using them as a free-form
JDBC parameter bag would make the blast radius larger and harder to reason
about.
## What Changed
- Added a `jdbc_params` argument to `HiveCliHook`.
- Added a matching `jdbc_params` argument to `HiveOperator`, passed through
to the hook.
- Appended validated `jdbc_params` to the generated Beeline JDBC URL.
- Added a bounded connection extra, `transport_mode`, which maps to JDBC
`transportMode` and only accepts `binary` or `http`.
- Rejected unsafe JDBC parameter names and values:
  - names must start with a letter and contain only letters, digits, dots,
    underscores, or hyphens
  - values cannot be `None`
  - values cannot contain `;`
- Documented the new hook/operator argument and the limited connection-extra
behavior.
- Updated the Hive provider changelog.
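The validation rules listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code from this PR: the helper name `validate_jdbc_params`, the use of `ValueError`, and converting accepted values with `str()` are all hypothetical choices.

```python
import re
from collections.abc import Mapping

# Hypothetical pattern mirroring the stated rule: a leading letter, then only
# letters, digits, dots, underscores, or hyphens.
_JDBC_PARAM_NAME = re.compile(r"[A-Za-z][A-Za-z0-9._-]*")


def validate_jdbc_params(params: Mapping[str, object]) -> dict[str, str]:
    """Hypothetical helper: reject unsafe names/values, return values as strings."""
    validated: dict[str, str] = {}
    for name, value in params.items():
        # Names must be strings that fully match the allowed pattern.
        if not isinstance(name, str) or not _JDBC_PARAM_NAME.fullmatch(name):
            raise ValueError(f"Invalid JDBC parameter name: {name!r}")
        # Values must be present.
        if value is None:
            raise ValueError(f"JDBC parameter {name!r} must not be None")
        text = str(value)
        # ';' separates JDBC URL parameters, so it would allow injecting extra ones.
        if ";" in text:
            raise ValueError(f"JDBC parameter {name!r} must not contain ';'")
        validated[name] = text
    return validated
```

Rejecting `;` in values is what keeps a single parameter from smuggling additional `key=value` pairs into the URL.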
## Impact
Dag authors can now configure Beeline JDBC URL parameters directly in code
when a deployment needs driver-specific settings.
This matters because it lets affected Hive deployments connect through Beeline
without resorting to arbitrary, unvalidated JDBC URL parameters in connection
extras. Connection-level configuration stays bounded, while Dag authors still
get the flexibility needed for per-Dag connection behavior.
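The bounded `transport_mode` connection extra can be pictured as a closed mapping onto the JDBC `transportMode` parameter. The sketch below assumes the extras arrive as a plain dict and that an absent extra contributes nothing; the function name `transport_mode_param` is hypothetical, and how the PR combines this value with Dag-author `jdbc_params` is not shown here.

```python
from collections.abc import Mapping

# Only these values are accepted for the connection extra, per the PR description.
_ALLOWED_TRANSPORT_MODES = frozenset({"binary", "http"})


def transport_mode_param(extras: Mapping[str, object]) -> dict[str, str]:
    """Hypothetical helper: map the `transport_mode` extra to JDBC `transportMode`."""
    mode = extras.get("transport_mode")
    if mode is None:
        # No extra set: contribute no JDBC parameter.
        return {}
    if not isinstance(mode, str) or mode not in _ALLOWED_TRANSPORT_MODES:
        raise ValueError(
            f"transport_mode must be one of {sorted(_ALLOWED_TRANSPORT_MODES)}, got {mode!r}"
        )
    return {"transportMode": mode}
```

Because the extra can only ever produce one of two fixed values, a shared connection cannot be used to inject free-form JDBC parameters.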
Existing behavior is preserved for:
- Kerberos principal handling
- proxy user handling
- auth parameter handling
- high-availability URL construction
- login/password command arguments
- existing Beeline URLs without additional JDBC parameters
## Why This Solves the Problem
Issue #45049 asks for a way to add JDBC parameters to Beeline URLs. This
implementation appends those parameters at the point where `HiveCliHook` builds
the Beeline JDBC URL, so the final command contains the required JDBC suffix.
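Appending parameters to a Beeline JDBC URL can be sketched as follows. This assumes the HiveServer2 convention of `;`-separated `key=value` session variables after the database path, a base URL with no `?` or `#` sections, and parameters that have already been validated; the helper name `append_jdbc_params` is hypothetical and is not the hook's actual method.

```python
from collections.abc import Mapping


def append_jdbc_params(base_url: str, params: Mapping[str, str]) -> str:
    """Hypothetical helper: append already-validated params as `;key=value` pairs."""
    if not params:
        # Nothing to add: existing URLs are left untouched.
        return base_url
    # Dict insertion order is preserved, so parameters appear in the order given.
    suffix = "".join(f";{name}={value}" for name, value in params.items())
    return base_url + suffix
```

For example, `append_jdbc_params("jdbc:hive2://localhost:10000/default", {"transportMode": "http"})` returns `jdbc:hive2://localhost:10000/default;transportMode=http`, while an empty mapping returns the base URL unchanged.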
At the same time, the implementation avoids the previously rejected approach
of allowing arbitrary free-form JDBC URL parameters through connection UI
extras. Free-form parameters are only accepted from the hook/operator
constructor, where they are controlled by the Dag author and validated before
use.
## Testing
- `uv run ruff check
providers/apache/hive/src/airflow/providers/apache/hive/hooks/hive.py
providers/apache/hive/src/airflow/providers/apache/hive/operators/hive.py
providers/apache/hive/tests/unit/apache/hive/hooks/test_hive.py
providers/apache/hive/tests/unit/apache/hive/operators/test_hive.py`
- `uv run ruff format --check
providers/apache/hive/src/airflow/providers/apache/hive/hooks/hive.py
providers/apache/hive/src/airflow/providers/apache/hive/operators/hive.py
providers/apache/hive/tests/unit/apache/hive/hooks/test_hive.py
providers/apache/hive/tests/unit/apache/hive/operators/test_hive.py`
- `uv run --project providers/apache/hive mypy
providers/apache/hive/src/airflow/providers/apache/hive/hooks/hive.py
providers/apache/hive/src/airflow/providers/apache/hive/operators/hive.py`
- `AIRFLOW_CONN_HIVE_CLI_DEFAULT='hive-cli://localhost:10000/default?use_beeline=True'
uv run --project providers/apache/hive pytest
providers/apache/hive/tests/unit/apache/hive/hooks/test_hive.py::TestHiveCliHookJdbcParams
providers/apache/hive/tests/unit/apache/hive/operators/test_hive.py::TestHiveOperatorJdbcParams::test_hive_operator_passes_jdbc_params_to_hook
providers/apache/hive/tests/unit/apache/hive/hooks/test_hive.py::TestHiveCliHook::test_run_cli
providers/apache/hive/tests/unit/apache/hive/hooks/test_hive.py::TestHiveCli::test_get_proxy_user_value
providers/apache/hive/tests/unit/apache/hive/hooks/test_hive.py::TestHiveCli::test_get_wrong_principal
providers/apache/hive/tests/unit/apache/hive/hooks/test_hive.py::TestHiveCli::test_high_availability
-xvs --without-db-init --no-db-cleanup`
- `prek run --from-ref upstream/main --stage pre-commit` passed before the
final rebase. After the final rebase, the Hive-relevant hooks passed, but the
run later failed while creating a devel-common mypy environment: on the local
macOS setup with an external volume, macOS created an AppleDouble `._ruff`
file inside the wheel install.
- `prek run --from-ref upstream/main --stage manual` was attempted after the
final rebase; provider mypy could not run because local Breeze requires Docker
and `docker` is not installed in this environment.
---
##### Was generative AI tooling used to co-author this PR?
- [X] Yes — Codex (GPT-5)
Generated-by: Codex (GPT-5) following [the
guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions)