Vamsi-klu opened a new pull request, #68144:
URL: https://github.com/apache/airflow/pull/68144

   Add support for passing validated Beeline JDBC URL parameters through 
`HiveCliHook` and `HiveOperator`.
   
   closes: #45049
   
   ## Why
   
   Some Hive Beeline deployments require JDBC URL parameters such as 
`transportMode`, trust store settings, or other driver-specific options. 
Previously, Airflow's Hive CLI hook did not provide a safe, 
Dag-author-controlled way to append these parameters to the generated Beeline 
JDBC URL.
   
   This change intentionally avoids accepting arbitrary JDBC parameters from 
connection extras. Connection extras are managed through the Airflow connection 
UI and can be shared or reused across many Dags, so treating them as a free-form 
JDBC parameter bag would widen the blast radius of an unsafe value and make the 
resulting URLs harder to reason about.
   
   ## What Changed
   
   - Added a `jdbc_params` argument to `HiveCliHook`.
   - Added a matching `jdbc_params` argument to `HiveOperator`, passed through 
to the hook.
   - Appended validated `jdbc_params` to the generated Beeline JDBC URL.
   - Added a bounded connection extra, `transport_mode`, which maps to JDBC 
`transportMode` and only accepts `binary` or `http`.
   - Rejected unsafe JDBC parameter names and values:
     - names must start with a letter and contain only letters, digits, dots, 
underscores, or hyphens
     - values cannot be `None`
     - values cannot contain `;`
   - Documented the new hook/operator argument and the limited connection-extra 
behavior.
   - Updated the Hive provider changelog.
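
The name and value rules listed above can be sketched roughly as follows. This is an illustrative sketch only, not the code from this PR: the function name `validate_jdbc_params`, the ASCII-only reading of "letters", the use of `ValueError`, and the stringification of values are all assumptions.

```python
import re

# Assumed pattern: an ASCII letter first, then letters, digits, dots,
# underscores, or hyphens. The PR does not state whether non-ASCII letters count.
_JDBC_PARAM_NAME = re.compile(r"[A-Za-z][A-Za-z0-9._-]*")


def validate_jdbc_params(params: dict[str, object]) -> dict[str, str]:
    """Hypothetical helper: return stringified params or raise ValueError."""
    validated: dict[str, str] = {}
    for name, value in params.items():
        # Reject names that could break out of the "name=value" position.
        if not isinstance(name, str) or not _JDBC_PARAM_NAME.fullmatch(name):
            raise ValueError(f"Invalid JDBC parameter name: {name!r}")
        # None has no meaningful URL representation.
        if value is None:
            raise ValueError(f"JDBC parameter {name!r} must not be None")
        text = str(value)
        # ';' separates parameters in the URL, so a value containing it
        # could smuggle in an extra parameter.
        if ";" in text:
            raise ValueError(f"JDBC parameter {name!r} must not contain ';'")
        validated[name] = text
    return validated
```

Under these assumptions, `validate_jdbc_params({"transportMode": "http"})` returns the mapping unchanged, while a name such as `1bad` or a value such as `http;user=root` raises `ValueError`.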
   
   ## Impact
   
   Dag authors can now configure Beeline JDBC URL parameters directly in code 
when a deployment needs driver-specific settings.
   
   This matters because it allows affected Hive deployments to connect through 
Beeline without forcing unsafe, arbitrary JDBC URL parameter injection through 
connection extras. The change keeps connection-level configuration bounded 
while still giving Dag authors the flexibility needed for per-Dag connection 
behavior.
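
As a rough illustration of what "bounded" means for the connection-level `transport_mode` extra, the sketch below maps it to JDBC `transportMode` and rejects anything other than `binary` or `http`. The helper name `transport_mode_param`, the plain-dict representation of extras, and the error type are assumptions, not the PR's actual code.

```python
# The two values the PR description says the extra accepts.
ALLOWED_TRANSPORT_MODES = frozenset({"binary", "http"})


def transport_mode_param(extras: dict[str, str]) -> dict[str, str]:
    """Hypothetical helper: translate the bounded extra into a JDBC parameter."""
    mode = extras.get("transport_mode")
    if mode is None:
        # Extra not set: contribute nothing to the URL.
        return {}
    if mode not in ALLOWED_TRANSPORT_MODES:
        raise ValueError(
            f"transport_mode must be one of {sorted(ALLOWED_TRANSPORT_MODES)}, got {mode!r}"
        )
    # The extra uses snake_case; the JDBC parameter is camelCase.
    return {"transportMode": mode}
```

Because the accepted set is closed, a connection shared across many Dags cannot introduce anything beyond these two known values through this extra.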
   
   Existing behavior is preserved for:
   
   - Kerberos principal handling
   - proxy user handling
   - auth parameter handling
   - high-availability URL construction
   - login/password command arguments
   - existing Beeline URLs without additional JDBC parameters
   
   ## Why This Solves the Problem
   
   The linked issue (#45049) asks for a way to add JDBC parameters to Beeline 
URLs. This implementation appends those parameters at the point where 
`HiveCliHook` builds the Beeline JDBC URL, so the final command contains the 
required JDBC suffix.
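
A minimal sketch of the appending step, assuming parameters are joined as `;name=value` pairs after the base URL (the separator style Hive JDBC URLs use for session variables). The function name `append_jdbc_params` and the example parameter `httpPath` are illustrative and not taken from this PR.

```python
def append_jdbc_params(jdbc_url: str, params: dict[str, str]) -> str:
    """Hypothetical helper: append already-validated params to a JDBC URL."""
    # Each parameter becomes ";name=value"; dict insertion order is preserved.
    suffix = "".join(f";{name}={value}" for name, value in params.items())
    return jdbc_url + suffix


url = append_jdbc_params(
    "jdbc:hive2://localhost:10000/default",
    {"transportMode": "http", "httpPath": "cliservice"},
)
print(url)
# jdbc:hive2://localhost:10000/default;transportMode=http;httpPath=cliservice
```

With an empty mapping the URL comes back unchanged, which matches the stated goal of preserving existing Beeline URLs that have no additional JDBC parameters.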
   
   At the same time, the implementation avoids the previously rejected approach 
of allowing arbitrary free-form JDBC URL parameters through connection UI 
extras. Free-form parameters are only accepted from the hook/operator 
constructor, where they are controlled by the Dag author and validated before 
use.
   
   ## Testing
   
   - `uv run ruff check 
providers/apache/hive/src/airflow/providers/apache/hive/hooks/hive.py 
providers/apache/hive/src/airflow/providers/apache/hive/operators/hive.py 
providers/apache/hive/tests/unit/apache/hive/hooks/test_hive.py 
providers/apache/hive/tests/unit/apache/hive/operators/test_hive.py`
   - `uv run ruff format --check 
providers/apache/hive/src/airflow/providers/apache/hive/hooks/hive.py 
providers/apache/hive/src/airflow/providers/apache/hive/operators/hive.py 
providers/apache/hive/tests/unit/apache/hive/hooks/test_hive.py 
providers/apache/hive/tests/unit/apache/hive/operators/test_hive.py`
   - `uv run --project providers/apache/hive mypy 
providers/apache/hive/src/airflow/providers/apache/hive/hooks/hive.py 
providers/apache/hive/src/airflow/providers/apache/hive/operators/hive.py`
   - `AIRFLOW_CONN_HIVE_CLI_DEFAULT='hive-cli://localhost:10000/default?use_beeline=True' 
uv run --project providers/apache/hive pytest 
providers/apache/hive/tests/unit/apache/hive/hooks/test_hive.py::TestHiveCliHookJdbcParams 
providers/apache/hive/tests/unit/apache/hive/operators/test_hive.py::TestHiveOperatorJdbcParams::test_hive_operator_passes_jdbc_params_to_hook 
providers/apache/hive/tests/unit/apache/hive/hooks/test_hive.py::TestHiveCliHook::test_run_cli 
providers/apache/hive/tests/unit/apache/hive/hooks/test_hive.py::TestHiveCli::test_get_proxy_user_value 
providers/apache/hive/tests/unit/apache/hive/hooks/test_hive.py::TestHiveCli::test_get_wrong_principal 
providers/apache/hive/tests/unit/apache/hive/hooks/test_hive.py::TestHiveCli::test_high_availability 
-xvs --without-db-init --no-db-cleanup`
   - `prek run --from-ref upstream/main --stage pre-commit` passed before the 
final rebase. After the final rebase, the Hive-relevant hooks passed, but the 
run later failed while creating a devel-common mypy environment because the 
local external-volume/macOS environment created an AppleDouble `._ruff` file 
inside the wheel install.
   - `prek run --from-ref upstream/main --stage manual` was attempted after the 
final rebase; provider mypy could not run because local Breeze requires Docker 
and `docker` is not installed in this environment.
   
   ---
   
   ##### Was generative AI tooling used to co-author this PR?
   
   - [X] Yes — Codex (GPT-5)
   
   Generated-by: Codex (GPT-5) following [the 
guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions)
   

