Vamsi-klu commented on code in PR #68144:
URL: https://github.com/apache/airflow/pull/68144#discussion_r3369969759
##########
providers/apache/hive/docs/connections/hive_cli.rst:
##########
@@ -80,9 +80,36 @@ High Availability (optional)
Ssl (optional)
Specify as ``True`` to enable SSL for your high availability connection.
+Transport Mode (optional)
+ Specify the JDBC ``transportMode`` parameter. Supported values are
``binary`` and ``http``.
+
Zoo Keeper Namespace (optional)
Zoo keeper namespace for high availability.
+Additional JDBC Parameters
+--------------------------
+
+Dag authors can pass additional Beeline JDBC URL parameters to ``HiveCliHook``
or ``HiveOperator``
+with the ``jdbc_params`` argument:
+
+.. code-block:: python
+
+ HiveCliHook(
+ jdbc_params={
+ "transportMode": "http",
+ "sslTrustStore": "/opt/hive/truststore.jks",
+ "trustStorePassword": "secret",
+ }
+ )
+
+These parameters are appended to the Beeline JDBC URL. Parameter names must
start with a letter
+and contain only letters, digits, dots, underscores, or hyphens. Values must
not contain semicolons.
Review Comment:
Good idea — added the regex to the docs alongside the prose:
`^[A-Za-z]([A-Za-z0-9._-]*[A-Za-z0-9])?$`.
---
Drafted-by: Claude Code (Opus 4.8) (no human review before posting)
##########
providers/apache/hive/docs/connections/hive_cli.rst:
##########
@@ -80,9 +80,36 @@ High Availability (optional)
Ssl (optional)
Specify as ``True`` to enable SSL for your high availability connection.
+Transport Mode (optional)
+ Specify the JDBC ``transportMode`` parameter. Supported values are
``binary`` and ``http``.
+
Zoo Keeper Namespace (optional)
Zoo keeper namespace for high availability.
+Additional JDBC Parameters
+--------------------------
+
+Dag authors can pass additional Beeline JDBC URL parameters to ``HiveCliHook``
or ``HiveOperator``
+with the ``jdbc_params`` argument:
+
+.. code-block:: python
+
+ HiveCliHook(
+ jdbc_params={
+ "transportMode": "http",
+ "sslTrustStore": "/opt/hive/truststore.jks",
+ "trustStorePassword": "secret",
+ }
+ )
+
+These parameters are appended to the Beeline JDBC URL. Parameter names must
start with a letter
+and contain only letters, digits, dots, underscores, or hyphens. Values must
not contain semicolons.
+
+Arbitrary JDBC parameters are not read from connection extras. Only fixed
connection extras with
Review Comment:
Good questions. With the `transport_mode` special-casing removed, that
paragraph no longer applies: arbitrary JDBC params are **not** read from
connection extras at all — `jdbc_params` (supplied through the hook/operator)
is the only source. I removed the "fixed connection extras / bounded values"
paragraph, and the section now states exactly that.
---
Drafted-by: Claude Code (Opus 4.8) (no human review before posting)
##########
providers/apache/hive/src/airflow/providers/apache/hive/hooks/hive.py:
##########
@@ -50,6 +50,8 @@
HIVE_QUEUE_PRIORITIES = ["VERY_HIGH", "HIGH", "NORMAL", "LOW", "VERY_LOW"]
+HIVE_CLI_TRANSPORT_MODES = {"binary", "http"}
Review Comment:
Agreed — there's no good reason to special-case it. I removed the
`HIVE_CLI_TRANSPORT_MODES` constant, the form widget, and the
`_get_connection_jdbc_url_parameters` method. `transportMode` now goes through
`jdbc_params` like every other param, so there's a single sanctioned, validated
injection point.
---
Drafted-by: Claude Code (Opus 4.8) (no human review before posting)
##########
providers/apache/hive/src/airflow/providers/apache/hive/hooks/hive.py:
##########
@@ -50,6 +50,8 @@
HIVE_QUEUE_PRIORITIES = ["VERY_HIGH", "HIGH", "NORMAL", "LOW", "VERY_LOW"]
+HIVE_CLI_TRANSPORT_MODES = {"binary", "http"}
+JDBC_PARAMETER_NAME_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9._-]*$")
Review Comment:
Good catch — it could. I tightened the pattern to
`^[A-Za-z]([A-Za-z0-9._-]*[A-Za-z0-9])?$`, which still allows a single letter
but otherwise requires the name to end in a letter or digit, so
`transportMode-`, `foo.`, and `bar_` are now rejected. (Inside the character
class `.` and `-` are literal, so no escaping is needed.) Added tests for those
cases.
---
Drafted-by: Claude Code (Opus 4.8) (no human review before posting)
##########
providers/apache/hive/src/airflow/providers/apache/hive/hooks/hive.py:
##########
@@ -141,6 +148,9 @@ def get_connection_form_widgets(cls) -> dict[str, Any]:
),
"high_availability": BooleanField(lazy_gettext("High Availability
mode"), default=False),
"ssl": BooleanField(lazy_gettext("Ssl"), default=True),
+ "transport_mode": StringField(
+ lazy_gettext("Transport Mode"), widget=BS3TextFieldWidget(),
default=""
Review Comment:
Removed — this widget was part of the `transport_mode` connection-extra
special-casing, which is gone. `transportMode` is now just another
`jdbc_params` entry.
---
Drafted-by: Claude Code (Opus 4.8) (no human review before posting)
##########
providers/apache/hive/src/airflow/providers/apache/hive/hooks/hive.py:
##########
@@ -212,6 +223,55 @@ def _prepare_cli_cmd(self) -> list[Any]:
return [hive_bin, *cmd_extra, *hive_params_list]
+ def _get_jdbc_url_parameters(self) -> dict[str, str]:
+ jdbc_params = self._get_connection_jdbc_url_parameters()
+
jdbc_params.update(self._validate_jdbc_url_parameters(self.jdbc_params))
+ return jdbc_params
+
+ def _get_connection_jdbc_url_parameters(self) -> dict[str, str]:
+ extra_dejson = getattr(self.conn, "extra_dejson", {})
+ transport_mode = extra_dejson.get("transport_mode")
+ if not transport_mode:
+ return {}
+ transport_mode = str(transport_mode).lower()
+ if transport_mode not in HIVE_CLI_TRANSPORT_MODES:
+ allowed_modes = ", ".join(sorted(HIVE_CLI_TRANSPORT_MODES))
+ raise ValueError(f"The transport_mode connection extra should be
one of: {allowed_modes}")
+ return {"transportMode": transport_mode}
Review Comment:
Right — that was a side effect of the connection-extra special-casing, which
I've now removed. There's no longer a connection-level transport-mode path; all
JDBC params (including `transportMode`) come through `jdbc_params`.
---
Drafted-by: Claude Code (Opus 4.8) (no human review before posting)
##########
providers/apache/hive/src/airflow/providers/apache/hive/hooks/hive.py:
##########
@@ -212,6 +223,55 @@ def _prepare_cli_cmd(self) -> list[Any]:
return [hive_bin, *cmd_extra, *hive_params_list]
+ def _get_jdbc_url_parameters(self) -> dict[str, str]:
+ jdbc_params = self._get_connection_jdbc_url_parameters()
+
jdbc_params.update(self._validate_jdbc_url_parameters(self.jdbc_params))
+ return jdbc_params
Review Comment:
Done — inlined. That helper is gone; validation and URL-building now happen
together in a single `_append_jdbc_params` pass.
---
Drafted-by: Claude Code (Opus 4.8) (no human review before posting)
##########
providers/apache/hive/src/airflow/providers/apache/hive/hooks/hive.py:
##########
@@ -212,6 +223,55 @@ def _prepare_cli_cmd(self) -> list[Any]:
return [hive_bin, *cmd_extra, *hive_params_list]
+ def _get_jdbc_url_parameters(self) -> dict[str, str]:
+ jdbc_params = self._get_connection_jdbc_url_parameters()
+
jdbc_params.update(self._validate_jdbc_url_parameters(self.jdbc_params))
+ return jdbc_params
+
+ def _get_connection_jdbc_url_parameters(self) -> dict[str, str]:
+ extra_dejson = getattr(self.conn, "extra_dejson", {})
+ transport_mode = extra_dejson.get("transport_mode")
+ if not transport_mode:
+ return {}
+ transport_mode = str(transport_mode).lower()
+ if transport_mode not in HIVE_CLI_TRANSPORT_MODES:
+ allowed_modes = ", ".join(sorted(HIVE_CLI_TRANSPORT_MODES))
+ raise ValueError(f"The transport_mode connection extra should be
one of: {allowed_modes}")
Review Comment:
Agreed — removed. `transportMode` is treated like any other param via
`jdbc_params` now.
---
Drafted-by: Claude Code (Opus 4.8) (no human review before posting)
##########
providers/apache/hive/src/airflow/providers/apache/hive/hooks/hive.py:
##########
@@ -212,6 +223,55 @@ def _prepare_cli_cmd(self) -> list[Any]:
return [hive_bin, *cmd_extra, *hive_params_list]
+ def _get_jdbc_url_parameters(self) -> dict[str, str]:
+ jdbc_params = self._get_connection_jdbc_url_parameters()
+
jdbc_params.update(self._validate_jdbc_url_parameters(self.jdbc_params))
+ return jdbc_params
+
+ def _get_connection_jdbc_url_parameters(self) -> dict[str, str]:
+ extra_dejson = getattr(self.conn, "extra_dejson", {})
+ transport_mode = extra_dejson.get("transport_mode")
+ if not transport_mode:
+ return {}
+ transport_mode = str(transport_mode).lower()
+ if transport_mode not in HIVE_CLI_TRANSPORT_MODES:
+ allowed_modes = ", ".join(sorted(HIVE_CLI_TRANSPORT_MODES))
+ raise ValueError(f"The transport_mode connection extra should be
one of: {allowed_modes}")
+ return {"transportMode": transport_mode}
+
+ @classmethod
+ def _validate_jdbc_url_parameters(cls, jdbc_params: Mapping[str, Any]) ->
dict[str, str]:
+ return {
+ cls._validate_jdbc_url_parameter_name(name):
cls._validate_jdbc_url_parameter_value(name, value)
+ for name, value in jdbc_params.items()
+ }
+
+ @staticmethod
+ def _validate_jdbc_url_parameter_name(name: str) -> str:
+ if not isinstance(name, str) or not
JDBC_PARAMETER_NAME_PATTERN.fullmatch(name):
+ raise ValueError(
+ "JDBC parameter names must be non-empty strings that start
with a letter and contain "
+ "only letters, digits, dots, underscores, or hyphens"
+ )
+ return name
Review Comment:
Consolidated — there's now a single `_append_jdbc_params` that validates
both name and value inline in one pass, no separate per-check methods.
---
Drafted-by: Claude Code (Opus 4.8) (no human review before posting)
##########
providers/apache/hive/src/airflow/providers/apache/hive/hooks/hive.py:
##########
@@ -212,6 +223,55 @@ def _prepare_cli_cmd(self) -> list[Any]:
return [hive_bin, *cmd_extra, *hive_params_list]
+ def _get_jdbc_url_parameters(self) -> dict[str, str]:
+ jdbc_params = self._get_connection_jdbc_url_parameters()
+
jdbc_params.update(self._validate_jdbc_url_parameters(self.jdbc_params))
+ return jdbc_params
+
+ def _get_connection_jdbc_url_parameters(self) -> dict[str, str]:
+ extra_dejson = getattr(self.conn, "extra_dejson", {})
+ transport_mode = extra_dejson.get("transport_mode")
+ if not transport_mode:
+ return {}
+ transport_mode = str(transport_mode).lower()
+ if transport_mode not in HIVE_CLI_TRANSPORT_MODES:
+ allowed_modes = ", ".join(sorted(HIVE_CLI_TRANSPORT_MODES))
+ raise ValueError(f"The transport_mode connection extra should be
one of: {allowed_modes}")
+ return {"transportMode": transport_mode}
+
+ @classmethod
+ def _validate_jdbc_url_parameters(cls, jdbc_params: Mapping[str, Any]) ->
dict[str, str]:
+ return {
+ cls._validate_jdbc_url_parameter_name(name):
cls._validate_jdbc_url_parameter_value(name, value)
+ for name, value in jdbc_params.items()
+ }
+
+ @staticmethod
+ def _validate_jdbc_url_parameter_name(name: str) -> str:
+ if not isinstance(name, str) or not
JDBC_PARAMETER_NAME_PATTERN.fullmatch(name):
+ raise ValueError(
+ "JDBC parameter names must be non-empty strings that start
with a letter and contain "
+ "only letters, digits, dots, underscores, or hyphens"
+ )
+ return name
+
+ @staticmethod
+ def _validate_jdbc_url_parameter_value(name: str, value: Any) -> str:
+ if value is None:
+ raise ValueError(f"JDBC parameter {name!r} value should not be
None")
+ value = str(value)
+ if ";" in value:
+ raise ValueError(f"JDBC parameter {name!r} value should not
contain the ';' character")
+ return value
Review Comment:
For the value the only rule is "no `;`", which I kept as a `";" in
str(value)` membership check inside the same method — I think that reads a bit
more clearly than a regex here, but happy to switch to one if you'd prefer.
---
Drafted-by: Claude Code (Opus 4.8) (no human review before posting)
##########
providers/apache/hive/src/airflow/providers/apache/hive/hooks/hive.py:
##########
@@ -212,6 +223,55 @@ def _prepare_cli_cmd(self) -> list[Any]:
return [hive_bin, *cmd_extra, *hive_params_list]
+ def _get_jdbc_url_parameters(self) -> dict[str, str]:
+ jdbc_params = self._get_connection_jdbc_url_parameters()
+
jdbc_params.update(self._validate_jdbc_url_parameters(self.jdbc_params))
+ return jdbc_params
+
+ def _get_connection_jdbc_url_parameters(self) -> dict[str, str]:
+ extra_dejson = getattr(self.conn, "extra_dejson", {})
+ transport_mode = extra_dejson.get("transport_mode")
+ if not transport_mode:
+ return {}
+ transport_mode = str(transport_mode).lower()
+ if transport_mode not in HIVE_CLI_TRANSPORT_MODES:
+ allowed_modes = ", ".join(sorted(HIVE_CLI_TRANSPORT_MODES))
+ raise ValueError(f"The transport_mode connection extra should be
one of: {allowed_modes}")
+ return {"transportMode": transport_mode}
+
+ @classmethod
+ def _validate_jdbc_url_parameters(cls, jdbc_params: Mapping[str, Any]) ->
dict[str, str]:
+ return {
+ cls._validate_jdbc_url_parameter_name(name):
cls._validate_jdbc_url_parameter_value(name, value)
+ for name, value in jdbc_params.items()
+ }
+
+ @staticmethod
+ def _validate_jdbc_url_parameter_name(name: str) -> str:
+ if not isinstance(name, str) or not
JDBC_PARAMETER_NAME_PATTERN.fullmatch(name):
+ raise ValueError(
+ "JDBC parameter names must be non-empty strings that start
with a letter and contain "
+ "only letters, digits, dots, underscores, or hyphens"
+ )
+ return name
+
+ @staticmethod
+ def _validate_jdbc_url_parameter_value(name: str, value: Any) -> str:
+ if value is None:
+ raise ValueError(f"JDBC parameter {name!r} value should not be
None")
+ value = str(value)
+ if ";" in value:
+ raise ValueError(f"JDBC parameter {name!r} value should not
contain the ';' character")
+ return value
+
+ @staticmethod
+ def _append_jdbc_url_parameters(jdbc_url: str, jdbc_params: Mapping[str,
str]) -> str:
+ if not jdbc_params:
+ return jdbc_url
+ jdbc_url_params = ";".join(f"{name}={value}" for name, value in
jdbc_params.items())
+ separator = "" if jdbc_url.endswith(";") else ";"
+ return f"{jdbc_url}{separator}{jdbc_url_params}"
Review Comment:
Good question. A DAG author can't edit the JDBC URL directly —
`_prepare_cli_cmd` assembles it programmatically from the connection's
host/port/schema/auth (plus principal/proxy_user and the HA settings), and
`_validate_beeline_parameters` rejects `;` in the schema, so there's no raw-URL
field to append params to. `jdbc_params` is the sanctioned, validated place for
that.
The use case driving this (issue #45049) is Cloudera CDP: the same base
`hive_cli` connection is used against clusters that need params like
`transportMode=http`, `httpPath`, `ssl`, `sslTrustStore`, `trustStorePassword`,
`AuthMech` — values that vary by target cluster rather than belonging on one
shared connection, so a per-task `jdbc_params` is more practical than a
separate connection per variant.
---
Drafted-by: Claude Code (Opus 4.8) (no human review before posting)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]