Vamsi-klu commented on code in PR #68144:
URL: https://github.com/apache/airflow/pull/68144#discussion_r3369969759


##########
providers/apache/hive/docs/connections/hive_cli.rst:
##########
@@ -80,9 +80,36 @@ High Availability (optional)
 Ssl (optional)
     Specify as ``True`` to enable SSL for your high availability connection.
 
+Transport Mode (optional)
+    Specify the JDBC ``transportMode`` parameter. Supported values are ``binary`` and ``http``.
+
 Zoo Keeper Namespace (optional)
     Zoo keeper namespace for high availability.
 
+Additional JDBC Parameters
+--------------------------
+
+Dag authors can pass additional Beeline JDBC URL parameters to ``HiveCliHook`` or ``HiveOperator``
+with the ``jdbc_params`` argument:
+
+.. code-block:: python
+
+   HiveCliHook(
+       jdbc_params={
+           "transportMode": "http",
+           "sslTrustStore": "/opt/hive/truststore.jks",
+           "trustStorePassword": "secret",
+       }
+   )
+
+These parameters are appended to the Beeline JDBC URL. Parameter names must start with a letter
+and contain only letters, digits, dots, underscores, or hyphens. Values must not contain semicolons.

Review Comment:
   Good idea. I added the regex to the docs alongside the prose:
   `^[A-Za-z]([A-Za-z0-9._-]*[A-Za-z0-9])?$`.
   
   ---
   Drafted-by: Claude Code (Opus 4.8) (no human review before posting)



##########
providers/apache/hive/docs/connections/hive_cli.rst:
##########
@@ -80,9 +80,36 @@ High Availability (optional)
 Ssl (optional)
     Specify as ``True`` to enable SSL for your high availability connection.
 
+Transport Mode (optional)
+    Specify the JDBC ``transportMode`` parameter. Supported values are ``binary`` and ``http``.
+
 Zoo Keeper Namespace (optional)
     Zoo keeper namespace for high availability.
 
+Additional JDBC Parameters
+--------------------------
+
+Dag authors can pass additional Beeline JDBC URL parameters to ``HiveCliHook`` or ``HiveOperator``
+with the ``jdbc_params`` argument:
+
+.. code-block:: python
+
+   HiveCliHook(
+       jdbc_params={
+           "transportMode": "http",
+           "sslTrustStore": "/opt/hive/truststore.jks",
+           "trustStorePassword": "secret",
+       }
+   )
+
+These parameters are appended to the Beeline JDBC URL. Parameter names must start with a letter
+and contain only letters, digits, dots, underscores, or hyphens. Values must not contain semicolons.
+
+Arbitrary JDBC parameters are not read from connection extras. Only fixed connection extras with

Review Comment:
   Good questions. With the `transport_mode` special-casing removed, that
   paragraph no longer applies: arbitrary JDBC params are **not** read from
   connection extras at all, and `jdbc_params` (supplied through the
   hook/operator) is the only source. I removed the "fixed connection extras /
   bounded values" paragraph, and the section now states exactly that.
   
   ---
   Drafted-by: Claude Code (Opus 4.8) (no human review before posting)



##########
providers/apache/hive/src/airflow/providers/apache/hive/hooks/hive.py:
##########
@@ -50,6 +50,8 @@
 
 
 HIVE_QUEUE_PRIORITIES = ["VERY_HIGH", "HIGH", "NORMAL", "LOW", "VERY_LOW"]
+HIVE_CLI_TRANSPORT_MODES = {"binary", "http"}

Review Comment:
   Agreed, there's no good reason to special-case it. I removed the
   `HIVE_CLI_TRANSPORT_MODES` constant, the form widget, and the
   `_get_connection_jdbc_url_parameters` method. `transportMode` now goes
   through `jdbc_params` like every other param, so there's a single
   sanctioned, validated injection point.
   
   ---
   Drafted-by: Claude Code (Opus 4.8) (no human review before posting)



##########
providers/apache/hive/src/airflow/providers/apache/hive/hooks/hive.py:
##########
@@ -50,6 +50,8 @@
 
 
 HIVE_QUEUE_PRIORITIES = ["VERY_HIGH", "HIGH", "NORMAL", "LOW", "VERY_LOW"]
+HIVE_CLI_TRANSPORT_MODES = {"binary", "http"}
+JDBC_PARAMETER_NAME_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9._-]*$")

Review Comment:
   Good catch, it could. I tightened the pattern to
   `^[A-Za-z]([A-Za-z0-9._-]*[A-Za-z0-9])?$`, which still allows a single
   letter but otherwise requires the name to end in a letter or digit, so
   `transportMode-`, `foo.`, and `bar_` are now rejected. (Inside the character
   class, `.` is literal, and `-` is literal because it comes last, so no
   escaping is needed.) I added tests for those cases.
   
   ---
   Drafted-by: Claude Code (Opus 4.8) (no human review before posting)
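
The effect of the tightened pattern can be checked in isolation. The snippet below is a minimal sketch, not the PR's code: the pattern is copied from the comment above, while the helper name `is_valid_jdbc_param_name` and the sample names are assumptions made for illustration.

```python
import re

# Pattern quoted in the review comment: a leading letter, then optionally any
# run of allowed characters that must end in a letter or digit.
JDBC_PARAMETER_NAME_PATTERN = re.compile(r"^[A-Za-z]([A-Za-z0-9._-]*[A-Za-z0-9])?$")


def is_valid_jdbc_param_name(name: object) -> bool:
    """Hypothetical helper: True only for strings that fully match the pattern."""
    return isinstance(name, str) and JDBC_PARAMETER_NAME_PATTERN.fullmatch(name) is not None


# Illustrative inputs only; the last three rejected names come from the comment.
accepted = ["a", "transportMode", "ssl.trust-store_v2"]
rejected = ["", "1abc", "a;b", "transportMode-", "foo.", "bar_"]

for name in accepted:
    assert is_valid_jdbc_param_name(name), name
for name in rejected:
    assert not is_valid_jdbc_param_name(name), name
```

A single letter still passes because the whole trailing group is optional. Any longer name has to end in a letter or digit.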



##########
providers/apache/hive/src/airflow/providers/apache/hive/hooks/hive.py:
##########
@@ -141,6 +148,9 @@ def get_connection_form_widgets(cls) -> dict[str, Any]:
             ),
             "high_availability": BooleanField(lazy_gettext("High Availability mode"), default=False),
             "ssl": BooleanField(lazy_gettext("Ssl"), default=True),
+            "transport_mode": StringField(
+                lazy_gettext("Transport Mode"), widget=BS3TextFieldWidget(), default=""

Review Comment:
   Removed. This widget was part of the `transport_mode` connection-extra
   special-casing, which is gone. `transportMode` is now just another
   `jdbc_params` entry.
   
   ---
   Drafted-by: Claude Code (Opus 4.8) (no human review before posting)



##########
providers/apache/hive/src/airflow/providers/apache/hive/hooks/hive.py:
##########
@@ -212,6 +223,55 @@ def _prepare_cli_cmd(self) -> list[Any]:
 
         return [hive_bin, *cmd_extra, *hive_params_list]
 
+    def _get_jdbc_url_parameters(self) -> dict[str, str]:
+        jdbc_params = self._get_connection_jdbc_url_parameters()
+        jdbc_params.update(self._validate_jdbc_url_parameters(self.jdbc_params))
+        return jdbc_params
+
+    def _get_connection_jdbc_url_parameters(self) -> dict[str, str]:
+        extra_dejson = getattr(self.conn, "extra_dejson", {})
+        transport_mode = extra_dejson.get("transport_mode")
+        if not transport_mode:
+            return {}
+        transport_mode = str(transport_mode).lower()
+        if transport_mode not in HIVE_CLI_TRANSPORT_MODES:
+            allowed_modes = ", ".join(sorted(HIVE_CLI_TRANSPORT_MODES))
+            raise ValueError(f"The transport_mode connection extra should be one of: {allowed_modes}")
+        return {"transportMode": transport_mode}

Review Comment:
   Right, that was a side effect of the connection-extra special-casing, which
   I've now removed. There's no longer a connection-level transport-mode path;
   all JDBC params (including `transportMode`) come through `jdbc_params`.
   
   ---
   Drafted-by: Claude Code (Opus 4.8) (no human review before posting)



##########
providers/apache/hive/src/airflow/providers/apache/hive/hooks/hive.py:
##########
@@ -212,6 +223,55 @@ def _prepare_cli_cmd(self) -> list[Any]:
 
         return [hive_bin, *cmd_extra, *hive_params_list]
 
+    def _get_jdbc_url_parameters(self) -> dict[str, str]:
+        jdbc_params = self._get_connection_jdbc_url_parameters()
+        jdbc_params.update(self._validate_jdbc_url_parameters(self.jdbc_params))
+        return jdbc_params

Review Comment:
   Done, inlined. That helper is gone; validation and URL-building now happen
   together in a single `_append_jdbc_params` pass.
   
   ---
   Drafted-by: Claude Code (Opus 4.8) (no human review before posting)



##########
providers/apache/hive/src/airflow/providers/apache/hive/hooks/hive.py:
##########
@@ -212,6 +223,55 @@ def _prepare_cli_cmd(self) -> list[Any]:
 
         return [hive_bin, *cmd_extra, *hive_params_list]
 
+    def _get_jdbc_url_parameters(self) -> dict[str, str]:
+        jdbc_params = self._get_connection_jdbc_url_parameters()
+        jdbc_params.update(self._validate_jdbc_url_parameters(self.jdbc_params))
+        return jdbc_params
+
+    def _get_connection_jdbc_url_parameters(self) -> dict[str, str]:
+        extra_dejson = getattr(self.conn, "extra_dejson", {})
+        transport_mode = extra_dejson.get("transport_mode")
+        if not transport_mode:
+            return {}
+        transport_mode = str(transport_mode).lower()
+        if transport_mode not in HIVE_CLI_TRANSPORT_MODES:
+            allowed_modes = ", ".join(sorted(HIVE_CLI_TRANSPORT_MODES))
+            raise ValueError(f"The transport_mode connection extra should be one of: {allowed_modes}")

Review Comment:
   Agreed, removed. `transportMode` is treated like any other param via
   `jdbc_params` now.
   
   ---
   Drafted-by: Claude Code (Opus 4.8) (no human review before posting)



##########
providers/apache/hive/src/airflow/providers/apache/hive/hooks/hive.py:
##########
@@ -212,6 +223,55 @@ def _prepare_cli_cmd(self) -> list[Any]:
 
         return [hive_bin, *cmd_extra, *hive_params_list]
 
+    def _get_jdbc_url_parameters(self) -> dict[str, str]:
+        jdbc_params = self._get_connection_jdbc_url_parameters()
+        jdbc_params.update(self._validate_jdbc_url_parameters(self.jdbc_params))
+        return jdbc_params
+
+    def _get_connection_jdbc_url_parameters(self) -> dict[str, str]:
+        extra_dejson = getattr(self.conn, "extra_dejson", {})
+        transport_mode = extra_dejson.get("transport_mode")
+        if not transport_mode:
+            return {}
+        transport_mode = str(transport_mode).lower()
+        if transport_mode not in HIVE_CLI_TRANSPORT_MODES:
+            allowed_modes = ", ".join(sorted(HIVE_CLI_TRANSPORT_MODES))
+            raise ValueError(f"The transport_mode connection extra should be one of: {allowed_modes}")
+        return {"transportMode": transport_mode}
+
+    @classmethod
+    def _validate_jdbc_url_parameters(cls, jdbc_params: Mapping[str, Any]) -> dict[str, str]:
+        return {
+            cls._validate_jdbc_url_parameter_name(name): cls._validate_jdbc_url_parameter_value(name, value)
+            for name, value in jdbc_params.items()
+        }
+
+    @staticmethod
+    def _validate_jdbc_url_parameter_name(name: str) -> str:
+        if not isinstance(name, str) or not JDBC_PARAMETER_NAME_PATTERN.fullmatch(name):
+            raise ValueError(
+                "JDBC parameter names must be non-empty strings that start with a letter and contain "
+                "only letters, digits, dots, underscores, or hyphens"
+            )
+        return name

Review Comment:
   Consolidated. There's now a single `_append_jdbc_params` that validates
   both name and value inline in one pass, with no separate per-check methods.
   
   ---
   Drafted-by: Claude Code (Opus 4.8) (no human review before posting)
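
The consolidated method itself is not quoted in this thread, so the following is only a sketch of what a single validate-and-append pass could look like, assuming the rules described in the comments. The function name `append_jdbc_params`, the exact error messages, and the example host are hypothetical and are not taken from the PR.

```python
from __future__ import annotations

import re
from collections.abc import Mapping
from typing import Any

# Name pattern as quoted elsewhere in this review thread.
NAME_PATTERN = re.compile(r"^[A-Za-z]([A-Za-z0-9._-]*[A-Za-z0-9])?$")


def append_jdbc_params(jdbc_url: str, jdbc_params: Mapping[str, Any] | None) -> str:
    """Hypothetical single pass: validate every name and value, then append them."""
    if not jdbc_params:
        return jdbc_url
    parts = []
    for name, value in jdbc_params.items():
        # Name check: must be a string that fully matches the pattern.
        if not isinstance(name, str) or not NAME_PATTERN.fullmatch(name):
            raise ValueError(f"Invalid JDBC parameter name: {name!r}")
        # Value checks: not None, and no ';' once converted to a string.
        if value is None:
            raise ValueError(f"JDBC parameter {name!r} value should not be None")
        value = str(value)
        if ";" in value:
            raise ValueError(f"JDBC parameter {name!r} value should not contain the ';' character")
        parts.append(f"{name}={value}")
    # Avoid a doubled separator when the URL already ends with ';'.
    separator = "" if jdbc_url.endswith(";") else ";"
    return f"{jdbc_url}{separator}{';'.join(parts)}"


url = append_jdbc_params(
    "jdbc:hive2://hive.example.com:10000/default",  # hypothetical base URL
    {"transportMode": "http", "httpPath": "cliservice"},
)
print(url)
```

Because validation happens inside the same loop that builds the suffix, an invalid entry raises before any URL is returned.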



##########
providers/apache/hive/src/airflow/providers/apache/hive/hooks/hive.py:
##########
@@ -212,6 +223,55 @@ def _prepare_cli_cmd(self) -> list[Any]:
 
         return [hive_bin, *cmd_extra, *hive_params_list]
 
+    def _get_jdbc_url_parameters(self) -> dict[str, str]:
+        jdbc_params = self._get_connection_jdbc_url_parameters()
+        jdbc_params.update(self._validate_jdbc_url_parameters(self.jdbc_params))
+        return jdbc_params
+
+    def _get_connection_jdbc_url_parameters(self) -> dict[str, str]:
+        extra_dejson = getattr(self.conn, "extra_dejson", {})
+        transport_mode = extra_dejson.get("transport_mode")
+        if not transport_mode:
+            return {}
+        transport_mode = str(transport_mode).lower()
+        if transport_mode not in HIVE_CLI_TRANSPORT_MODES:
+            allowed_modes = ", ".join(sorted(HIVE_CLI_TRANSPORT_MODES))
+            raise ValueError(f"The transport_mode connection extra should be one of: {allowed_modes}")
+        return {"transportMode": transport_mode}
+
+    @classmethod
+    def _validate_jdbc_url_parameters(cls, jdbc_params: Mapping[str, Any]) -> dict[str, str]:
+        return {
+            cls._validate_jdbc_url_parameter_name(name): cls._validate_jdbc_url_parameter_value(name, value)
+            for name, value in jdbc_params.items()
+        }
+
+    @staticmethod
+    def _validate_jdbc_url_parameter_name(name: str) -> str:
+        if not isinstance(name, str) or not JDBC_PARAMETER_NAME_PATTERN.fullmatch(name):
+            raise ValueError(
+                "JDBC parameter names must be non-empty strings that start with a letter and contain "
+                "only letters, digits, dots, underscores, or hyphens"
+            )
+        return name
+
+    @staticmethod
+    def _validate_jdbc_url_parameter_value(name: str, value: Any) -> str:
+        if value is None:
+            raise ValueError(f"JDBC parameter {name!r} value should not be None")
+        value = str(value)
+        if ";" in value:
+            raise ValueError(f"JDBC parameter {name!r} value should not contain the ';' character")
+        return value

Review Comment:
   For the value, the only rule is "no `;`", which I kept as a
   `";" in str(value)` membership check inside the same method. I think that
   reads a bit more clearly than a regex here, but I'm happy to switch to one
   if you'd prefer.
   
   ---
   Drafted-by: Claude Code (Opus 4.8) (no human review before posting)
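
For comparison, the two options mentioned in the comment can be put side by side. This is a sketch with hypothetical helper names and sample values; it only illustrates that a membership test and a one-character regex give the same answer for this rule.

```python
import re

_SEMICOLON = re.compile(r";")


def has_semicolon_membership(value: object) -> bool:
    """Membership check, in the style the comment describes."""
    return ";" in str(value)


def has_semicolon_regex(value: object) -> bool:
    """Regex alternative, shown only for comparison."""
    return _SEMICOLON.search(str(value)) is not None


# Illustrative values, including non-strings that are converted with str().
samples = ["http", "/opt/hive/truststore.jks", 10000, True, "a;b", ";"]
for sample in samples:
    assert has_semicolon_membership(sample) == has_semicolon_regex(sample)
```

Both helpers convert the value with `str()` first, so a non-string such as `True` is checked as the text `"True"`.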



##########
providers/apache/hive/src/airflow/providers/apache/hive/hooks/hive.py:
##########
@@ -212,6 +223,55 @@ def _prepare_cli_cmd(self) -> list[Any]:
 
         return [hive_bin, *cmd_extra, *hive_params_list]
 
+    def _get_jdbc_url_parameters(self) -> dict[str, str]:
+        jdbc_params = self._get_connection_jdbc_url_parameters()
+        jdbc_params.update(self._validate_jdbc_url_parameters(self.jdbc_params))
+        return jdbc_params
+
+    def _get_connection_jdbc_url_parameters(self) -> dict[str, str]:
+        extra_dejson = getattr(self.conn, "extra_dejson", {})
+        transport_mode = extra_dejson.get("transport_mode")
+        if not transport_mode:
+            return {}
+        transport_mode = str(transport_mode).lower()
+        if transport_mode not in HIVE_CLI_TRANSPORT_MODES:
+            allowed_modes = ", ".join(sorted(HIVE_CLI_TRANSPORT_MODES))
+            raise ValueError(f"The transport_mode connection extra should be one of: {allowed_modes}")
+        return {"transportMode": transport_mode}
+
+    @classmethod
+    def _validate_jdbc_url_parameters(cls, jdbc_params: Mapping[str, Any]) -> dict[str, str]:
+        return {
+            cls._validate_jdbc_url_parameter_name(name): cls._validate_jdbc_url_parameter_value(name, value)
+            for name, value in jdbc_params.items()
+        }
+
+    @staticmethod
+    def _validate_jdbc_url_parameter_name(name: str) -> str:
+        if not isinstance(name, str) or not JDBC_PARAMETER_NAME_PATTERN.fullmatch(name):
+            raise ValueError(
+                "JDBC parameter names must be non-empty strings that start with a letter and contain "
+                "only letters, digits, dots, underscores, or hyphens"
+            )
+        return name
+
+    @staticmethod
+    def _validate_jdbc_url_parameter_value(name: str, value: Any) -> str:
+        if value is None:
+            raise ValueError(f"JDBC parameter {name!r} value should not be None")
+        value = str(value)
+        if ";" in value:
+            raise ValueError(f"JDBC parameter {name!r} value should not contain the ';' character")
+        return value
+
+    @staticmethod
+    def _append_jdbc_url_parameters(jdbc_url: str, jdbc_params: Mapping[str, str]) -> str:
+        if not jdbc_params:
+            return jdbc_url
+        jdbc_url_params = ";".join(f"{name}={value}" for name, value in jdbc_params.items())
+        separator = "" if jdbc_url.endswith(";") else ";"
+        return f"{jdbc_url}{separator}{jdbc_url_params}"

Review Comment:
   Good question. A DAG author can't edit the JDBC URL directly:
   `_prepare_cli_cmd` assembles it programmatically from the connection's
   host/port/schema/auth (plus principal/proxy_user and the HA settings), and
   `_validate_beeline_parameters` rejects `;` in the schema, so there's no
   raw-URL field to append params to. `jdbc_params` is the sanctioned,
   validated place for that.
   
   The use case driving this (issue #45049) is Cloudera CDP: the same base
   `hive_cli` connection is used against clusters that need params like
   `transportMode=http`, `httpPath`, `ssl`, `sslTrustStore`,
   `trustStorePassword`, and `AuthMech`. These values vary by target cluster
   rather than belonging on one shared connection, so a per-task `jdbc_params`
   is more practical than a separate connection per variant.
   
   ---
   Drafted-by: Claude Code (Opus 4.8) (no human review before posting)
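
The per-task idea can be illustrated without Airflow. In the sketch below, the base URL, the task names, and the parameter values are all hypothetical, and validation is deliberately omitted. The separator handling mirrors the `_append_jdbc_url_parameters` hunk quoted above, not the final merged code.

```python
# Hypothetical base URL, standing in for what one shared connection would produce.
BASE_URL = "jdbc:hive2://hive.example.com:10000/default"

# Hypothetical per-task parameter sets for two target clusters.
PER_TASK_PARAMS = {
    "binary_cluster_task": {},
    "cdp_http_cluster_task": {
        "transportMode": "http",
        "httpPath": "cliservice",
        "ssl": "true",
    },
}


def build_url(base_url: str, params: dict[str, str]) -> str:
    """Append already-validated params to the base URL (no validation here)."""
    if not params:
        return base_url
    suffix = ";".join(f"{name}={value}" for name, value in params.items())
    separator = "" if base_url.endswith(";") else ";"
    return f"{base_url}{separator}{suffix}"


urls = {task: build_url(BASE_URL, params) for task, params in PER_TASK_PARAMS.items()}
for task, url in urls.items():
    print(task, url)
```

One connection yields a different effective URL per task, which is the situation the comment describes for clusters that differ only in transport settings.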


