[ https://issues.apache.org/jira/browse/HIVE-29640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HIVE-29640:
----------------------------------
Labels: pull-request-available (was: )
> ADD JAR from a non-default fs fails on Tez due to missing delegation token
> --------------------------------------------------------------------------
>
> Key: HIVE-29640
> URL: https://issues.apache.org/jira/browse/HIVE-29640
> Project: Hive
> Issue Type: Bug
> Reporter: KWON BYUNGCHANG
> Priority: Major
> Labels: pull-request-available
>
> h2. Problem
> In a Kerberized cluster, a query that pulls in jars from an HDFS
> namenode other than `fs.defaultFS` fails when the execution engine is
> Tez:
> ```
> SET hive.execution.engine=tez;
> ADD JAR hdfs://other-nn/libs/my-udf.jar;
> SELECT my_udf(...) FROM t;
> ```
> The jar is distributed to Tez containers as an AM-local resource via
> the distributed cache. The container tries to localize it from
> `hdfs://other-nn/...`, finds no HDFS delegation token for `other-nn`
> in its `Credentials`, and fails resource localization. The query
> aborts before any task runs.
> Jars on `fs.defaultFS` work because the standard Tez/Hadoop code path
> obtains a token for the default namenode on its own.
> h2. Root cause
> `TezClientUtils.setupAMLocalResources` does not fetch HDFS delegation
> tokens for AM-local resources — it expects the caller to provide them
> via `AMCredentials`. HS2 (`TezSessionState`) currently passes only the
> LLAP credentials and never enumerates the non-defaultFS namenodes
> referenced by `ADD JAR` / `ADD FILE` resources, so the AM ends up
> without a token for those namenodes.
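>
> To make the failure condition concrete, here is a toy Java model. Given the
> namenodes for which the AM's credentials hold a token (e.g. only
> `fs.defaultFS`), it lists the `ADD JAR`/`ADD FILE` resources that would fail
> localization. `MissingTokenCheck` and `resourcesWithoutToken` are
> hypothetical names. Keying tokens by URI authority is a simplifying
> assumption, because real Hadoop `Credentials` key tokens by a service name.
> ```java
> import java.net.URI;
> import java.util.ArrayList;
> import java.util.Collection;
> import java.util.List;
> import java.util.Locale;
> import java.util.Set;
>
> /** Hypothetical toy model; real Credentials key tokens by service, not URI authority. */
> final class MissingTokenCheck {
>
>     private MissingTokenCheck() {}
>
>     /**
>      * Returns the HDFS resources whose namenode authority (lower-cased) is not
>      * in tokenAuthorities, which is expected to contain lower-case entries.
>      */
>     static List<URI> resourcesWithoutToken(Collection<URI> resources, Set<String> tokenAuthorities) {
>         List<URI> missing = new ArrayList<>();
>         for (URI uri : resources) {
>             // This model only checks HDFS URIs; local and file:// paths are skipped.
>             if (!"hdfs".equalsIgnoreCase(uri.getScheme())) {
>                 continue;
>             }
>             // hdfs:///path has no authority and resolves against fs.defaultFS.
>             if (uri.getAuthority() == null) {
>                 continue;
>             }
>             String authority = uri.getAuthority().toLowerCase(Locale.ROOT);
>             if (!tokenAuthorities.contains(authority)) {
>                 missing.add(uri);
>             }
>         }
>         return missing;
>     }
> }
> ```
> With tokens only for `default-nn:8020`, a jar at
> `hdfs://other-nn/libs/my-udf.jar` is reported as missing a token, while
> local jars and jars on the default namenode are not.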
> h2. Fix
> Before handing local resources to TezClient, walk the common local
> resource map, collect every distinct HDFS namenode it references other
> than `fs.defaultFS`, fetch delegation tokens for those namenodes via
> `TokenCache.obtainTokensForNamenodes`, and merge them into the
> credentials passed to TezClient (alongside any existing LLAP
> credentials).
> The implementation lives in a new helper,
> `TezSessionState#createLocalResourceCredentialsExcludingDefaultFS`, which
> filters out:
> - Resources on `fs.defaultFS` (Tez/Hadoop already obtains that token;
> requesting it again adds latency and NameNode heap pressure).
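>
> The collection-and-filter step can be sketched without Hadoop. The sketch
> assumes resources are plain `java.net.URI`s rather than YARN `LocalResource`
> URLs. `NonDefaultNamenodes` and `collect` are hypothetical names, not the
> actual helper. Port normalization (e.g. `other-nn` vs `other-nn:8020`) is
> deliberately left out.
> ```java
> import java.net.URI;
> import java.util.Collection;
> import java.util.Locale;
> import java.util.Set;
> import java.util.TreeSet;
>
> /** Hypothetical sketch; not the actual TezSessionState helper. */
> final class NonDefaultNamenodes {
>
>     private NonDefaultNamenodes() {}
>
>     /**
>      * Returns "hdfs://authority" for every distinct HDFS namenode referenced by
>      * the resources, excluding fs.defaultFS. Sorted for stable output.
>      */
>     static Set<String> collect(Collection<URI> resources, URI defaultFs) {
>         String defaultKey = key(defaultFs);
>         Set<String> namenodes = new TreeSet<>();
>         for (URI uri : resources) {
>             // Only HDFS URIs are considered; local paths and file:// are skipped.
>             if (!"hdfs".equalsIgnoreCase(uri.getScheme())) {
>                 continue;
>             }
>             // hdfs:///path has no authority and resolves against fs.defaultFS.
>             if (uri.getAuthority() == null) {
>                 continue;
>             }
>             String nn = key(uri);
>             // Skip fs.defaultFS: Tez/Hadoop already obtains that token.
>             if (!nn.equals(defaultKey)) {
>                 namenodes.add(nn);
>             }
>         }
>         return namenodes;
>     }
>
>     /** Lower-cases scheme and authority, e.g. "hdfs://other-nn". Ports are not normalized. */
>     private static String key(URI uri) {
>         String scheme = uri.getScheme() == null ? "" : uri.getScheme().toLowerCase(Locale.ROOT);
>         String authority = uri.getAuthority() == null ? "" : uri.getAuthority().toLowerCase(Locale.ROOT);
>         return scheme + "://" + authority;
>     }
> }
> ```
> In HS2, the returned namenodes would then be turned into Hadoop `Path`s and
> passed to `TokenCache.obtainTokensForNamenodes(credentials, paths, conf)`.
> The resulting `Credentials` would be merged alongside the LLAP credentials.
> That wiring is omitted here, and the real helper's shape may differ. If
> every resource is local or on `fs.defaultFS`, the set is empty and no extra
> token request is made.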
> h2. Repro
> 1. Kerberized HS2 with `hive.execution.engine=tez`.
> 2. From beeline:
> ```
> ADD JAR hdfs://other-nn/path/to/udf.jar;
> CREATE TEMPORARY FUNCTION my_udf AS '…';
> SELECT my_udf(col) FROM tbl;
> ```
> where `other-nn` is a federated namenode distinct from
> `fs.defaultFS`.
> 3. Expected: the query runs.
> Actual: localization fails on the AM/container with a missing
> delegation token error for `other-nn`.
> h2. Compatibility
> - Behaviour is unchanged when all `ADD JAR` resources live on
> `fs.defaultFS` or on the local filesystem.
> - Non-Kerberized clusters are unaffected (token issuance is a no-op).
> - No new configuration. No new dependencies.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)