KWON BYUNGCHANG created HIVE-29640:
--------------------------------------
Summary: ADD JAR from a non-default fs fails on Tez due to missing delegation token
Key: HIVE-29640
URL: https://issues.apache.org/jira/browse/HIVE-29640
Project: Hive
Issue Type: Bug
Reporter: KWON BYUNGCHANG
h2. Problem
In a Kerberized cluster, a query that pulls in jars from an HDFS
namenode other than `fs.defaultFS` fails when the execution engine is
Tez:
```
SET hive.execution.engine=tez;
ADD JAR hdfs://other-nn/libs/my-udf.jar;
SELECT my_udf(...) FROM t;
```
The jar is distributed to Tez containers as an AM-local resource via
the distributed cache. When YARN localizes it from
`hdfs://other-nn/...`, the container's `Credentials` hold no HDFS
delegation token for `other-nn`, so resource localization fails and
the query aborts before any task runs.
Jars on `fs.defaultFS` work because the standard Tez/Hadoop code path
already obtains a delegation token for the default namenode.
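To make the failure mode concrete, here is a simplified, stdlib-only model of the check: every HDFS resource needs a token for its namenode, and the credentials only cover the default namenode. This is a sketch under stated assumptions, not YARN's localizer code. Tokens are keyed here by URI authority (real HDFS tokens are keyed by a service string), and `LocalizationModel` / `resourcesMissingTokens` are hypothetical names.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model of resource localization; not the actual YARN/Tez code.
public class LocalizationModel {

    // Returns the HDFS resources whose namenode has no token in the credentials.
    // Simplification: a "token" is represented only by the namenode authority it covers.
    static List<URI> resourcesMissingTokens(List<URI> resources, Set<String> tokenAuthorities) {
        List<URI> missing = new ArrayList<>();
        for (URI resource : resources) {
            // Only hdfs:// resources need an HDFS delegation token in this model.
            if ("hdfs".equalsIgnoreCase(resource.getScheme())
                    && !tokenAuthorities.contains(resource.getAuthority())) {
                missing.add(resource);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<URI> resources = List.of(
                URI.create("hdfs://default-nn/libs/common.jar"),
                URI.create("hdfs://other-nn/libs/my-udf.jar"));
        // As in the bug: only the default namenode's token is present.
        Set<String> tokens = Set.of("default-nn");
        System.out.println(resourcesMissingTokens(resources, tokens));
        // prints [hdfs://other-nn/libs/my-udf.jar]
    }
}
```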
h2. Root cause
`TezClientUtils.setupAMLocalResources` does not fetch HDFS delegation
tokens for AM-local resources; it expects the caller to supply them
via `AMCredentials`. HS2 (`TezSessionState`) currently passes only the
LLAP credentials and never enumerates the non-defaultFS namenodes
referenced by `ADD JAR` / `ADD FILE` resources, so the AM ends up
without a token for those namenodes.
h2. Fix
Before handing local resources to TezClient, walk the common local
resource map, collect every distinct non-`fs.defaultFS` HDFS namenode
referenced, fetch delegation tokens for those namenodes via
`TokenCache.obtainTokensForNamenodes`, and merge them into the
credentials passed to TezClient (alongside any existing LLAP
credentials).
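The fetch-and-merge step can be sketched as follows, assuming the extra namenodes have already been collected. `TokenFetcher` is a hypothetical stand-in for the real `TokenCache.obtainTokensForNamenodes` call (which takes all namenode paths in one array), and credentials are modeled as an alias-to-token map rather than Hadoop's `Credentials` class. The sketch builds a fresh map so the caller's LLAP credentials are not mutated, and it keeps an existing entry if an alias collides; both are choices made for this illustration, not confirmed details of the patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the fetch-and-merge step; not the actual Hive code.
public class CredentialMergeSketch {

    // Hypothetical stand-in for TokenCache.obtainTokensForNamenodes: one call
    // for all namenodes, returning token alias -> token.
    interface TokenFetcher {
        Map<String, String> fetch(Set<String> namenodes);
    }

    // Credentials handed to TezClient: existing LLAP tokens (may be null when
    // LLAP is not in use) plus tokens for the extra, non-defaultFS namenodes.
    static Map<String, String> buildAmCredentials(Map<String, String> llapCredentials,
                                                  Set<String> extraNamenodes,
                                                  TokenFetcher fetcher) {
        // Fresh map so the caller's LLAP credentials are left untouched.
        Map<String, String> merged = new LinkedHashMap<>();
        if (llapCredentials != null) {
            merged.putAll(llapCredentials);
        }
        // Nothing outside fs.defaultFS: skip the fetch entirely.
        if (!extraNamenodes.isEmpty()) {
            // putIfAbsent: an existing entry is never overwritten.
            fetcher.fetch(extraNamenodes).forEach(merged::putIfAbsent);
        }
        return merged;
    }

    public static void main(String[] args) {
        // Fake fetcher that mints one placeholder token per namenode.
        TokenFetcher fake = namenodes -> {
            Map<String, String> tokens = new LinkedHashMap<>();
            for (String nn : namenodes) {
                tokens.put("hdfs:" + nn, "token-for-" + nn);
            }
            return tokens;
        };
        Map<String, String> llap = Map.of("llap", "llap-token");
        System.out.println(buildAmCredentials(llap, Set.of("other-nn"), fake));
        // prints {llap=llap-token, hdfs:other-nn=token-for-other-nn}
    }
}
```

Batching all namenodes into a single fetch mirrors the array-based signature of `TokenCache.obtainTokensForNamenodes`.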
The implementation lives in a new helper,
`TezSessionState#createLocalResourceCredentialsExcludingDefaultFS`, which
filters out:
- Resources on `fs.defaultFS` (Tez/Hadoop already issues that token;
duplicate issuance adds latency and NameNode heap pressure).
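The namenode-selection step could look roughly like this. It is a stdlib-only sketch, not the body of `createLocalResourceCredentialsExcludingDefaultFS`: `NamenodeSelector` and `nonDefaultNamenodes` are hypothetical names, it assumes `fs.defaultFS` is an `hdfs://` URI, and it compares authorities as plain strings (so `nn` and `nn:8020` would count as different namenodes). In the real helper, the resulting namenodes would then be handed to `TokenCache.obtainTokensForNamenodes`.

```java
import java.net.URI;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch of the namenode-selection step; not the actual Hive helper.
public class NamenodeSelector {

    // Returns the distinct hdfs://authority URIs referenced by the resources,
    // excluding the namenode that fs.defaultFS points to.
    static Set<URI> nonDefaultNamenodes(Collection<URI> resources, URI defaultFs) {
        String defaultAuthority = defaultFs.getAuthority();
        Set<URI> namenodes = new LinkedHashSet<>();
        for (URI resource : resources) {
            // Skip file://, other schemes, and scheme-less paths (those resolve against fs.defaultFS).
            if (!"hdfs".equalsIgnoreCase(resource.getScheme())) {
                continue;
            }
            String authority = resource.getAuthority();
            // hdfs:///path has no authority and also resolves to the default namenode.
            if (authority == null || Objects.equals(authority, defaultAuthority)) {
                continue;
            }
            // The set keeps each namenode once, however many jars live on it.
            namenodes.add(URI.create("hdfs://" + authority));
        }
        return namenodes;
    }

    public static void main(String[] args) {
        List<URI> resources = List.of(
                URI.create("hdfs://default-nn/libs/common.jar"),
                URI.create("hdfs://other-nn/libs/my-udf.jar"),
                URI.create("hdfs://other-nn/libs/helper.jar"),
                URI.create("file:///tmp/local.jar"),
                URI.create("/user/hive/relative.jar"));
        System.out.println(nonDefaultNamenodes(resources, URI.create("hdfs://default-nn")));
        // prints [hdfs://other-nn]
    }
}
```

When every resource lives on `fs.defaultFS` or the local filesystem, the returned set is empty and no extra tokens are requested.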
h2. Repro
1. Kerberized HS2 with `hive.execution.engine=tez`.
2. From beeline:
```
ADD JAR hdfs://other-nn/path/to/udf.jar;
CREATE TEMPORARY FUNCTION my_udf AS '…';
SELECT my_udf(col) FROM tbl;
```
where `other-nn` is a federated namenode distinct from
`fs.defaultFS`.
3. Expected: query runs.
Actual: localization fails on the AM/container with a missing
delegation token error for `other-nn`.
h2. Compatibility
- Behaviour is unchanged when all `ADD JAR` resources live on
`fs.defaultFS` or on the local filesystem.
- Non-Kerberized clusters are unaffected (token issuance is a no-op).
- No new configuration. No new dependencies.