W.r.t. the connection reuse issue, LLAP had a similar problem (not in HMS):
https://issues.apache.org/jira/browse/HIVE-16020. It was creating a
connection in every task, and the UGI had to be persisted at the QueryInfo
level to reduce the impact.

In Hive, FileUtils.checkFileAccessWithImpersonation can be changed to
create the UGI once, which should reduce the impact (I suspect by about
50%).

https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/common/FileUtils.java#L418
https://github.com/apache/hive/blob/d06957f254e026e719f30027d161264be43386b0/common/src/java/org/apache/hadoop/hive/common/FileUtils.java#L461
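
To illustrate why creating the UGI once should help, here is a rough,
self-contained sketch. It does not use the Hadoop classes: FakeUgi and
FakeConnectionPool are hypothetical stand-ins that only model the behaviour
described in the quoted mail below (a connection table keyed on an object
with identity-based equals/hashCode).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a proxy-user UGI. It deliberately does NOT
// override equals/hashCode, so two instances for the same user are unequal.
final class FakeUgi {
    final String user;
    FakeUgi(String user) { this.user = user; }
}

// Hypothetical stand-in for an RPC connection table keyed by the caller's UGI.
final class FakeConnectionPool {
    private final Map<FakeUgi, Integer> connections = new HashMap<>();

    // Reuses the entry for an equal key, otherwise "opens" a new connection.
    void call(FakeUgi ugi) { connections.merge(ugi, 1, Integer::sum); }

    int size() { return connections.size(); }
}

final class UgiReuseSketch {
    // A new UGI per access check: every call lands on a distinct key.
    static int perCall(int checks) {
        FakeConnectionPool pool = new FakeConnectionPool();
        for (int i = 0; i < checks; i++) {
            pool.call(new FakeUgi("alice"));
        }
        return pool.size();
    }

    // One UGI created up front and reused: all calls share a single key.
    static int createdOnce(int checks) {
        FakeConnectionPool pool = new FakeConnectionPool();
        FakeUgi ugi = new FakeUgi("alice");
        for (int i = 0; i < checks; i++) {
            pool.call(ugi);
        }
        return pool.size();
    }

    public static void main(String[] args) {
        System.out.println("connections, new UGI per check: " + perCall(100));
        System.out.println("connections, UGI created once:  " + createdOnce(100));
    }
}
```

In this model, 100 checks with a fresh key produce 100 entries, while
reusing one instance produces a single entry. How much this saves in the
real code path is something we would still have to measure.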

We may also have to explore whether a local cache with expiry in FileUtils
can help reduce the impact further.
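
Something along these lines is what I have in mind for the cache. This is
only a sketch under assumptions: ExpiringCache is a hypothetical helper, not
an existing Hive or Hadoop class, the key would be the user name, and the
value would be whatever per-user object we decide to reuse. The clock is
injected so that expiry is easy to test.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical time-based cache: an entry is reloaded once it is older
// than ttlMillis, otherwise the cached value is returned.
final class ExpiringCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAt;
        Entry(V value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> map = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis

    ExpiringCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Returns the cached value for key, calling loader only when the entry
    // is missing or expired. The loader must not touch this cache itself.
    V get(K key, Function<K, V> loader) {
        long now = clock.getAsLong();
        Entry<V> entry = map.compute(key, (k, old) -> {
            if (old == null || old.expiresAt <= now) {
                return new Entry<V>(loader.apply(k), now + ttlMillis);
            }
            return old;
        });
        return entry.value;
    }

    int size() { return map.size(); }
}
```

Usage would be roughly cache.get(userName, name -> createHandleFor(name)),
where createHandleFor is a placeholder for the real creation logic. Expired
entries are only replaced on the next lookup here, so a real version would
also need to evict idle entries and release whatever resources they hold.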

~Rajesh.B


On Thu, Sep 1, 2022 at 1:24 AM Owen O'Malley <owen.omal...@gmail.com> wrote:

> We're using HMS with Storage-Based Authorization and have been having
> trouble with the HMS running out of threads. Looking at the jstack & code,
> it appears that the problem is that RPC's ConnectionId uses UGI's
> equals/hash, which uses the Subject's Object equals/hash. Proxy user UGIs
> always create a new Subject and thus are always unique.
>
> This leads to the HMS creating too many threads. I've created a jira in
> Hadoop. https://issues.apache.org/jira/browse/HADOOP-18434
>
> Thanks,
>    Owen
>
