[ https://issues.apache.org/jira/browse/SPARK-17516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon updated SPARK-17516:
---------------------------------
Priority: Major (was: Critical)
> Current user info is not checked on STS in DML queries
> ------------------------------------------------------
>
> Key: SPARK-17516
> URL: https://issues.apache.org/jira/browse/SPARK-17516
> Project: Spark
> Issue Type: Bug
> Reporter: Tao Li
> Priority: Major
>
> I have captured some issues related to doAs support in the Spark Thrift Server
> (STS). I am using a non-secure cluster as my test environment. In short, the
> end user's identity is not passed along when STS talks to the metastore, so
> impersonation does not happen on the metastore side.
> STS uses a ClientWrapper instance (wrapped in HiveContext) for each
> session. However, by design all ClientWrapper instances share the same Hive
> instance, which is responsible for talking to the metastore. A singleton
> IsolatedClientLoader instance is initialized when STS starts up, and it
> holds the cachedHive instance. The cachedHive is associated with the "hive"
> UGI: no session has been set up yet at that point, so the current user is
> "hive". Each session then creates a ClientWrapper instance that is
> associated with the same cachedHive instance.
> When queries are run after a session is established, the code path that
> retrieves the Hive instance differs between DML and DDL operations. The
> DML-related code appears to depend less on the hive-exec module.
> For DML operations (e.g. "select *"), STS calls into ClientWrapper code
> and talks to the metastore through the singleton Hive instance directly.
> Nothing on this path checks the current user. That's why doAs is not
> respected, even though the current user has already been switched to the
> end user in the thread context.
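To make the DML behavior concrete, here is a minimal Python simulation (Python is used only as runnable pseudocode; the real code is Scala and Java). A thread-local value stands in for the current UGI, and SharedHiveSketch, ClientWrapperSketch and current_user() are hypothetical stand-ins for cachedHive, ClientWrapper and the UGI lookup, not Spark or Hive APIs. It assumes the shared instance records the user it was opened as and that the DML path never re-checks that user.

```python
import threading

# Hypothetical stand-in for the thread's current UGI.
_state = threading.local()

def current_user():
    # Before any session switches the user, the service user "hive" is current.
    return getattr(_state, "user", "hive")

class SharedHiveSketch:
    """Models the singleton cachedHive: opened once, at STS startup."""

    def __init__(self):
        self.owner = current_user()  # user captured when the connection is opened

class ClientWrapperSketch:
    """One per session; every wrapper points at the same shared instance."""

    def __init__(self, shared):
        self.hive = shared

    def run_dml(self):
        # DML path: talks to the metastore through the shared instance directly;
        # nothing compares its owner with the current user.
        return self.hive.owner

shared = SharedHiveSketch()            # STS startup, before any session exists
alice = ClientWrapperSketch(shared)
bob = ClientWrapperSketch(shared)
_state.user = "alice"                  # doAs has switched the thread's user
print(current_user(), alice.run_dml())  # alice hive
```

Both wrappers share one handle, so a query issued while the thread runs as "alice" still reaches the metastore as "hive".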
> For DDL operations (e.g. "ALTER table"), STS eventually calls into Hive
> driver code (e.g. BaseSemanticAnalyzer). From there, Hive.get() is called
> to get the thread-local Hive instance and refresh it if necessary: if the
> current user has changed, the Hive instance is refreshed by recreating the
> metastore connection with the current user's info. So even though all
> thread locals actually reference the singleton Hive instance, calling
> Hive.get() plays an important role in taking any UGI change into account.
> That's why DDL operations respect doAs.
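The refresh that Hive.get() performs on the DDL path can be modeled roughly as follows. This is a simplified sketch, not Hive's actual implementation: HiveSketch, its owner field and current_user() are hypothetical. The thread's slot is first seeded with the shared instance (mirroring the thread locals that reference the singleton), and get() replaces it with a freshly connected handle when the owner differs from the current user.

```python
import threading

# Hypothetical stand-in for the thread's current UGI.
_state = threading.local()

def current_user():
    # Defaults to the service user until a session switches it.
    return getattr(_state, "user", "hive")

class HiveSketch:
    """Simplified model of a Hive handle that remembers who opened its connection."""

    _tls = threading.local()  # per-thread slot, like the thread-local Hive

    def __init__(self):
        self.owner = current_user()  # user the metastore connection is opened as

    @classmethod
    def set(cls, hive):
        # Seed this thread's slot, e.g. with the shared singleton.
        cls._tls.hive = hive

    @classmethod
    def get(cls):
        # Reuse the thread's handle unless the current user has changed;
        # otherwise reconnect as the current user and store the new handle.
        hive = getattr(cls._tls, "hive", None)
        if hive is None or hive.owner != current_user():
            hive = cls()
            cls._tls.hive = hive
        return hive

shared = HiveSketch()        # opened at startup as "hive"
HiveSketch.set(shared)       # the thread slot references the singleton
_state.user = "alice"        # doAs switches the thread's user
h = HiveSketch.get()         # owner mismatch, so a new handle is created
print(h.owner, h is shared)  # alice False
```

Later get() calls on the same thread reuse the refreshed handle until the user changes again.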
> The fix should be to call Hive.get() for DML operations as well, as the
> Hive driver code does on the DDL path.
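Assuming a per-thread current user and a metastore handle that remembers its owner, the proposed fix amounts to making the DML path go through a get()-style accessor before it contacts the metastore. In this hypothetical sketch (none of these names are Spark or Hive APIs), ClientWrapperSketch.run_dml() does exactly that, and two sessions on separate threads, running as "alice" and "bob", each end up with a handle owned by their own user.

```python
import threading

# Hypothetical per-thread user; STS would switch this under doAs.
_state = threading.local()

def current_user():
    return getattr(_state, "user", "hive")

class HiveSketch:
    """Minimal model of a thread-local Hive handle refreshed on user change."""

    _tls = threading.local()

    def __init__(self):
        self.owner = current_user()

    @classmethod
    def get(cls):
        hive = getattr(cls._tls, "hive", None)
        if hive is None or hive.owner != current_user():
            hive = cls()  # reconnect as the current user
            cls._tls.hive = hive
        return hive

class ClientWrapperSketch:
    """Hypothetical per-session wrapper with the proposed DML fix applied."""

    def run_dml(self):
        # Fix: obtain the handle via get() instead of using a shared instance
        # directly, so a user switch is noticed before contacting the metastore.
        return HiveSketch.get().owner

results = {}

def session(user):
    _state.user = user  # doAs for this session's thread
    results[user] = ClientWrapperSketch().run_dml()

threads = [threading.Thread(target=session, args=(u,)) for u in ("alice", "bob")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results.items()))  # [('alice', 'alice'), ('bob', 'bob')]
```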