Re: Why SQL Standards Based Authorization is implemented on HiveServer2 side only?

Alan Gates Tue, 05 May 2015 09:06:00 -0700

The issue is that security checks are done in the client. In order to fully do them in the metastore we would have been forced to move significant amounts of functionality out of the client and into the metastore. Query parsing and planning would have had to be moved to the metastore, basically making it HS2. A few more comments inline:

Sergey Tryuber <mailto:stryu...@gmail.com>
May 4, 2015 at 12:56
Hi Guys,


My understanding is that there are two safe ways to use SQL Standards
Based Authorization (SSBA):

1. Hide the Hive Metastore from the world by embedding it into HiveServer2.
The *MetaStoreAuthzAPIAuthorizerEmbedOnly* configuration for the Metastore is
only half-protection, since everyone can still change table-specific metadata.
2. Have "two Metastores", but the public one should be additionally
protected by Storage Based Authorization.

Option #2 is in much greater demand, since so many frameworks in the
Hadoop ecosystem use the Hive Metastore. But having to keep both SQL
and HDFS ACLs in sync is an administration nightmare (especially given
that the "doAs" option is false in SSBA mode).

*Why isn't it possible to add SSBA-like authorizer to Hive Metastore as
well?* The authorizer could check if a user has permissions to update
table-specific metadata according to his role and username. I could even
imagine the following layout:

1. All the files in Hive tables can be accessed only by a few system users
(hive, spark-sql, impala, etc)

How would you accomplish this? Hive's files are stored in HDFS and thus must work with HDFS file permissions. You could construct a group that contained those users and make the files accessible to that group, but each cluster admin would have to do that.
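
As a rough illustration of the group approach, here is a toy Python model of owner/group/other permission bits (HDFS follows a POSIX-like model of this kind). It is a sketch and not HDFS code: the group name hive_readers, the user alice, and the FileStatus and can_read names are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStatus:
    """Toy stand-in for a file's owner, group, and permission bits."""
    owner: str
    group: str
    mode: int  # octal permission bits, e.g. 0o770


def can_read(user, user_groups, f):
    """POSIX-style read check: owner bits, else group bits, else other bits."""
    if user == f.owner:
        return bool(f.mode & 0o400)
    if f.group in user_groups:
        return bool(f.mode & 0o040)
    return bool(f.mode & 0o004)


# Hypothetical group membership; the admin would have to maintain this.
GROUPS = {
    "hive": {"hive_readers"},
    "spark-sql": {"hive_readers"},
    "impala": {"hive_readers"},
    "alice": {"analysts"},
}

# A table file owned by hive, open to the group, closed to everyone else.
table_file = FileStatus(owner="hive", group="hive_readers", mode=0o770)

for user, groups in GROUPS.items():
    print(user, can_read(user, groups, table_file))
```

Only the three system users pass the check; alice falls through to the "other" bits and is denied. The group membership itself lives outside Hive, which is why each cluster admin would have to set it up.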

2. There is only a single place of granting permissions - through SQL
standards and all SQL-like frameworks around the metastore should use it

We can't break backwards compatibility, so we could make this an option but we couldn't enforce it.

3. Additional HDFS permissions configuration would be needed only for
rare cases of data access from non-impersonated execution pipelines (Spark
Core, etc)
4. No need for a metastore embedded into HiveServer2, no strange
configuration options, easier to understand and document

Maybe I've missed something in my understanding... So, please, point me to
my mistake in this case.
