Thanks, Alan.

Of course, I do not suggest moving SSBA completely into the Metastore. As you pointed out, it depends on the query processing engine in some (but not all) cases. I was thinking about moving as much functionality as can be moved and, finally, having two different SSBA authorizers covering both Metastore and HS2.

Re HDFS files - yes, the group has to be defined by HDFS admins, and that is very simple. Again, by having SSBA in both Metastore and HS2 we avoid the need for tiresome Storage Based Authorization in secured production deployments. Just having a group of trusted system users for the different frameworks is enough.

Re backward compatibility and having only one place for granting access - that's only an additional option, so let's keep it out of the discussion for now.

Most probably I'll have to examine the source code to strengthen my points. I'll do it in the second half of May.

On May 5, 2015 7:04 PM, "Alan Gates" <alanfga...@gmail.com> wrote:

> The issue is that security checks are done in the client. In order to
> fully do them in the metastore we would have been forced to move
> significant amounts of functionality out of the client and into the
> metastore. Query parsing and planning would have had to be moved to the
> metastore, basically making it HS2. A few more comments inline:
>
> Sergey Tryuber <stryu...@gmail.com>
> May 4, 2015 at 12:56
> Hi Guys,
>
> My understanding is that there are two safe ways of usage of SQL Standards
> Based Authorization (SSBA):
>
> 1. Hide Hive Metastore from the world by embedding it into HiveServer2.
> *MetaStoreAuthzAPIAuthorizerEmbedOnly* configuration for Metastore is
> only a half-protection since everyone can change tables-specific metadata.
> 2. Have "two Metastores", but Public one should be additionally
> protected by Storage Based Authorization
>
> Option #2 is much more demanded, since there are too many frameworks in
> Hadoop ecosystem which use Hive Metastore. But necessity to keep both SQL
> and HDFS ACLs in sync is an administration nightmare (especially taking
> into account that "doAs" option is false in SSBA mode).
>
> *Why isn't it possible to add SSBA-like authorizer to Hive Metastore as
> well?* The authorizer could check if a user has permissions to update
> table-specific metadata according to his role and username. I could even
> imagine following layout:
>
> 1. All the files in Hive tables can be accessed only by few system users
> (hive, spark-sql, impala, etc)
>
> How would you accomplish this? Hive's files are stored in HDFS and thus
> must work with HDFS file permissions. You could construct a group that
> contained those users and make the files accessible to that group, but each
> cluster admin would have to do that.
>
> 2. There is only a single place of granting permissions - through SQL
> standards and all SQL-like frameworks around the metastore should use it
>
> We can't break backwards compatibility, so we could make this an option
> but we couldn't enforce it.
>
> 3. Additional HDFS permissions configuration would be needed only for
> rare cases of data access from non-impersonated execution pipelines (Spark
> Core, etc)
> 4. No necessity to have embedded into HiveServer2 metastore, no strange
> configuration options, easier for understanding and documentation
>
> May be I've missed something in my understanding... So, please, point me to
> my mistake in this case.