I think if access to the files/directories (and any manipulation of them) goes 
through a single point (Hive), then this does not become an issue. However, 
if you have a use case where files are manipulated directly through HDFS, then 
you do need two levels of authorization, and you pay the administrative cost of 
the two potentially being out of sync. At the most basic level, you could check 
for appropriate HDFS permissions while creating the more traditional 
permissions. However, that would not protect you from changes made to the DFS 
permissions after you have created the Hive permissions. I agree that a sync 
utility, though possible, is perhaps overkill.
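
A rough Python sketch of the grant-time check described above, and of how it 
can go stale afterwards. The dictionaries and function names are hypothetical 
in-memory stand-ins, not real Hive or HDFS APIs; the only assumption taken 
from the thread is the /user/hive/warehouse/<db> layout.

```python
# Hypothetical stand-ins: neither structure is a real Hive or HDFS API.
hdfs_writers = {"/user/hive/warehouse/db1": {"alice"}}  # users HDFS lets write
hive_grants = set()                                      # (user, db, privilege)


def warehouse_dir(db):
    """Assumed warehouse layout, as in the thread's example path."""
    return f"/user/hive/warehouse/{db}"


def grant_create(user, db):
    """Record the Hive grant only if HDFS would currently allow the write."""
    path = warehouse_dir(db)
    if user not in hdfs_writers.get(path, set()):
        raise PermissionError(f"{user} cannot write {path} in HDFS")
    hive_grants.add((user, db, "create_table"))


def stale_grants():
    """Hive grants no longer backed by HDFS (what a sync utility would report)."""
    return sorted(
        g for g in hive_grants
        if g[0] not in hdfs_writers.get(warehouse_dir(g[1]), set())
    )


grant_create("alice", "db1")                                # check passes today
hdfs_writers["/user/hive/warehouse/db1"].discard("alice")   # later direct HDFS change
print(stale_grants())  # [('alice', 'db1', 'create_table')]
```

The check in grant_create only looks at the state at grant time, so the later 
change made directly in HDFS goes unnoticed until something like stale_grants 
is run.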

Ashish

-----Original Message-----
From: Pradeep Kamath [mailto:prade...@yahoo-inc.com] 
Sent: Wednesday, October 13, 2010 4:50 PM
To: John Sichi; <dev@hive.apache.org>
Cc: howl...@yahoogroups.com; <hive-...@hadoop.apache.org>
Subject: RE: [howldev] RE: Howl Authorization proposal

One related concern with not using hdfs permissions is that there can be 
conflicts between what the hive authorization realm would permit versus what 
hdfs would permit.

For instance, user X (in the hive authorization realm) has the create table 
privilege for database db1, but the hdfs directory /user/hive/warehouse/db1 is 
actually not writable by user X - wouldn't this lead to an hdfs permission 
denied error even though user X has the create privilege per hive? We can 
extend the same issue to other operations like drop table etc. 
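
To make the scenario concrete, here is a minimal Python sketch of the two 
checks applied one after the other. Everything in it is illustrative: the 
grant table, the directory metadata (owner "hive", group "hive", mode 755), 
the group name "analysts" and the function names are assumptions, not Hive 
or HDFS APIs.

```python
import stat

# Hypothetical Hive-side grants: (user, database) -> set of privileges.
HIVE_GRANTS = {("X", "db1"): {"create_table"}}

# Hypothetical HDFS-side metadata, modelled as owner, group and
# POSIX-style mode bits (rwxr-xr-x here).
HDFS_DIRS = {
    "/user/hive/warehouse/db1": {"owner": "hive", "group": "hive", "mode": 0o755},
}


def hdfs_can_write(path, user, groups):
    """Owner/group/other check for the write and execute bits on a directory."""
    d = HDFS_DIRS[path]
    if user == d["owner"]:
        need = stat.S_IWUSR | stat.S_IXUSR
    elif d["group"] in groups:
        need = stat.S_IWGRP | stat.S_IXGRP
    else:
        need = stat.S_IWOTH | stat.S_IXOTH
    return (d["mode"] & need) == need


def create_table(user, groups, db):
    """The operation succeeds only if both authorization layers agree."""
    path = f"/user/hive/warehouse/{db}"
    if "create_table" not in HIVE_GRANTS.get((user, db), set()):
        return "denied by Hive authorization"
    if not hdfs_can_write(path, user, groups):
        return "denied by HDFS permissions"
    return "ok"


print(create_table("X", ["analysts"], "db1"))  # denied by HDFS permissions
```

User X passes the first check and still fails at the second, which is the 
mismatch in question.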

Keeping the two worlds in sync, so that what is allowed/disallowed in one is 
the same in the other, might be difficult - thoughts?

-----Original Message-----
From: John Sichi [mailto:jsi...@facebook.com]
Sent: Wednesday, October 13, 2010 4:36 PM
To: <dev@hive.apache.org>
Cc: howl...@yahoogroups.com; Pradeep Kamath; <hive-...@hadoop.apache.org>
Subject: Re: [howldev] RE: Howl Authorization proposal

On Oct 13, 2010, at 9:22 AM, Alan Gates wrote:

> Our biggest concern is that HDFS already has a permissions model, why create 
> a whole new one?  It is a lot of duplication.  And that duplication will flow 
> through to things like logging and auditing, all of which Hive/Howl will now 
> need in addition to HDFS.  To justify this we needed to understand what 
> additional benefits a traditional ACL model would get us.  We were not able 
> to come up with compelling use cases where we had to have this traditional 
> model.

Here are some you probably already considered, but I'm listing them for 
consideration anyway...

* table A can only be queried by roles X and Y; table B can only be queried by 
roles Y and Z; managing different groups for all the possible role combinations 
isn't very practical given large numbers of tables and roles
 
* finer-grained access control (e.g. column-level) may not be expressible in 
terms of HDFS permissions without doing things like creating dummy files 
(although in SQL, views can be used to avoid column-level permissions)

* privileges beyond read/write (e.g. delete vs update vs append)

* (Hive-specific):  GRANT/REVOKE is the standard SQL approach and requires 
ACLs (it can't be implemented in terms of HDFS permissions)
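
As a rough illustration of the first point, the Python sketch below counts 
how many groups would be needed if each distinct combination of roles had to 
become its own HDFS group. It assumes plain owner/group/other permissions 
with a single group per path; tables C and D and all function names are made 
up for the example.

```python
# Hypothetical table -> roles allowed to query it. A and B follow the
# example above; C and D are invented to round out the illustration.
TABLE_READERS = {
    "A": frozenset({"X", "Y"}),
    "B": frozenset({"Y", "Z"}),
    "C": frozenset({"X", "Z"}),
    "D": frozenset({"X", "Y"}),
}


def groups_needed(table_readers):
    """One group per distinct role combination, since a path has one group."""
    return set(table_readers.values())


def worst_case(num_roles):
    """Number of non-empty role combinations for num_roles roles."""
    return 2 ** num_roles - 1


print(len(groups_needed(TABLE_READERS)))  # 3
print(worst_case(3), worst_case(20))      # 7 1048575
```

With an ACL model the same access would be expressed as one grant per 
(table, role) pair instead of one group per combination.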

> All that said, I see no problem with having two models for now, and seeing 
> which turns out to better provide what users need and/or be easier to 
> maintain.


OK, let us know if the hooks turn out to be insufficient as the implementation 
mechanism.

JVS
