John,

It's not clear to us whether, if a traditional ACL model were available, we would still need the HDFS model. I suspect we would, but I'm not sure.

We had a few concerns with the full ACL model that caused us to avoid it, at least initially. In this model Hive/Howl has to own all the files and set their permissions to 700. Otherwise someone else can go underneath and read them directly via HDFS. Maybe this is OK, but I wonder if it will make things harder to administer.
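To make the 700 point concrete, here is a minimal sketch of a POSIX-style read check of the kind HDFS applies. The function name `can_read`, the `hive` service account, and the user `alice` are all hypothetical, and superuser bypass is ignored.

```python
def can_read(mode, owner, group, user, user_groups):
    """Return True if `user` may read a file under POSIX-style permission bits.

    Exactly one class of bits applies: owner, else group, else other.
    """
    if user == owner:
        return bool(mode & 0o400)
    if group in user_groups:
        return bool(mode & 0o040)
    return bool(mode & 0o004)


# A table file owned by a hypothetical "hive" service account.
# With mode 700 only the owner can read it, so every other user must go
# through Hive/Howl, which would then enforce its own ACLs.
print(can_read(0o700, "hive", "hive", "alice", {"users"}))  # False
# With mode 755 any user can read the file directly, bypassing those ACLs.
print(can_read(0o755, "hive", "hive", "alice", {"users"}))  # True
```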

Our biggest concern is that HDFS already has a permissions model, so why create a whole new one? It is a lot of duplication, and that duplication will flow through to things like logging and auditing, all of which Hive/Howl will now need in addition to HDFS. To justify this we needed to understand what additional benefits a traditional ACL model would give us. We were not able to come up with compelling use cases that required the traditional model.

One clear issue with using HDFS is extending it to non-HDFS-based tables (such as HBase). So we should make this an interface that uses the underlying storage's security (be it HDFS or HBase or whatever).
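As a rough sketch of what such an interface might look like, under the assumption that each storage type supplies its own check: every class and method name below is hypothetical and is not an existing Hive, Howl, HDFS, or HBase API. Group bits are left out for brevity.

```python
from abc import ABC, abstractmethod


class StorageAuthorizer(ABC):
    """Hypothetical interface: ask the underlying storage whether an action is allowed."""

    @abstractmethod
    def is_allowed(self, user, table, action):
        ...


class FilePermissionAuthorizer(StorageAuthorizer):
    """Decides from owner/other permission bits on the table's directory (HDFS-style)."""

    def __init__(self, table_dirs):
        self._dirs = table_dirs  # table name -> (owner, mode)

    def is_allowed(self, user, table, action):
        owner, mode = self._dirs[table]
        bit = {"read": 0o4, "write": 0o2}[action]
        shift = 6 if user == owner else 0  # owner bits, else "other" bits
        return bool((mode >> shift) & bit)


class GrantAuthorizer(StorageAuthorizer):
    """Stand-in for a non-HDFS store that keeps its own explicit grants."""

    def __init__(self, grants):
        self._grants = set(grants)  # (user, table, action) tuples

    def is_allowed(self, user, table, action):
        return (user, table, action) in self._grants


class MetadataService:
    """Routes each check to the authorizer registered for the table's storage type."""

    def __init__(self, authorizers):
        self._authorizers = authorizers  # storage type -> StorageAuthorizer

    def check(self, user, storage_type, table, action):
        if not self._authorizers[storage_type].is_allowed(user, table, action):
            raise PermissionError(f"{user} may not {action} {table}")


service = MetadataService({
    "hdfs": FilePermissionAuthorizer({"clicks": ("alice", 0o750)}),
    "other": GrantAuthorizer([("bob", "events", "read")]),
})
service.check("alice", "hdfs", "clicks", "write")  # allowed: owner has rwx
service.check("bob", "other", "events", "read")    # allowed: explicit grant
```

The metadata layer itself holds no permission state in this sketch; it only delegates, which is what avoids duplicating the storage layer's model.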

All that said, I see no problem with having two models for now, and seeing which turns out to better provide what users need and/or be easier to maintain.

Alan.

On Oct 11, 2010, at 5:12 PM, John Sichi wrote:


Hi Pradeep,

Namit and I took a look at the doc; thanks for the clear writeup.

Coincidentally, we've been starting to think about some Hive authorization use cases within Facebook as well. However, the approach we're thinking about is more along the lines of traditional SQL ACLs (role-based GRANT/REVOKE with persistence in the metastore) rather than HDFS-based. HIVE-78 touches on this (plus a lot of unrelated stuff).

So, one question is whether you would still need the HDFS-based approach if a metastore-level ACL solution were available.

And if the answer to that is no, then would you prefer to skip the HDFS-based work and just join forces on the ACL solution?

If it turns out that you're going to need the HDFS-based approach, then I can see how both can coexist (either as alternatives, or as one overlaid on top of the other). The HDFS-based approach can be useful for controlling how HDFS permissions are managed in the case where users are allowed direct access to HDFS, or when multiple clients are used for access (which is one of the main reasons for Howl to exist).

Regarding development of the HDFS-based approach, it would make sense to start off with enforcement via hooks. I think now that we have the semantic analyzer hooks, it should be possible to do it either all there or via a combination of that and execution hooks.

The code for the hook implementations can start out in Howl, and then if there's consensus on adopting it within Hive, we can move it at that time.

JVS

On Oct 5, 2010, at 1:19 PM, Pradeep Kamath wrote:



Also, if this proposal looks reasonable, it would be nice if Hive would also adopt it, so comments from Hive developers/committers on the feasibility would be much appreciated!

Thanks,
Pradeep

From: Pradeep Kamath
Sent: Tuesday, October 05, 2010 1:14 PM
To: 'howl...@yahoogroups.com'
Subject: Howl Authorization proposal

Hi,
I have posted a proposal for implementing authorization in Howl based on HDFS file permissions at http://wiki.apache.org/pig/Howl/HowlAuthorizationProposal . Please provide any comments/feedback on the proposal.

Thanks,
Pradeep



