Hello hadoop mailing list. I'm an intern at a software company somewhere that's been tasked with adding file permissions to hadoop. I've begun a discussion with Doug Cutting about how to accomplish that, and he suggested that I move it to the mailing list.
So here it is. If you have any suggestions about reasonable ways to implement this, feel free to chime in. Excuse the poor formatting as well; I had to add some stuff back in for completeness.

Date: Apr 18, 2007 4:48 PM
Subject: Re: hadoop file permissions
To: Doug Cutting <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED]

Comments in line:

On 4/18/07, Doug Cutting <[EMAIL PROTECTED]> wrote:
> Kurtis Heimerl wrote:
>> So I thought I'd throw my rough design idea in front of you as soon as
>> possible. Once we decide it's in the ballpark, I'll push it to the
>> general community.
>>
>> So, I see this split into two separate problems. First is the
>> authorization. I agree that kerberos is the way to do that. This will
>> authorize a subject, allowing us to get their user name.
>>
>> Following this, we have the problem of securing the system. The way I
>> understand it should work is that we take the user name discovered above
>> and look up the UID and GID for that user on the local machine. We then
>> store this with the file, probably adding metadata to the namenode.
>>
>> So, my plan is to implement the second part, assuming for now that
>> whatever user name the client sends is valid. I'll leave the
>> authentication of that until I've completed the FS work. Assuming I have
>> time, I'll then set up the kerberos part. The discussions I've had with
>> people indicate that it's an extremely difficult problem.
>
> That sounds like a fine approach to me.
Good.
>> The split seems to happen at DFSClient.java. It's there that we actually
>> call the namenode, seemingly via RPC calls. I'll modify this to send the
>> $USERNAME variable for now, and then set up the file system to use that
>> information.
>
> Yes, DFSClient will need to pass the user to the namenode. Perhaps the
> username should be put in the FileSystem's URI. So an HDFS URI would
> become hdfs://[EMAIL PROTECTED]:5555/foo/bar. URIs without a username
> would have "other" access (typically read-only).
That's reasonable. I don't know how kerberos plays with that though.
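For what it's worth, pulling the username back out of a URI like that is easy with plain java.net.URI. Here's a rough sketch (the user, host, and port are made up for illustration; this isn't actual DFSClient code):

import java.net.URI;

public class HdfsUriUser {
    public static void main(String[] args) {
        // A made-up HDFS URI carrying a username, as Doug suggests above.
        URI uri = URI.create("hdfs://alice@namenode.example.com:5555/foo/bar");

        String user = uri.getUserInfo(); // "alice", or null if no username
        String host = uri.getHost();     // "namenode.example.com"
        int port = uri.getPort();        // 5555
        String path = uri.getPath();     // "/foo/bar"

        // With no username in the URI, fall back to "other" (read-only) access.
        if (user == null) {
            user = "other";
        }
        System.out.println(user + " -> " + host + ":" + port + path);
    }
}

Something along those lines could run in the client before it sends the user name to the namenode.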
>> This will require all people making calls to namenode to have accounts
>> on the namenode box.
>
> No, since we're not checking usernames in the client (anyone can set that
> environment variable) there's no reason to validate them server-side
> either, is there? We should have an equivalent of /etc/groups in the
> namenode.
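On the /etc/groups point, a first cut could be a small text file that the namenode loads at startup. This is only a sketch with a made-up format (group:member,member,...), not anything that exists in hadoop today:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class GroupFile {
    // Maps group name -> set of member user names.
    private final Map<String, Set<String>> groups =
        new HashMap<String, Set<String>>();

    // Expects lines like "staff:alice,bob,carol"; '#' starts a comment.
    public void load(String path) throws IOException {
        BufferedReader in = new BufferedReader(new FileReader(path));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.length() == 0 || line.startsWith("#")) {
                    continue;
                }
                String[] parts = line.split(":", 2);
                if (parts.length != 2) {
                    continue;
                }
                Set<String> members =
                    new HashSet<String>(Arrays.asList(parts[1].split(",")));
                groups.put(parts[0], members);
            }
        } finally {
            in.close();
        }
    }

    // True if the named user belongs to the named group.
    public boolean isMember(String user, String group) {
        Set<String> members = groups.get(group);
        return members != null && members.contains(user);
    }
}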
Well, it's my understanding that kerberos sends you more than the username; it sends the level of privilege you're currently at. So, if you could change your UID, then you could run as someone else. However, that's totally reasonable; it's not simply changing your $USER environment variable.

So, what I think it does is validate that the user really is [EMAIL PROTECTED] This is the information we get from kerberos. The idea was to take this information and map it to a hadoop user somehow. The obvious way to me was to look up the user on our own machine, but now I realize that is a flawed system. There's a chance kerberos actually validates that it's [EMAIL PROTECTED] If that's the case, then every user will require an account on the server. It would really simplify the design, though, as we could just use the user as the ID in hadoop. This is what I'm looking into, and I haven't made much progress today.
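To make the mapping concrete, this is the kind of thing I have in mind: take whatever principal kerberos hands us and reduce it to a short name that hadoop uses as its internal ID. Purely hypothetical, just simple string handling:

public class PrincipalMapper {
    // Reduces a kerberos principal to a short user name, e.g.
    //   "alice@EXAMPLE.COM"       -> "alice"
    //   "alice/admin@EXAMPLE.COM" -> "alice"
    public static String shortName(String principal) {
        String name = principal;
        int at = name.indexOf('@');
        if (at >= 0) {
            name = name.substring(0, at);    // drop the realm
        }
        int slash = name.indexOf('/');
        if (slash >= 0) {
            name = name.substring(0, slash); // drop any instance component
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(shortName("alice@EXAMPLE.COM"));       // alice
        System.out.println(shortName("alice/admin@EXAMPLE.COM")); // alice
    }
}

Whether a plain short name is enough obviously depends on the account-on-the-server question above.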
>> Also, I'm not entirely sure how to get the UID and GID from the kernel.
>
> We shouldn't need to. HDFS can have its own UID and GID database, or
> simply use strings everywhere. That's a namenode implementation detail.
> For example, there may be no persistent UIDs or GIDs. We might use ints
> in memory to save space, and use these to index tables of strings, but
> always record the strings when persisting namenode data.
>
> Finally, it would be good to move this discussion to the mailing list or
> Jira sooner rather than later.
I'll CC my mailing list account and then forward it there.

> Doug
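One last sketch, on Doug's point about ints in memory but strings on disk: a toy interning table like this (not namenode code, just the shape of the idea) keeps the in-memory ids small while only the strings ever get recorded when the namenode persists its data:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NameTable {
    // Interns user/group names: ints in memory, strings when persisted.
    private final Map<String, Integer> ids = new HashMap<String, Integer>();
    private final List<String> names = new ArrayList<String>();

    // Returns a stable in-memory id for the name, assigning one if it's new.
    public synchronized int idOf(String name) {
        Integer id = ids.get(name);
        if (id == null) {
            id = Integer.valueOf(names.size());
            names.add(name);
            ids.put(name, id);
        }
        return id.intValue();
    }

    // Looks the string back up, e.g. when writing out namenode metadata.
    public synchronized String nameOf(int id) {
        return names.get(id);
    }
}

The ids here are not persistent; a restarted namenode could hand out different ints, which is fine as long as only the strings are ever written out.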