Re: Fwd: hadoop file permissions

Kurtis Heimerl Thu, 19 Apr 2007 13:21:15 -0700

Some other notes/questions:

On 4/19/07, Kurtis Heimerl <[EMAIL PROTECTED]> wrote:

On 4/19/07, Doug Cutting <[EMAIL PROTECTED]> wrote:
>
> Kurtis Heimerl wrote:
> >> Yes, DFSClient will need to pass the user to the namenode.
> >>
> >> Perhaps the username should be put in the FileSystem's URI.  So an HDFS
> >> URI would become hdfs://[EMAIL PROTECTED]:5555/foo/bar.  URI's without a
> >> username would have "other" access (typically read-only).
> >
> > That's reasonable. I don't know how kerberos plays with that though.
>
> I chatted with Owen a bit yesterday about this and think it's better to
> keep the username in the config.  A FileSystem is created given a URI
> and a Configuration.  FileSystem's are currently cached, keyed on the
> URI's protocol and authority (host & port, typically).  We should add
> the configuration to the cache key too, so that different FileSystem
> instances are used for different users.  That permits FileSystem
> implementations to use arbitrary config properties in their ctor.
>
> I think we should be able to put a Kerberos ticket into the
> configuration.
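
A minimal sketch of the cache-key idea described above, assuming the user
name lives in a config property. The class names and the property name
"dfs.client.user" are hypothetical, a plain Map stands in for Configuration,
and only the user name (not the whole configuration) goes into the key, which
is a simplification of what Doug proposes.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch, not Hadoop's FileSystem cache: it only illustrates
// keying cached instances on scheme, authority, and a user name from config.
final class FsCacheSketch {

    // Cache key: URI scheme + authority + user name taken from the config.
    static final class Key {
        final String scheme;
        final String authority;
        final String user;

        Key(URI uri, Map<String, String> conf) {
            this.scheme = uri.getScheme();
            this.authority = uri.getAuthority();
            // "dfs.client.user" is an assumed property name, for illustration.
            this.user = conf.getOrDefault("dfs.client.user", "");
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key k = (Key) o;
            return Objects.equals(scheme, k.scheme)
                && Objects.equals(authority, k.authority)
                && Objects.equals(user, k.user);
        }

        @Override
        public int hashCode() {
            return Objects.hash(scheme, authority, user);
        }
    }

    // Stand-in for a FileSystem instance; it only records who it was built for.
    static final class Handle {
        final String user;

        Handle(String user) {
            this.user = user;
        }
    }

    private final Map<Key, Handle> cache = new HashMap<>();

    // Return the cached handle for this (URI, config) pair, creating it on
    // first use. The URI path is deliberately not part of the key.
    synchronized Handle get(URI uri, Map<String, String> conf) {
        Key key = new Key(uri, conf);
        return cache.computeIfAbsent(key, k -> new Handle(k.user));
    }
}
```

With a key like this, two lookups for the same host and port return the same
instance only when the configured user also matches, so different users never
share a FileSystem object.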

I think I understand the plan here. NameNode.java reads the location of
the namenode instance from the config. So we'll insert the username and
groups into the config. On the first iteration, this will not be
authenticated. This information will be passed to the namenode server, which
will translate the name and groups to a UID and GIDs, which are stored with
the files.
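
A rough sketch of that translation step, assuming the client sends an
unauthenticated user name and a comma-separated group list taken from its
config. Everything here is hypothetical: the class names, the property names
"dfs.client.user" and "dfs.client.groups", the choice to assign IDs on first
sight, and the ANONYMOUS value for callers that send no user name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical namenode-side table that hands out numeric IDs for names.
// None of these names come from the Hadoop code base.
final class IdTableSketch {
    // Assumed marker for a caller with no user name ("other" access only).
    static final int ANONYMOUS = -1;

    private final Map<String, Integer> ids = new HashMap<>();
    private int nextId;

    IdTableSketch(int firstId) {
        this.nextId = firstId;
    }

    // Return the ID for a name, assigning the next free one on first sight.
    synchronized int idFor(String name) {
        Integer id = ids.get(name);
        if (id == null) {
            id = nextId++;
            ids.put(name, id);
        }
        return id;
    }

    // The identity the namenode would attach to a request.
    static final class Caller {
        final int uid;
        final List<Integer> gids;

        Caller(int uid, List<Integer> gids) {
            this.uid = uid;
            this.gids = gids;
        }
    }

    // Translate the (trusted, unauthenticated) names in the config to IDs.
    static Caller resolve(Map<String, String> conf,
                          IdTableSketch users, IdTableSketch groups) {
        String user = conf.get("dfs.client.user");
        if (user == null || user.trim().isEmpty()) {
            return new Caller(ANONYMOUS, new ArrayList<Integer>());
        }
        List<Integer> gids = new ArrayList<>();
        String raw = conf.get("dfs.client.groups");
        if (raw != null) {
            for (String g : raw.split(",")) {
                String name = g.trim();
                if (!name.isEmpty()) {
                    gids.add(groups.idFor(name));
                }
            }
        }
        return new Caller(users.idFor(user.trim()), gids);
    }
}
```

Since nothing is authenticated in this first iteration, the table trusts
whatever names arrive; the IDs only give the namenode something compact to
store with each file.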

Sounds like a reasonable thing. There's one problem here, that being that
each user will require their own config file. This is not the way I've seen
hadoop currently run, but if we all agree that this is the way to go, I'll
begin a prototype very soon.



ok, I have an architectural question. I think I get the client-side stack.
DFSClient creates a proxy, which connects to the namenode. This all uses
ClientProtocol. So, to implement what I need I'll probably need to modify
ClientProtocol and NameNode.

Now we have the whole DistributedFileSystem and FileSystem stuff. I see the
cache in FileSystem, I just don't see where in the stack this is. It's
server-side I assume. I see where we instantiate the NameNode on the server,
but it seemingly just deals with blocks. Where's the filesystem at?

We should have an equivalent of /etc/group in the namenode.

> >
> > So, what I think it does is that it validates that the user really is
> > [EMAIL PROTECTED] [ ... ]
> >
> > There's a chance kerberos actually validates that it's [EMAIL PROTECTED].
>
> Kerberos validates that a user is [EMAIL PROTECTED], where both the user and
> the domain are part of Kerberos, not some host.  Initially we'll not do
> any user validation, but just trust the username sent.



There's accountability, but not great protection. If someone put their
client into kerberos and it was accepted, they could take any role they
wanted.

That is, if I understood what you are talking about.


> We might be able to get away without groups, but it would be awkward.
> For example, if the default file permission is -rw-rw-r--, then, without
> groups, anyone can read any file, but folks can only remove files
> they've created.  That doesn't permit read/write sharing of data w/o
> changing its owner.
>
> We probably also need a "root" username that can do anything.


I think groups and root are easy, so I plan to implement those initially
as well. Is there a more reasonable way to do root than hardcoding that
root can do anything? I thought about adding root to all groups, but a file
might have no group. I guess I could add one root group that initially
contains only root. The service could then let others act as root as well
by adding them to that group.
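
A minimal sketch of the check this implies, assuming Unix-style mode bits and
a single superuser group. The class, method, and the group name "root" are
hypothetical, and treating a file with no group as a null group is an
assumption made only for this example.

```java
import java.util.Set;

// Hypothetical permission check; not taken from the Hadoop code base.
final class PermissionSketch {
    static final int READ = 4;
    static final int WRITE = 2;

    // Assumed name of the one group whose members may do anything.
    static final String ROOT_GROUP = "root";

    // Decide whether a caller may perform the wanted access on a file.
    // mode holds the usual nine bits, e.g. 0664 for -rw-rw-r--.
    static boolean allowed(String user, Set<String> userGroups,
                           String owner, String group,
                           int mode, int want) {
        // Members of the root group bypass the mode bits entirely.
        if (userGroups.contains(ROOT_GROUP)) {
            return true;
        }
        int bits;
        if (user.equals(owner)) {
            bits = (mode >> 6) & 7;          // owner bits
        } else if (group != null && userGroups.contains(group)) {
            bits = (mode >> 3) & 7;          // group bits
        } else {
            bits = mode & 7;                 // "other" bits
        }
        return (bits & want) == want;
    }
}
```

Because root is handled through group membership rather than a hardcoded
user name, a file with no group still works: the group branch is skipped and
only the root-group test or the owner and other bits apply.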


> Doug
>
