Hi,

I upgraded the cluster to hadoop-0.16 and immediately noticed that nothing
worked because of the new DFS permissions.

Here's our situation:

- the namenode, jobtracker, and datanodes run under the UNIX 'hadoop' account.
- jobs are submitted to Hadoop by an RPC front-end that also runs as the
  'hadoop' user.
- the tasktrackers run as a limited 'mapreduce' account.

When the tasktrackers try to read or write DFS, they hit permission errors,
because they run as the 'mapreduce' user while everything in DFS is owned by
the 'hadoop' user.

Eventually I'd like to get to a scenario where our RPC front-end tells
Hadoop the real name of the submitting user, which is not the same as the
UNIX account that the RPC front-end runs under.

I think the problem here is that DFS assumes the DFS user is the same as the
UNIX user running the process.  That won't work in clusters where many
different UNIX accounts can submit mapreduce jobs, because the only way to
make a task take on the submitter's UNIX account would be to run the
tasktracker as root to begin with (which I don't want to do).

So, I can obviously turn off permissions, or set up the supergroup so that
they don't matter.  But permissions seem like a good idea, so it's a shame
to have to work around them.
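
For reference, the kind of workaround I mean is just a couple of entries in
the namenode's hadoop-site.xml.  I believe the 0.16 property names are
dfs.permissions and dfs.permissions.supergroup, but please correct me if I
have them wrong:

  <!-- Option 1: disable DFS permission checking entirely -->
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>

  <!-- Option 2: keep checking on, but point the supergroup at a UNIX group
       that both the 'hadoop' and 'mapreduce' accounts belong to (assuming
       such a group, here called 'hadoop', exists) -->
  <property>
    <name>dfs.permissions.supergroup</name>
    <value>hadoop</value>
  </property>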

Instead, I'd like a model where the hadoop user used for DFS permission
checks is completely separate from the UNIX account.  When I submit a job,
I should be able to specify the hadoop user for that job.  The tasktrackers,
which still have to run as the limited 'mapreduce' UNIX account, would then
make all their DFS accesses as the hadoop user specified when the job was
submitted.
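
Roughly, what I'm imagining on the submission side is something like the
sketch below.  This is only an illustration: the "dfs.job.user" property is
made up by me to show the idea (it doesn't exist today), and I've left out
the usual input/output setup.

  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;

  // Sketch: the DFS identity travels with the job, independent of the
  // UNIX account the tasktracker runs under.
  // NOTE: "dfs.job.user" is a hypothetical property name, not a real one.
  JobConf conf = new JobConf();
  conf.setJobName("permissions-example");
  conf.set("dfs.job.user", "michael");  // hadoop user for DFS permission checks
  JobClient.runJob(conf);               // tasks would read/write DFS as 'michael'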

Or am I missing something?

Thanks,
Michael
