Hi Andy, I can reproduce the problem and I believe it is a bug. The output directory should be owned by the user submitting the job, not by the tasktracker account. Do you want to file a JIRA, or should I do it?
Thanks,
Nicholas

----- Original Message ----
From: Andy Li <[EMAIL PROTECTED]>
To: core-dev@hadoop.apache.org
Sent: Wednesday, February 27, 2008 1:22:26 AM
Subject: Re: mapreduce does the wrong thing with dfs permissions?

I also encountered the same problem when running the MapReduce code under a different user name. For example, assume I installed Hadoop under the account 'hadoop' and I want to run my program under the user account 'test'. I created an input folder /user/test/input/ as user 'test' and set its permissions to 0775:

/user/test/input    <dir>    2008-02-27 01:20    rwxr-xr-x    test    hadoop

When I run the MapReduce code, the output directory I specify is created as user 'hadoop' instead of 'test':

${HADOOP_HOME}/bin/hadoop jar /tmp/test_perm.jar -m 57 -r 3 "/user/test/input/l" "/user/test/output/"

The directory "/user/test/output/" ends up with the following permissions and user:group:

/user/test/output    <dir>    2008-02-27 03:53    rwxr-xr-x    hadoop    hadoop

My question is: why is the output folder owned by the superuser 'hadoop'? Because of that, the MapReduce code cannot write to this folder, since the permissions do not allow user 'test' to write to it. So the output folder was created, but the user account 'test' cannot write anything into it, and the job throws the exception pasted below.

I have been looking for a solution but cannot find an exact answer. How do I set the default umask so that new directories come out as 0775? I can add the user 'test' to the group 'hadoop' so that 'test' has write access to folders belonging to the 'hadoop' group. In other words, as long as a folder is set to 'rwxrwxr-x', user 'test' can read and write to it and share it with 'hadoop:hadoop'. Any idea how I can set or modify the global default umask for Hadoop, or do I have to override the default umask value in my configuration or FileSystem calls every time?
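For reference, here is a minimal sketch of a per-job workaround for the question above. It assumes the 0.16-era client-side umask property is named "dfs.umask" and that FileSystem.mkdirs(Path, FsPermission) and FileSystem.setPermission(Path, FsPermission) are available in the release being run; the class name is made up for illustration, so treat this as a starting point rather than a confirmed fix.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

// Hypothetical helper, run as the submitting user ('test') before job submission.
public class OutputDirWorkaround {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Assumption: the client-side umask property in this release is "dfs.umask"
    // (octal). A umask of 002 makes new files and directories group-writable,
    // so directories come out 0775 instead of 0755.
    conf.set("dfs.umask", "002");

    FileSystem fs = FileSystem.get(conf);
    Path output = new Path("/user/test/output");

    // Pre-create the output directory as user 'test' with group-write
    // permission (rwxrwxr-x), so anyone in the shared 'hadoop' group,
    // including the tasktracker account, can write into it.
    if (!fs.exists(output)) {
      fs.mkdirs(output, new FsPermission((short) 0775));
    } else {
      fs.setPermission(output, new FsPermission((short) 0775));
    }
  }
}

This only sidesteps the symptom by leaning on group-write access; the output files still end up owned by the tasktracker account until the ownership bug itself is fixed.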
======= COPY/PASTE STARTS HERE =======
org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.fs.permission.AccessControlException: Permission denied: user=test, access=WRITE, inode="_task_200802262256_0007_r_000001_1":hadoop:hadoop:rwxr-xr-x
    at org.apache.hadoop.dfs.PermissionChecker.check(PermissionChecker.java:173)
    at org.apache.hadoop.dfs.PermissionChecker.check(PermissionChecker.java:154)
    at org.apache.hadoop.dfs.PermissionChecker.checkPermission(PermissionChecker.java:102)
    at org.apache.hadoop.dfs.FSNamesystem.checkPermission(FSNamesystem.java:4035)
    at org.apache.hadoop.dfs.FSNamesystem.checkAncestorAccess(FSNamesystem.java:4005)
    at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:963)
    at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:938)
    at org.apache.hadoop.dfs.NameNode.create(NameNode.java:281)
    at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:409)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:899)
    at org.apache.hadoop.ipc.Client.call(Client.java:512)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:198)
    at org.apache.hadoop.dfs.$Proxy1.create(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at org.apache.hadoop.dfs.$Proxy1.create(Unknown Source)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:1927)
    at org.apache.hadoop.dfs.DFSClient.create(DFSClient.java:382)
    at org.apache.hadoop.dfs.DistributedFileSystem.create(DistributedFileSystem.java:135)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:436)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:336)
    at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:308)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2089)
======= COPY/PASTE ENDS HERE =======

On Tue, Feb 26, 2008 at 9:47 PM, Owen O'Malley <[EMAIL PROTECTED]> wrote:
>
> On Feb 26, 2008, at 3:05 PM, Michael Bieniosek wrote:
>
> > Ah, that makes sense.
> >
> > I have things set up this way because I can't trust code that gets
> > run on the tasktrackers: we have to prevent the tasktrackers from
> > eg. sending kill signals to the datanodes. I didn't think about
> > the jobtracker, but I suppose I should equally not trust code that
> > gets run on the jobtracker...
>
> Just to be clear, no user code is run in the JobTracker or
> TaskTracker. User code is only run in the client and task processes.
> However, it makes a lot of sense to run map/reduce as a different
> user than hdfs to prevent the task processes from having access to
> the raw blocks or datanodes.
>
> -- Owen
>