Thanks, just go ahead and file a JIRA; I have no experience filing one (I'll read the manual later to learn how).
I'll keep track of this and apply the patch later on. Thanks for taking care
of this issue.

On Thu, Feb 28, 2008 at 2:32 PM, <[EMAIL PROTECTED]> wrote:

> Just have created HADOOP-2915.
>
> ----- Original Message ----
> From: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> To: core-dev@hadoop.apache.org
> Sent: Thursday, February 28, 2008 1:45:28 PM
> Subject: Re: mapreduce does the wrong thing with dfs permissions?
>
> Hi Andy,
>
> I can reproduce the problem and I believe it is a bug. The output
> directory should be owned by the user submitting the job, not the
> task tracker account. Do you want to file a jira, or shall I do it?
>
> Thanks.
>
> Nicholas
>
> ----- Original Message ----
> From: Andy Li <[EMAIL PROTECTED]>
> To: core-dev@hadoop.apache.org
> Sent: Wednesday, February 27, 2008 1:22:26 AM
> Subject: Re: mapreduce does the wrong thing with dfs permissions?
>
> I also encountered the same problem when running MapReduce code under
> a different user name.
>
> For example, assume I installed Hadoop under the account 'hadoop' and
> I run my program as the user account 'test'. I created the input
> folder /user/test/input/ as user 'test' with the permission set to
> 0775:
>
>     /user/test/input  <dir>  2008-02-27 01:20  rwxr-xr-x  test  hadoop
>
> When I run the MapReduce code, the output directory I specify ends up
> owned by user 'hadoop' instead of 'test':
>
>     ${HADOOP_HOME}/bin/hadoop jar /tmp/test_perm.jar -m 57 -r 3 \
>         "/user/test/input/l" "/user/test/output/"
>
> The directory "/user/test/output/" gets the following permissions and
> user:group:
>
>     /user/test/output  <dir>  2008-02-27 03:53  rwxr-xr-x  hadoop  hadoop
>
> My question is: why is the output folder owned by the superuser
> 'hadoop'? Of course the MapReduce job then cannot access this folder,
> because the permissions do not allow user 'test' to write to it. The
> output folder is created, but user 'test' cannot write anything into
> it, so the job throws an exception (see below).
>
> I have been looking for a solution but cannot find an exact answer.
> How do I set the default umask so that directories come out as 0775?
> I can add user 'test' to the group 'hadoop' so that 'test' has write
> access to folders in the 'hadoop' group. In other words, as long as a
> folder is set to 'rwxrwxr-x' and shared as 'hadoop:hadoop', user
> 'test' can read from and write to it. Any idea how I can set or
> modify Hadoop's global default umask, or do I have to always override
> the default umask in my configuration or FileSystem?
>
> ======= COPY/PASTE STARTS HERE =======
> org.apache.hadoop.ipc.RemoteException:
> org.apache.hadoop.fs.permission.AccessControlException: Permission denied:
> user=test, access=WRITE,
> inode="_task_200802262256_0007_r_000001_1":hadoop:hadoop:rwxr-xr-x
>     at org.apache.hadoop.dfs.PermissionChecker.check(PermissionChecker.java:173)
>     at org.apache.hadoop.dfs.PermissionChecker.check(PermissionChecker.java:154)
>     at org.apache.hadoop.dfs.PermissionChecker.checkPermission(PermissionChecker.java:102)
>     at org.apache.hadoop.dfs.FSNamesystem.checkPermission(FSNamesystem.java:4035)
>     at org.apache.hadoop.dfs.FSNamesystem.checkAncestorAccess(FSNamesystem.java:4005)
>     at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:963)
>     at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:938)
>     at org.apache.hadoop.dfs.NameNode.create(NameNode.java:281)
>     at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:409)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:899)
>
>     at org.apache.hadoop.ipc.Client.call(Client.java:512)
>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:198)
>     at org.apache.hadoop.dfs.$Proxy1.create(Unknown Source)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>     at org.apache.hadoop.dfs.$Proxy1.create(Unknown Source)
>     at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:1927)
>     at org.apache.hadoop.dfs.DFSClient.create(DFSClient.java:382)
>     at org.apache.hadoop.dfs.DistributedFileSystem.create(DistributedFileSystem.java:135)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:436)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:336)
>     at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:308)
>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2089)
> ======= COPY/PASTE ENDS HERE =======
>
> On Tue, Feb 26, 2008 at 9:47 PM, Owen O'Malley <[EMAIL PROTECTED]> wrote:
>
> > On Feb 26, 2008, at 3:05 PM, Michael Bieniosek wrote:
> >
> > > Ah, that makes sense.
> > >
> > > I have things set up this way because I can't trust code that gets
> > > run on the tasktrackers: we have to prevent the tasktrackers from,
> > > e.g., sending kill signals to the datanodes. I didn't think about
> > > the jobtracker, but I suppose I should equally not trust code that
> > > gets run on the jobtracker...
> >
> > Just to be clear, no user code is run in the JobTracker or
> > TaskTracker. User code is only run in the client and task processes.
> > However, it makes a lot of sense to run map/reduce as a different
> > user than hdfs, to prevent the task processes from having access to
> > the raw blocks or datanodes.
> >
> > -- Owen
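For the umask question above: releases that include the permissions feature
expose a client-side dfs.umask property (check the hadoop-default.xml that
ships with your version; the property name and how its value is parsed are
assumptions about the release in use). A umask of 002 leaves new directories
group-writable (rwxrwxr-x), which, combined with adding 'test' to the
'hadoop' group, gives exactly the sharing described above. A minimal
hadoop-site.xml sketch:

    <property>
      <name>dfs.umask</name>
      <!-- umask 002: new directories come out rwxrwxr-x. Some releases
           parse this value as a decimal integer, so "2" is the
           unambiguous spelling for octal 002. -->
      <value>2</value>
    </property>

Existing directories can be opened up from the superuser account with the
dfs shell:

    ${HADOOP_HOME}/bin/hadoop dfs -chgrp hadoop /user/test
    ${HADOOP_HOME}/bin/hadoop dfs -chmod 775 /user/test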
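The same can be done per-process instead of cluster-wide. The sketch below
assumes the 0.16-era Java APIs (FileSystem.mkdirs, setPermission, setOwner,
and the FsPermission(short) constructor); the path and user/group names are
simply the ones from the example above. Note that it opens up the shared
parent directory, not the job output directory itself: MapReduce requires
the output directory to not exist, and the ownership it is created with is
exactly the bug tracked in HADOOP-2915.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.FsPermission;

    public class ShareUserDir {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Create (or reuse) the shared parent directory.
        Path parent = new Path("/user/test");
        fs.mkdirs(parent);

        // Make it group-writable (rwxrwxr-x). setPermission is applied
        // verbatim, whereas the mode passed to mkdirs may be masked by
        // the client-side umask in some releases.
        fs.setPermission(parent, new FsPermission((short) 0775));

        // Hand the directory to 'test' while keeping group 'hadoop'.
        // setOwner only succeeds when run as the superuser ('hadoop' here).
        fs.setOwner(parent, "test", "hadoop");
      }
    }

Run once as the 'hadoop' superuser. This does not fix the ownership bug
itself, but together with the group-writable umask above it lets user
'test' share the tree until the HADOOP-2915 patch lands.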