[jira] Commented: (HADOOP-2915) mapred output files and directories should be created as the job submitter, not tasktracker or jobtracker

Tsz Wo (Nicholas), SZE (JIRA) Thu, 28 Feb 2008 18:11:53 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12573563#action_12573563
 ]


Tsz Wo (Nicholas), SZE commented on HADOOP-2915:
------------------------------------------------

This problem is due to FileSystem.CACHE, which stores FileSystem objects 
keyed by scheme and authority, but not by user information.  

Suppose user foo has called FileSystem.getFileSystem(...) with some uri U.
User bar calls getFileSystem(...) with another uri V, where the schemes and 
authorities of U and V are the same.
Then, getFileSystem(...) returns the FileSystem with foo's account to bar.
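
The scenario above can be reproduced with a toy model of the cache.  This is 
only a sketch, not the Hadoop code: ToyFileSystem, the class name, the 
(uri, user) signature and the host "namenode:9000" are all assumptions made 
for illustration.  The only behavior taken from the description above is that 
the cache key is built from the scheme and authority alone.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

public class SchemeAuthorityCacheDemo {
    /** Hypothetical stand-in for a FileSystem; remembers the user it was created for. */
    static final class ToyFileSystem {
        final String user;
        ToyFileSystem(String user) { this.user = user; }
    }

    // Toy cache keyed on scheme and authority only; the user is NOT part of the key.
    private static final Map<String, ToyFileSystem> CACHE = new HashMap<>();

    /** Returns the cached instance for the URI's scheme and authority, creating it if absent. */
    static ToyFileSystem getFileSystem(URI uri, String user) {
        String key = uri.getScheme() + "://" + uri.getAuthority();
        return CACHE.computeIfAbsent(key, k -> new ToyFileSystem(user));
    }

    public static void main(String[] args) {
        // U and V differ only in path, so they map to the same cache key.
        URI u = URI.create("hdfs://namenode:9000/user/foo");
        URI v = URI.create("hdfs://namenode:9000/user/bar");

        ToyFileSystem forFoo = getFileSystem(u, "foo");
        ToyFileSystem forBar = getFileSystem(v, "bar");

        System.out.println(forBar.user);      // prints "foo"
        System.out.println(forFoo == forBar); // prints "true"
    }
}
```

In this model bar receives the very object that was created for foo, which 
matches the behavior described above.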

In this case, the JobTracker calls getFileSystem(...) first.  Later, some 
task calls getFileSystem(...) and ends up with a FileSystem bound to the 
JobTracker's account.
The user information of the task stored in jobConf is not used because of 
FileSystem.CACHE.

Possible solutions:
- In addition to schemes and authorities, add user information to the keys of 
FileSystem.CACHE.

- A cached FileSystem object will be removed from FileSystem.CACHE when calling 
FileSystem.close().  So, we close each FileSystem object before another 
FileSystem is opened.  However, this might affect performance.

I guess the first one is better.
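
A minimal sketch of the first option, again using a toy model rather than the 
real FileSystem class.  Key, ToyFileSystem, the (uri, user) signature and the 
host name are hypothetical; the point is only that two callers with the same 
scheme and authority but different users no longer share a cache entry.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class UserAwareCacheDemo {
    /** Hypothetical stand-in for a FileSystem; remembers the user it was created for. */
    static final class ToyFileSystem {
        final String user;
        ToyFileSystem(String user) { this.user = user; }
    }

    /** Hypothetical cache key: scheme and authority plus the user. */
    static final class Key {
        final String scheme;
        final String authority;
        final String user;

        Key(URI uri, String user) {
            this.scheme = uri.getScheme();
            this.authority = uri.getAuthority();
            this.user = user;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Key)) return false;
            Key other = (Key) o;
            return Objects.equals(scheme, other.scheme)
                && Objects.equals(authority, other.authority)
                && Objects.equals(user, other.user);
        }

        @Override
        public int hashCode() {
            return Objects.hash(scheme, authority, user);
        }
    }

    private static final Map<Key, ToyFileSystem> CACHE = new HashMap<>();

    /** Returns the cached instance for (scheme, authority, user), creating it if absent. */
    static ToyFileSystem getFileSystem(URI uri, String user) {
        return CACHE.computeIfAbsent(new Key(uri, user), k -> new ToyFileSystem(user));
    }

    public static void main(String[] args) {
        URI u = URI.create("hdfs://namenode:9000/user/foo");
        URI v = URI.create("hdfs://namenode:9000/user/bar");

        ToyFileSystem forFoo = getFileSystem(u, "foo");
        ToyFileSystem forBar = getFileSystem(v, "bar");

        System.out.println(forBar.user);      // prints "bar"
        System.out.println(forFoo == forBar); // prints "false"
        // The same user and authority still hit the cache.
        System.out.println(forFoo == getFileSystem(v, "foo")); // prints "true"
    }
}
```

Compared with closing each FileSystem before opening another, this keeps 
caching for repeated calls by the same user, at the cost of one cached entry 
per user.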


> mapred output files and directories should be created as the job submitter, 
> not tasktracker or jobtracker
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2915
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2915
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.16.0
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Tsz Wo (Nicholas), SZE
>
> Quoted from an email sent to core-dev by Andy Li:
> {quote}
> For example, assuming I have installed Hadoop with an account 'hadoop' and I 
> am going to run my program with user account 'test'. I have created an input 
> folder as /user/test/input/ with user 'test' and the permission is set to 
> 0775.
> /user/test/input      <dir>          2008-02-27 01:20 rwxr-xr-x      test  
> hadoop
> When I run the MapReduce code, the output I specified will be set to user 
> 'hadoop' instead of 'test'.
> /bin/hadoop jar /tmp/test_perm.jar -m 57 -r 3 "/user/test/input/l" 
> "/user/test/output/"
> The directory "/user/test/output/" will have the following permission and 
> user:group.
> /user/test/output    <dir>          2008-02-27 03:53        rwxr-xr-x hadoop  
> hadoop
> {quote}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
