[ http://issues.apache.org/jira/browse/HADOOP-673?page=comments#action_12451359 ]

Richard Kasperski commented on HADOOP-673:
------------------------------------------

Owen,
  There are two primary reasons.
1. I may not have access to the source of every program that I want to 
run, or I may have the source and not want to change it. Having the data 
for an application rooted in the current directory allows many more programs 
to be run with no changes.
2. I think of hadoop as an infrastructure for running other programs, not as 
something to write programs to. This means that I do not want to write 
hadoop'isms into my programs.

Running streaming apps on hadoop should be as transparent as possible.

Sharing files is an efficiency consideration, not something required for 
proper operation. Is there an operational requirement for the files to be 
shared, such as needing to mmap the same file? Is this likely to be the way 
that most programs will work?

It's not that I'm absolutely against the .. structure, just that it seems to me 
not to be the most useful one. I would contend that if you want the .. 
placement, you also need to provide, somehow, for placement in the current 
working directory.
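
As a rough illustration of one way both placements could coexist, here is a 
minimal sketch in Python. It assumes the shared files live in a per-job 
directory and that each task gets its own subdirectory beneath it; the helper 
name link_shared_files and the layout are hypothetical, not anything Hadoop 
provides. Shared files stay in the parent (..) and are symlinked into the 
task's working directory so an unmodified program finds them by relative name.

```python
import os
import tempfile


def link_shared_files(job_dir, task_dir):
    """Symlink each regular file in job_dir into task_dir.

    Hypothetical helper for illustration only. Returns the names linked.
    """
    os.makedirs(task_dir, exist_ok=True)
    linked = []
    for name in sorted(os.listdir(job_dir)):
        source = os.path.join(job_dir, name)
        target = os.path.join(task_dir, name)
        # Skip subdirectories (including other task dirs) and existing entries.
        if os.path.isfile(source) and not os.path.lexists(target):
            os.symlink(source, target)
            linked.append(name)
    return linked


if __name__ == "__main__":
    original_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as job_dir:
        # A shared file placed once per job (assumed layout).
        with open(os.path.join(job_dir, "lookup.txt"), "w") as f:
            f.write("shared data\n")

        task_dir = os.path.join(job_dir, "task_0001")
        link_shared_files(job_dir, task_dir)

        # The task runs with its own directory as the working directory
        # and opens the shared file by a plain relative name.
        os.chdir(task_dir)
        try:
            with open("lookup.txt") as f:
                print(f.read().strip())
        finally:
            os.chdir(original_cwd)
```

Note that a symlink does not protect the shared copy: a program that writes 
to lookup.txt through the link still modifies the one file every task sees. 
Copying instead of linking would trade efficiency for isolation.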

The above comments only apply to streaming. 

A more general comment, perhaps philosophical, perhaps religious.

The purpose of hadoop is to allow work to be divided and then executed with 
the same confidence in its correctness that I would have if I ran it as a 
single job on a single machine. Sharing data in a way that allows 
inadvertent/unintended manipulation of 'SHARED' data violates this. It seems to 
me that sharing should be used only when efficiency is needed and the user can 
take an appropriate explicit action to enable it. 




> the task execution environment should have a current working directory that 
> is task specific
> --------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-673
>                 URL: http://issues.apache.org/jira/browse/HADOOP-673
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.7.2
>            Reporter: Owen O'Malley
>         Assigned To: Mahadev konar
>             Fix For: 0.9.0
>
>
> The tasks should be run in a work directory that is specific to a single 
> task. In particular, I'd suggest using the <local>/jobcache/<jobid>/<taskid> 
> as the current working directory.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira