[jira] Commented: (HADOOP-673) the task execution environment should have a current working directory that is task specific

Richard Kasperski (JIRA) Fri, 17 Nov 2006 16:26:00 -0800

    [ http://issues.apache.org/jira/browse/HADOOP-673?page=comments#action_12450936 ]

Richard Kasperski commented on HADOOP-673:
------------------------------------------


I think that it is very important that there be a way for an application to run 
in a sandbox <local>/jobcache/<jobid>/<taskid> that contains the contents of 
the jar file and is the current working directory. How one accomplishes this is 
an implementation detail. If the only way to do it is to unjar the archive more 
than once, then I guess that would have to be the solution. This avoids a lot 
of potential grief: no shared files are ever modified, because they aren't 
actually shared. It causes more unpacking of jars, but I don't really see that 
as a problem. How the jars are copied to a node, and how they are subsequently 
reused, is important. 
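
To make the per-task sandbox concrete, here is a minimal sketch of one way it 
could be done. It is not the Hadoop implementation. The class name TaskSandbox, 
its method names, and the example job and task ids are all made up for 
illustration; only the <local>/jobcache/<jobid>/<taskid> layout comes from the 
issue. It unpacks the job jar into the task's own directory and then starts the 
task process with that directory as its current working directory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper, not part of Hadoop.
final class TaskSandbox {

    /** Unpacks jobJar into localDir/jobcache/jobId/taskId and returns that directory. */
    static Path prepare(Path localDir, String jobId, String taskId, Path jobJar)
            throws IOException {
        Path workDir = localDir.resolve("jobcache").resolve(jobId).resolve(taskId)
                .toAbsolutePath().normalize();
        Files.createDirectories(workDir);
        try (JarFile jar = new JarFile(jobJar.toFile())) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                Path target = workDir.resolve(entry.getName()).normalize();
                // Refuse entries such as "../x" that would land outside the sandbox.
                if (!target.startsWith(workDir)) {
                    throw new IOException("Entry escapes sandbox: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                } else {
                    Files.createDirectories(target.getParent());
                    try (InputStream in = jar.getInputStream(entry)) {
                        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                    }
                }
            }
        }
        return workDir;
    }

    /** Starts the command with workDir as its current working directory. */
    static Process launch(Path workDir, List<String> command) throws IOException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.directory(workDir.toFile()); // the child sees workDir as "."
        builder.inheritIO();
        return builder.start();
    }
}
```

Each task pays for its own unpack, but anything the task writes or modifies 
stays under its own <taskid> directory, so one task cannot disturb another.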

That is the sandbox. Then there is the shared sandbox, which also allows two 
different instances to memory map a file and pay a single cost. This is best 
handled by either softlinks or hardlinks under unix/linux. 
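
As a rough sketch of the linking approach: the helper below links every regular 
file from a shared, already-unpacked directory into a task's working directory. 
It tries a hard link first and falls back to a symbolic link, for example when 
the two directories are on different filesystems. SharedSandbox and linkInto 
are hypothetical names, and the directory layout is an assumption. The calls 
themselves (Files.createLink, Files.createSymbolicLink) are standard 
java.nio.file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper, not part of Hadoop.
final class SharedSandbox {

    /** Links every regular file directly under sharedDir into taskDir. */
    static void linkInto(Path sharedDir, Path taskDir) throws IOException {
        Files.createDirectories(taskDir);
        try (Stream<Path> files = Files.list(sharedDir)) {
            for (Path source : (Iterable<Path>) files::iterator) {
                if (!Files.isRegularFile(source)) {
                    continue; // this sketch does not recurse into subdirectories
                }
                Path link = taskDir.resolve(source.getFileName());
                try {
                    // Hard link: a second name for the same on-disk file.
                    Files.createLink(link, source);
                } catch (UnsupportedOperationException | IOException e) {
                    // Fallback: symbolic link pointing at the shared copy.
                    Files.createSymbolicLink(link, source.toAbsolutePath());
                }
            }
        }
    }
}
```

Because both names refer to the same underlying file, two task instances that 
memory map it share one copy. The flip side is that a write through either 
name is visible to every task, so this only suits files treated as read-only.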

Why do I think these are important models? Most of the programs that I write 
and that I use run out of the current directory and expect that all of their 
configuration files/resources can be read from there. For programs that have 
more sophisticated models of deployment, the models above are still ok. For the 
simpler programs, the proposed external repository doesn't work.

OTOH I can always run a script that will hard link the files from the directory 
where the jar is unpacked to my current working directory. I just don't believe 
that this should be forced on the users. Having the users do system'ish things 
is potentially dangerous.

Even more restrictive than the above sandbox would be one in which the 
application is run chroot'd. That way there is no way it could muck with 
anything system-like on the nodes. This is an important consideration when one 
lets arbitrary programs be run. 

> the task execution environment should have a current working directory that 
> is task specific
> --------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-673
>                 URL: http://issues.apache.org/jira/browse/HADOOP-673
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.7.2
>            Reporter: Owen O'Malley
>         Assigned To: Mahadev konar
>             Fix For: 0.9.0
>
>
> The tasks should be run in a work directory that is specific to a single 
> task. In particular, I'd suggest using the <local>/jobcache/<jobid>/<taskid> 
> as the current working directory.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
