Re: [jira] Commented: (HADOOP-673) the task execution environment should have a current working directory that is task specific

Arkady Borkovsky Fri, 17 Nov 2006 17:03:10 -0800

+1  on Richard's comments

+1 on Dick's (MR should symlink the files that belong to the task into the task running directory)

-10 on ../work


On Nov 17, 2006, at 4:25 PM, Richard Kasperski (JIRA) wrote:

[http://issues.apache.org/jira/browse/HADOOP-673?page=comments#action_12450936 ]
Richard Kasperski commented on HADOOP-673:
------------------------------------------
I think that it is very important that there be a way for an application to run in a sandbox <local>/jobcache/<jobid>/<taskid> that contains the contents of the jar file and is the current working directory. How one accomplishes this is an implementation detail. If the only way to do it is to unjar the archive more than once then I guess that would have to be the solution. This saves the potential for a lot of grief. No shared files are ever modified because they aren't actually shared. This causes more unpacking of jars but I don't really see that as a problem. How the jars are copied to a node and the subsequent reuse of the jar is important.
That is the sandbox. Then there is the shared sandbox which is also two different instance to memory map a file and pay a single cost. This is best handled by either softlinks or hardlinks under unix/linux.
Why do I think these are important models? Most of the programs that I write and that I use run out of the current directory and expect that all of their configuration files/resources can be read from there. For programs that have more sophisticated models of deployment the models above are still ok. For the simpler programs the proposed external repository doesn't work.
OTOH I can always run a script that will hard link the files from the directory where the jar is unpacked to my current working directory. I just don't believe that this should be forced on the users. Having the users do system'ish things is potentially dangerous.
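A minimal sketch of the kind of linking script mentioned above, written in Python purely for illustration. The function name link_into_task_dir, the use_symlinks flag and the directory names in the usage note are assumptions, not anything Hadoop provides; the sketch only shows the general technique of populating a task-specific directory with links to files unpacked elsewhere.

```python
import os
from pathlib import Path


def link_into_task_dir(shared_dir, task_dir, use_symlinks=False):
    """Link every regular file under shared_dir into task_dir.

    Hypothetical helper, not part of Hadoop. The relative directory
    layout is preserved, and files that already exist in task_dir are
    left alone. Returns the list of link paths that were created.
    """
    shared = Path(shared_dir).resolve()
    task = Path(task_dir)
    task.mkdir(parents=True, exist_ok=True)

    linked = []
    for src in sorted(shared.rglob("*")):
        if not src.is_file():
            continue  # directories are recreated below, never linked
        dest = task / src.relative_to(shared)
        dest.parent.mkdir(parents=True, exist_ok=True)
        if dest.exists() or dest.is_symlink():
            continue  # do not clobber anything already in the task dir
        if use_symlinks:
            os.symlink(src, dest)  # soft link: works across filesystems
        else:
            os.link(src, dest)  # hard link: same filesystem only
        linked.append(dest)
    return linked
```

A task wrapper could call link_into_task_dir(unpacked_jar_dir, task_dir) and then os.chdir(task_dir) before starting the user program. Note that a hard link shares its data with the original file, so a task that rewrites a linked file in place changes it for every other task that links to it.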
Even more restrictive than the above sandbox would be one in which the application is run chroot'd. That way there is no way it could muck with anything system like on the nodes. This is an important consideration when one lets arbitrary programs be run.
the task execution environment should have a current working directory that is task specific
--------------------------------------------------------------------------------------------
                Key: HADOOP-673
                URL: http://issues.apache.org/jira/browse/HADOOP-673
            Project: Hadoop
         Issue Type: Bug
         Components: mapred
   Affects Versions: 0.7.2
           Reporter: Owen O'Malley
        Assigned To: Mahadev konar
            Fix For: 0.9.0
The tasks should be run in a work directory that is specific to a single task. In particular, I'd suggest using the <local>/jobcache/<jobid>/<taskid> as the current working directory.
