[jira] Commented: (HADOOP-3675) Provide more flexibility in the way tasks are run

Devaraj Das (JIRA) Mon, 04 Aug 2008 13:31:43 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12619687#action_12619687
 ]


Devaraj Das commented on HADOOP-3675:
-------------------------------------

Doug, I hadn't considered the thread-per-task approach. I was considering 
sequential executions of tasks (perhaps we would sometimes have more than one 
JVM for the same job in memory subject to the available free slots). The slots 
used in the thread-per-task case would be the number of concurrently running 
threads (read tasks) across all the JVMs, right? 
It does complicate the integration with HADOOP-3581, but it should be 
possible to count how many task slots a JVM is currently using (the number 
of concurrently running tasks) and factor that into the resource 
utilization accounting that HADOOP-3581 deals with. 
The other complication is to figure out whether the framework code is clean 
enough (that is, thread-safe) that multiple instances of the Map/Reduce task 
can be active within one process at any given point in time. Ditto with the 
application code: can we assume that apps have been written to be thread-safe?
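
To make the slot-counting idea concrete, here is a minimal sketch. It assumes 
each JVM keeps a counter of the tasks it is currently running, and that the 
slots in use are the sum of those counters. JvmSlotCounter and SlotAccounting 
are hypothetical names for illustration only; they are not existing Hadoop 
classes and nothing here is taken from the HADOOP-3581 patch.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical per-JVM counter of concurrently running tasks (not a Hadoop class). */
class JvmSlotCounter {
    private final AtomicInteger running = new AtomicInteger();

    /** Runs the task on the calling thread and counts it as one occupied slot while it runs. */
    void runTask(Runnable task) {
        running.incrementAndGet();
        try {
            task.run();
        } finally {
            // Release the slot even if the task throws.
            running.decrementAndGet();
        }
    }

    /** Number of tasks currently running in this JVM. */
    int slotsInUse() {
        return running.get();
    }
}

/** Hypothetical helper that sums the slots used across all task JVMs. */
class SlotAccounting {
    static int totalSlotsInUse(List<JvmSlotCounter> jvms) {
        int total = 0;
        for (JvmSlotCounter jvm : jvms) {
            total += jvm.slotsInUse();
        }
        return total;
    }
}
```

In the thread-per-task case each task thread would call runTask, so 
slotsInUse reflects the number of concurrently running threads in that JVM.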

> Provide more flexibility in the way tasks are run
> -------------------------------------------------
>
>                 Key: HADOOP-3675
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3675
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Brice Arnould
>            Assignee: Brice Arnould
>            Priority: Minor
>         Attachments: TaskWrapper_v0.patch, userBasedInsulator.sh
>
>
> *The aim*
> With [HADOOP-3421] speaking about sharing a cluster among more than one 
> organization (so potentially with non-cooperative users), and posts on the ML 
> speaking about virtualization and the ability to re-use the TaskTracker's VM 
> to run new tasks, it could be useful for admins to choose the way TaskRunners 
> run their children. 
> More specifically, it could be useful to provide a way to imprison a Task in 
> its working directory, or in a virtual machine.
> In some cases, reusing the VM might be useful, since it seems that this 
> feature is really wanted ([HADOOP-249]).
> *Concretely*
> What I propose is a new class, called SeperateVMTaskWrapper, which 
> contains the current logic for running tasks in another JVM. This class 
> extends another, called TaskWrapper, which could be inherited to provide new 
> ways of running tasks.
> As part of this issue I would also like to provide two other TaskWrappers: 
> the first would run the tasks as threads in the TaskRunner's VM (if that is 
> possible without too many changes), the second would use a fixed pool of 
> local Unix accounts to insulate tasks from each other (so potentially 
> non-cooperating users will be able to share a cluster, as described in 
> [HADOOP-3421]).
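
The class hierarchy proposed in the description above might look roughly like 
the following sketch. Only the name TaskWrapper comes from the issue; the 
run(Runnable) signature, the exit-status convention, and ThreadTaskWrapper are 
assumptions made for illustration and are not taken from TaskWrapper_v0.patch.

```java
/** Base class named in the issue; the method signature here is an assumption. */
abstract class TaskWrapper {
    /** Runs one task and returns 0 on success, non-zero on failure. */
    abstract int run(Runnable task);
}

/** Hypothetical thread-based variant: runs the task as a thread in the current VM. */
class ThreadTaskWrapper extends TaskWrapper {
    @Override
    int run(Runnable task) {
        final Throwable[] failure = new Throwable[1];
        Thread t = new Thread(task, "task-thread");
        // Record any exception that escapes the task instead of printing it.
        t.setUncaughtExceptionHandler((thread, error) -> failure[0] = error);
        t.start();
        try {
            // join() makes the handler's write to failure[0] visible here.
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return 1;
        }
        return failure[0] == null ? 0 : 1;
    }
}
```

A separate-VM variant would extend the same base class and keep the current 
child-JVM launch logic behind run, so the TaskRunner would not need to know 
which isolation strategy is in use.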

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
