[ 
https://issues.apache.org/jira/browse/HADOOP-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brice Arnould updated HADOOP-3675:
----------------------------------

    Attachment: userBasedInsulator.sh

*@Cutting*
Glad you like it ^^

I don't think that a chroot alone is a good idea, because tasks inside a chroot 
can still kill or ptrace tasks outside of it, so it would not protect users 
from each other. FreeBSD jails or Linux vservers would be perfect, but they are 
non-standard and vservers require patching the kernel.
My intention is to provide a TaskWrapper that delegates security to a user 
script, and to write a script (which could go in contrib/ ?) that uses a pool 
of local Unix accounts to run user tasks.
So, if we have two users (Alice and Bob) whose tasks are going to run on the 
same TaskTracker, Alice's tasks will run as the Unix user {{hadoop0}} and 
Bob's tasks as {{hadoop1}}.
When Alice's tasks are done, her processes are killed atomically (via 
{{kill -PGROUP}}) and her files removed, to ensure nothing is left behind. Then 
{{hadoop0}} is made available for use by another Hadoop user.
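The atomic-cleanup step can be sketched in shell. This is only an illustration of the process-group idea (the task command and variable names are made up; the real logic belongs in the attached {{userBasedInsulator.sh}}):

```shell
#!/bin/sh
# Sketch of the group-kill cleanup idea (hypothetical names; the real
# logic lives in the attached userBasedInsulator.sh).

# setsid starts the task as the leader of a fresh session and process
# group; every child the task forks inherits that group.
setsid sh -c 'sleep 100 & sleep 100' &   # stand-in for a user task
TASK_PGID=$!   # the new leader's PID doubles as its process-group ID
sleep 1        # give the task time to start

# When the user's tasks are done: a negative PID makes kill(1) signal
# every member of the process group, so no stray child survives.
kill -TERM -- -"$TASK_PGID"
```

In the real script the group kill would run as root (via sudo), so tasks cannot escape it by changing their own credentials.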
The benefit of using a separate shell script is that only this script (not the 
whole TaskTracker) needs root privileges, and it can obtain them via sudo, so 
we don't require yet another SUID binary.
An administrator wanting to use this would:
# Deploy hadoop as usual
# Create Unix accounts {{hadoopUser0}}...{{hadoopUserN}} for use by this wrapper
# Add in {{/etc/sudoers}} a permission for the hadoop user to run the wrapper 
script as root
# Set the right wrapper in Hadoop config
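For step 3, the {{/etc/sudoers}} entry could look roughly like this (the install path and the "hadoop" account name are illustrative, not something the patch defines):

```
# /etc/sudoers fragment (illustrative paths and account name):
# let the account running the TaskTracker run only the wrapper
# script as root, with no password prompt.
hadoop  ALL = (root) NOPASSWD: /usr/local/hadoop/bin/userBasedInsulator.sh
```

Restricting the rule to the one script keeps the root-privileged surface as small as possible.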

The attached script demonstrates the process. If there is a shell guru 
available, I would really appreciate their advice ^^.
We could also write a script that runs tasks inside a VM, but I'm unsure it 
would be useful, considering the overhead.

bq. If the TaskWrapper implementation is passed the Configuration, can't this 
property continue to be used by SeperateVMTaskWrapper?
You're right, that would be the best way to ensure compatibility. For now I 
will continue to use parameters set by setMaximumMemory(), addArg() and so on, 
in order to test the API. But the "release" version of SeperateVMTaskWrapper 
will use mapred.child.java.opts directly.
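For reference, mapred.child.java.opts is the property users already set in their job or site configuration to pass JVM options to child tasks, so a wrapper that reads it directly keeps existing jobs working unchanged. A typical entry (the -Xmx value here is just an example):

```
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx200m</value>
</property>
```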

*@Loughran*
bq. An in-VM task runner should always run the task in a new security manager 
Good idea! For some TaskWrappers we might be unable to provide true security 
(I'm thinking mainly of the ThreadWrapper that I'm writing, which is mainly 
intended for use with the Streaming API), but the programmer should at least 
be protected against the most obvious errors.
On the mailing list, Alejandro Abdelnur said that he already runs his tasks 
under a security manager; I'm going to ask him if he can publish more 
information that I could integrate into a TaskWrapper.
You're also right that failing gracefully is very important, since some 
TaskWrappers might not be able to run all tasks. My next proposal will try to 
take that into account.


Thanks for your comments!

> Provide more flexibility in the way tasks are run
> -------------------------------------------------
>
>                 Key: HADOOP-3675
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3675
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Brice Arnould
>            Assignee: Brice Arnould
>            Priority: Minor
>         Attachments: TaskWrapper_v0.patch, userBasedInsulator.sh
>
>
> *The aim*
> With [HADOOP-3421] discussing the sharing of a cluster among more than one 
> organization (so potentially with non-cooperative users), and posts on the 
> mailing list discussing virtualization and the ability to reuse the 
> TaskTracker's VM to run new tasks, it could be useful for admins to choose 
> the way TaskRunners run their children. 
> More specifically, it could be useful to provide a way to imprison a Task in 
> its working directory, or in a virtual machine.
> In some cases, reusing the VM might be useful, since it seems that this 
> feature is really wanted ([HADOOP-249]).
> *Concretely*
> What I propose is a new class, called SeperateVMTaskWrapper, which contains 
> the current logic for running tasks in another JVM. This class extends 
> another, called TaskWrapper, which could be inherited to provide new ways of 
> running tasks.
> As part of this issue I would also like to provide two other TaskWrappers: 
> the first would run tasks as threads in the TaskRunner's VM (if that is 
> possible without too many changes); the second would use a fixed pool of 
> local Unix accounts to insulate tasks from each other (so potentially 
> non-cooperating users would be able to share a cluster, as described in 
> [HADOOP-3421]).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
