[
https://issues.apache.org/jira/browse/HADOOP-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Brice Arnould updated HADOOP-3675:
----------------------------------
Attachment: userBasedInsulator.sh
[EMAIL PROTECTED] Cutting*
Glad you like it ^^
I don't think that chroot alone is a good idea, because tasks inside a chroot
can still kill or ptrace tasks outside of the chroot. So it would not protect
users from each other. FreeBSD jails or Linux vservers would be perfect, but
they are non-standard and vservers require patching the kernel.
My intention is to provide a TaskWrapper that delegates security to a user
script, and to write a script (which could go in contrib/?) that would use a
pool of local Unix accounts to run user tasks.
So, if we have two users (Alice and Bob) whose tasks are going to be run on the
same tasktracker, Alice's tasks will be run as the Unix user {{hadoop0}} and
Bob's tasks as {{hadoop1}}.
When Alice's tasks are done, her files are removed and her processes are
killed atomically (via {{kill -PGROUP}}) to ensure nothing is left behind.
Then {{hadoop0}} is made available for use by another Hadoop user.
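As a rough illustration of the pooling described above (this is not the attached script), here is a minimal Java sketch. The class name {{UnixAccountPool}} and its methods are hypothetical, and it assumes that all tasks of one Hadoop user share a single account until the last one finishes.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: leases accounts from a fixed pool (hadoop0, hadoop1, ...) to Hadoop users. */
class UnixAccountPool {
    private final Deque<String> free = new ArrayDeque<>();
    private final Map<String, String> accountOf = new HashMap<>();
    private final Map<String, Integer> runningTasks = new HashMap<>();

    UnixAccountPool(String prefix, int size) {
        for (int i = 0; i < size; i++) {
            free.addLast(prefix + i);
        }
    }

    /** Returns the Unix account for this user's new task, leasing one if needed. */
    synchronized String acquire(String hadoopUser) {
        String account = accountOf.get(hadoopUser);
        if (account == null) {
            if (free.isEmpty()) {
                throw new IllegalStateException("no free Unix account");
            }
            account = free.removeFirst();
            accountOf.put(hadoopUser, account);
        }
        runningTasks.merge(hadoopUser, 1, Integer::sum);
        return account;
    }

    /** Marks one task as done; returns true if the account went back to the pool. */
    synchronized boolean release(String hadoopUser) {
        Integer count = runningTasks.get(hadoopUser);
        if (count == null) {
            throw new IllegalArgumentException("no running task for " + hadoopUser);
        }
        if (count > 1) {
            runningTasks.put(hadoopUser, count - 1);
            return false;
        }
        runningTasks.remove(hadoopUser);
        // A real wrapper would kill leftover processes and remove files here,
        // before the account becomes available to another user.
        free.addLast(accountOf.remove(hadoopUser));
        return true;
    }
}
```

With a pool built as {{new UnixAccountPool("hadoop", 2)}}, Alice's first task would get {{hadoop0}} and Bob's would get {{hadoop1}}; a third user would have to wait until one of them is released.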
The benefit of using a separate shell script is that only this script (not the
whole TaskTracker) needs root privileges. And it can get them via sudo (so we
don't require yet another SUID binary).
An administrator wanting to use this would:
# Deploy hadoop as usual
# Create Unix accounts {{hadoopUser0}}...{{hadoopUserN}} for use by this wrapper
# Add a permission in {{/etc/sudoers}} for the hadoop user to run the wrapper
script as root
# Set the right wrapper in Hadoop config
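To make the sudo step more concrete, here is a minimal sketch of how the TaskTracker side could assemble the command line. The class {{SudoWrapperCommand}} is hypothetical, and the argument order (script, then Unix account, then the task command) is an assumption for illustration, not the actual interface of the attached script.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: builds the command that runs a task through the root wrapper script. */
class SudoWrapperCommand {
    static List<String> build(String wrapperScript, String unixAccount, List<String> taskCommand) {
        List<String> cmd = new ArrayList<>();
        cmd.add("sudo");
        cmd.add("-n");            // non-interactive: fail instead of prompting for a password
        cmd.add(wrapperScript);   // the only program that needs root privileges
        cmd.add(unixAccount);     // assumed convention: account the task should run as
        cmd.addAll(taskCommand);  // the task's own command line, passed through unchanged
        return cmd;
    }
}
```

The resulting list could be handed to {{new ProcessBuilder(cmd)}}, so that the TaskTracker itself keeps running as the unprivileged hadoop user.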
The attached script demonstrates the process. If a shell guru is available,
I would really appreciate their advice ^^.
We could also write a script that runs tasks inside a VM, but I'm unsure that
it would be useful, considering the overhead.
bq. If the TaskWrapper implementation is passed the Configuration, can't this
property continue to be used by SeperateVMTaskWrapper?
You're right, that would be the best way to ensure compatibility. For now I will
continue to use parameters set by setMaximumMemory(), addArg() and so on, in
order to test the API. But the "release" version of SeperateVMTaskWrapper will
directly use mapred.child.java.opts.
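A minimal sketch of what reading that property could look like follows. To stay self-contained it uses {{java.util.Properties}} as a stand-in for Hadoop's Configuration; the class name {{ChildJvmOptions}}, the fallback value and the main class name passed in are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

/** Hypothetical sketch: turns mapred.child.java.opts into arguments for the child JVM. */
class ChildJvmOptions {
    static final String KEY = "mapred.child.java.opts";

    static List<String> childCommand(Properties conf, String mainClass) {
        // Fallback value is an assumption; a real wrapper would use Hadoop's own default.
        String opts = conf.getProperty(KEY, "-Xmx200m").trim();
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        if (!opts.isEmpty()) {
            // Options are separated by whitespace, e.g. "-Xmx512m -verbose:gc".
            cmd.addAll(Arrays.asList(opts.split("\\s+")));
        }
        cmd.add(mainClass);
        return cmd;
    }
}
```

Note that splitting on whitespace does not handle quoted options that contain spaces; a real implementation would need to decide how to treat them.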
[EMAIL PROTECTED] Loughran*
bq. An in-VM task runner should always run the task in a new security manager
Good idea! For some TaskWrappers we might be unable to provide true security
(I am thinking mainly of the ThreadWrapper that I'm writing, which is mainly
intended to be used with the Streaming API), but the programmer should at least
be protected against the most obvious errors.
On the ML, Alejandro Abdelnur said that he already runs his tasks under a
security manager. I'm going to ask him if he can publish more information that
I could integrate into a TaskWrapper.
You're also right that failing gracefully is very important, since some
TaskWrappers might not be able to run all tasks. My next proposal will try to
take that into account.
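One possible shape for such graceful failure, sketched under assumptions: each wrapper reports whether it can run a given kind of task, and the first capable wrapper in a preference list is chosen. All names here ({{SketchWrapper}}, {{FixedWrapper}}, {{WrapperSelector}}, the task kinds) are hypothetical and are not the API of the attached patch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Hypothetical interface: a wrapper that can say up front whether it supports a task. */
interface SketchWrapper {
    String name();
    boolean canRun(String taskKind);
}

/** Hypothetical wrapper that supports a fixed set of task kinds. */
class FixedWrapper implements SketchWrapper {
    private final String name;
    private final Set<String> kinds;

    FixedWrapper(String name, String... kinds) {
        this.name = name;
        this.kinds = new HashSet<>(Arrays.asList(kinds));
    }

    public String name() { return name; }
    public boolean canRun(String taskKind) { return kinds.contains(taskKind); }
}

class WrapperSelector {
    /** Returns the first wrapper, in order of preference, that can run the task. */
    static Optional<SketchWrapper> select(List<SketchWrapper> preferred, String taskKind) {
        for (SketchWrapper w : preferred) {
            if (w.canRun(taskKind)) {
                return Optional.of(w);
            }
        }
        // No wrapper fits: the caller can report a clear error instead of crashing mid-task.
        return Optional.empty();
    }
}
```

For example, with a thread-based wrapper that only accepts streaming tasks listed before a separate-VM wrapper, a streaming task would run in a thread while any other task would fall back to a separate VM.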
Thanks for your comments!
> Provide more flexibility in the way tasks are run
> -------------------------------------------------
>
> Key: HADOOP-3675
> URL: https://issues.apache.org/jira/browse/HADOOP-3675
> Project: Hadoop Core
> Issue Type: New Feature
> Components: mapred
> Reporter: Brice Arnould
> Assignee: Brice Arnould
> Priority: Minor
> Attachments: TaskWrapper_v0.patch, userBasedInsulator.sh
>
>
> *The aim*
> With [HADOOP-3421] speaking about sharing a cluster among more than one
> organization (so potentially with non-cooperative users), and posts on the ML
> speaking about virtualization and the ability to re-use the TaskTracker's VM
> to run new tasks, it could be useful for admins to choose the way TaskRunners
> run their children.
> More specifically, it could be useful to provide a way to imprison a Task in
> its working directory, or in a virtual machine.
> In some cases, reusing the VM might be useful, since it seems that this
> feature is really wanted ([HADOOP-249]).
> *Concretely*
> What I propose is a new class, called SeperateVMTaskWrapper, which
> contains the current logic for running tasks in another JVM. This class
> extends another, called TaskWrapper, which could be inherited to provide new
> ways of running tasks.
> As part of this issue I would also like to provide two other TaskWrappers:
> the first would run the tasks as threads of the TaskRunner's VM (if it is
> possible without too many changes), the second would use a fixed pool of
> local Unix accounts to insulate tasks from each other (so potentially
> non-cooperating users will be able to share a cluster, as described in
> [HADOOP-3421]).
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.