wrapper for streaming tasks
---------------------------

                 Key: HADOOP-2765
                 URL: https://issues.apache.org/jira/browse/HADOOP-2765
             Project: Hadoop Core
          Issue Type: New Feature
          Components: contrib/streaming
    Affects Versions: 0.15.3
            Reporter: Joydeep Sen Sarma


Here's the motivation:

We want to put a memory limit on user scripts to prevent runaway scripts from 
bringing down nodes. This limit is set much lower than the maximum memory a 
task may legitimately use, since runaway memory growth is most likely a 
scripting bug. At the same time, careful users should be able to override the 
limit and use more memory.

There's no good way to do this today. We can set a ulimit in the Hadoop shell 
scripts, but that is very restrictive: there doesn't seem to be a way to call 
setrlimit from Java, and setting a ulimit means that supplying a higher -Xmx 
from the jobconf is useless, since the Java process is still bound by the 
ulimit that was in effect when the tasktracker was launched.

What we have ended up doing (and I think this might help others as well) is to 
add a stream.wrapper option. Its value is a program through which the streaming 
mapper and reducer scripts are exec'd. In our case, the wrapper is a small C 
program that calls setrlimit and then execs the streaming job. The default 
wrapper puts a reasonable limit on memory usage, but users can easily override 
it (e.g. by invoking it with a different memory-limit argument). In the future, 
the wrapper could also be used for other system-wide resource limits, or for 
any environment settings.

This way, JVMs can stick to mapred.child.opts as the way to control memory 
usage. This setup has saved our ass on many occasions while still allowing 
sophisticated users to run with high memory limits.

Can submit patch if this sounds interesting.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
