Hello Hadoopers- I'm attempting to run some large-memory map tasks using Hadoop streaming, but I seem to be running afoul of the mapred.child.ulimit restriction, which is set to 2097152. I assume this is in KB, since my tasks fail when they reach about 2GB (I just need to get to about 2.3GB- almost there!).

So far, nothing I've tried has succeeded in changing this value. I've attempted to add -jobconf mapred.child.ulimit=3000000 to the streaming command line, but to no avail: the job's xml file that I find in my logs still shows the old value. Worse, my task logs contain the message "attempt to override final parameter: mapred.child.ulimit; Ignoring." which doesn't exactly inspire confidence that I'm on the right path.
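For reference, my streaming invocation looks roughly like this (the jar path, input/output paths, and mapper script here are placeholders for my actual ones):

  hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-0.18.3-streaming.jar \
    -input /user/chris/input \
    -output /user/chris/output \
    -mapper my_mapper.py \
    -file my_mapper.py \
    -reducer NONE \
    -jobconf mapred.child.ulimit=3000000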
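From the "final parameter" message, my guess is that the cluster's hadoop-site.xml pins the property with a final tag, something like:

  <property>
    <name>mapred.child.ulimit</name>
    <value>2097152</value>
    <final>true</final>
  </property>

If that's right, presumably I'd need to raise the value on every node and restart the TaskTrackers, or at least drop the <final> tag so -jobconf can override it per job:

  <property>
    <name>mapred.child.ulimit</name>
    <value>3000000</value>
  </property>

But I'd rather not hand-edit the generated config on every instance if there's a cleaner way.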
I see there's been a fair amount of traffic on Jira about large-memory jobs, but there doesn't seem to be much in the way of examples or documentation. Can someone tell me how to run such a job, especially a streaming job?

Many thanks in advance--
Chris

P.S. I'm running a 0.18.3 cluster on Amazon EC2 (I've been using the Cloudera convenience scripts, but I can abandon those if I need more control). The instances have plenty of memory (7.5GB each).
