Hadoop does provide a ulimit-based way to control the memory consumption of the tasks it spawns, via the config mapred.child.ulimit. See http://hadoop.apache.org/core/docs/r0.17.0/mapred_tutorial.html#Task+Execution+%26+Environment

What is lacking, however, is a way to track the cumulative memory consumption of all the processes spawned by a map/reduce task. For example, a streaming task could spawn hundreds of processes, and collectively they can wreak havoc.
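For reference, here is roughly how that limit would be set on a streaming job (a sketch only; the streaming jar path, input/output paths, and reducer script are hypothetical, and the -jobconf option is the 0.17-era way to pass job configs):

  # mapred.child.ulimit is in kilobytes and is applied (via ulimit)
  # to each child process the TaskTracker launches.
  # Paths and the reducer script below are placeholders.
  hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-0.17.0-streaming.jar \
      -jobconf mapred.child.ulimit=1048576 \
      -input /user/me/input \
      -output /user/me/output \
      -mapper /bin/cat \
      -reducer ./my_reducer.sh \
      -file ./my_reducer.sh

Note that while the limit is inherited by any subprocesses the mapper/reducer forks, it applies to each process individually, not to their sum, which is exactly the gap described above.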
> -----Original Message-----
> From: Taeho Kang [mailto:[EMAIL PROTECTED]]
> Sent: Monday, June 16, 2008 7:23 AM
> To: [EMAIL PROTECTED]
> Subject: Question on HadoopStreaming and Memory Usage
>
> Dear All,
>
> I've got a question about Hadoop Streaming and its memory management.
> Does Hadoop Streaming have a mechanism to prevent over-usage of
> memory by its subprocesses (the map or reduce functions)?
>
> Say a binary used in the reduce phase allocates itself lots and
> lots of memory, to the point that it starves other important
> processes like the DataNode or TaskTracker. Does Hadoop
> Streaming prevent such cases?
>
> Thank you in advance,
>
> Taeho
