Harsh - I'd be inclined to think it's worse than just setting mapred.jobtracker.completeuserjobs.maximum - the only case that setting alone would solve is a single user submitting 25 *large* jobs (in terms of tasks) within a single 24-hr window.
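For reference, the change Harsh suggests in the thread below (lowering the completed-jobs retention per user from its default of 100) is a single mapred-site.xml property. A sketch for a Hadoop 1.x / 0.20-style setup - the value of 5 follows Harsh's recommendation, but tune it to how much job history you actually need:

```xml
<!-- mapred-site.xml (sketch): cap how many *completed* jobs' info the
     JobTracker retains in memory per user. The default of 100 can bloat
     JT heap on clusters running many jobs per day from several users.
     Restart the JobTracker after changing this. -->
<property>
  <name>mapred.jobtracker.completeuserjobs.maximum</name>
  <value>5</value>
</property>
```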
David - I'm guessing you aren't using the CapacityScheduler - that would help you with more controls, limits on jobs etc. More details here: http://hadoop.apache.org/common/docs/r1.0.3/capacity_scheduler.html

In particular, look at the example config there and let us know if you need help understanding any of it.

Arun

On Jun 9, 2012, at 10:40 PM, Harsh J wrote:

> Hey David,
>
> Primarily you'd need to lower
> "mapred.jobtracker.completeuserjobs.maximum" in your mapred-site.xml
> to a value of < 25. I recommend using 5 if you don't need much
> retention of job info per user. This will help keep the JT's live
> memory usage in check and stop your crashes, instead of you having to
> raise your heap all the time. There's no "leak", but this config's
> default of 100 causes many issues for a JT that runs a lot of jobs per
> day (from several users).
>
> Try it out and let us know!
>
> On Sat, Jun 9, 2012 at 12:37 AM, David Rosenstrauch <dar...@darose.net> wrote:
>> We're running 0.20.2 (Cloudera cdh3u4).
>>
>> What configs are you referring to?
>>
>> Thanks,
>>
>> DR
>>
>> On 06/08/2012 02:59 PM, Arun C Murthy wrote:
>>>
>>> This shouldn't be happening at all...
>>>
>>> What version of Hadoop are you running? Potentially you need configs to
>>> protect the JT that you are missing; those should ensure your hadoop-1.x
>>> JT is very reliable.
>>>
>>> Arun
>>>
>>> On Jun 8, 2012, at 8:26 AM, David Rosenstrauch wrote:
>>>
>>>> Our job tracker has been seizing up with Out of Memory (heap space)
>>>> errors for the past 2 nights. After the first night's crash, I doubled
>>>> the heap space (from the default of 1GB) to 2GB before restarting the
>>>> job. After last night's crash I doubled it again to 4GB.
>>>>
>>>> This all seems a bit puzzling to me. I wouldn't have thought that the
>>>> job tracker should require so much memory. (The NameNode, yes, but not
>>>> the job tracker.)
>>>>
>>>> Just wondering if this behavior sounds reasonable, or if perhaps there
>>>> might be a bigger problem at play here. Anyone have any thoughts on the
>>>> matter?
>>>>
>>>> Thanks,
>>>>
>>>> DR
>>>
>>> --
>>> Arun C. Murthy
>>> Hortonworks Inc.
>>> http://hortonworks.com/
>
> --
> Harsh J

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
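For anyone following along, the CapacityScheduler setup Arun points to boils down to enabling the scheduler in mapred-site.xml and then defining per-queue limits in capacity-scheduler.xml. A rough sketch, assuming a Hadoop 1.x cluster with a single queue named "default" - the values here are illustrative, not recommendations; see the linked docs for the full example config:

```xml
<!-- mapred-site.xml (sketch): switch the JobTracker to the CapacityScheduler -->
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
</property>
```

```xml
<!-- capacity-scheduler.xml (sketch): illustrative limits for the "default" queue -->
<property>
  <name>mapred.capacity-scheduler.queue.default.capacity</name>
  <value>100</value>
</property>
<!-- Guarantee each active user a minimum share of the queue's slots -->
<property>
  <name>mapred.capacity-scheduler.queue.default.minimum-user-limit-percent</name>
  <value>25</value>
</property>
<!-- Cap concurrently initialized jobs cluster-wide; this also bounds
     how much state the JobTracker has to hold in memory at once -->
<property>
  <name>mapred.capacity-scheduler.maximum-system-jobs</name>
  <value>3000</value>
</property>
```

The per-queue limits are what address the original problem: rather than one user's burst of large jobs monopolizing the JobTracker, job initialization is throttled per queue and per user.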