I'm having similar performance issues. I've been running my Hadoop processes at a nice level of 10 for a while and haven't noticed any improvement.
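For reference, a minimal sketch of one way to renice only the TaskTracker while leaving the DataNode and HBase daemons at 0 -- assuming the TaskTracker runs in its own JVM and jps is on the PATH; task JVMs forked by the TaskTracker inherit its nice level:

    # look up the TaskTracker's PID with jps (ships with the JDK)
    TT_PID=$(jps | awk '/TaskTracker/ {print $1}')
    # lower its scheduling priority; child task JVMs inherit the value when forked
    renice -n 10 -p "$TT_PID"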
In my case, I believe what's happening is that the peak combined RAM usage of all the Hadoop task processes and the service processes exceeds the amount of RAM on my machines. This in turn causes part of the server processes to get paged out to disk while the nightly Hadoop batch jobs are running. Since the swap space is typically on the same physical disks as the DFS and MapReduce working directories, I become heavily I/O bound and real-time queries slow to a crawl.

I think the key is to make absolutely sure that all of your processes fit in the available RAM at all times. I'm having a hard time achieving this myself, since the JVM's virtual memory usage is usually much higher than the maximum heap size (see my other thread). A rough per-node memory budget is sketched below the quoted thread.

-- Stefan

> From: zsongbo <zson...@gmail.com>
> Reply-To: <core-user@hadoop.apache.org>
> Date: Tue, 12 May 2009 10:58:49 +0800
> To: <core-user@hadoop.apache.org>
> Subject: Re: How to do load control of MapReduce
>
> Thanks Billy, I am trying 'nice', and will report the result later.
>
> On Tue, May 12, 2009 at 3:42 AM, Billy Pearson
> <sa...@pearsonwholesale.com> wrote:
>
>> Might try setting the tasktracker's Linux nice level to say 5 or 10,
>> leaving the dfs and hbase settings at 0.
>>
>> Billy
>>
>> "zsongbo" <zson...@gmail.com> wrote in message
>> news:fa03480d0905110549j7f09be13qd434ca41c9f84...@mail.gmail.com...
>>
>>> Hi all,
>>> Now, if we have a large dataset to process with MapReduce, the MapReduce
>>> job will take as many machine resources as it can get.
>>>
>>> So when such a big MapReduce job is running, the cluster becomes very
>>> busy and can hardly do anything else.
>>>
>>> For example, we have an HDFS+MapReduce+HBase cluster.
>>> There is a large dataset in HDFS to be processed by MapReduce
>>> periodically; the workload is CPU and I/O heavy. The cluster also
>>> provides other services for queries (querying HBase and reading files
>>> in HDFS). So, when the job is running, the query latency becomes very
>>> long.
>>>
>>> Since the MapReduce job is not time sensitive, I want to control the
>>> load of MapReduce. Do you have any advice?
>>>
>>> Thanks in advance.
>>> Schubert
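P.S. The memory budget mentioned above, just as a sketch: the numbers are placeholders, and the knobs are the usual per-tasktracker slot limits and child heap in mapred-site.xml plus the daemon heap in conf/hadoop-env.sh.

    # conf/hadoop-env.sh -- heap (MB) given to each Hadoop daemon (NameNode, DataNode, TaskTracker, ...)
    export HADOOP_HEAPSIZE=1000

    # mapred-site.xml (written here as property = value for brevity):
    #   mapred.tasktracker.map.tasks.maximum    = 2
    #   mapred.tasktracker.reduce.tasks.maximum = 1
    #   mapred.child.java.opts                  = -Xmx512m
    #
    # Worst-case task heap on one node: (2 + 1) * 512 MB = 1.5 GB.
    # Add the daemon heaps (DataNode + TaskTracker ~ 2 * 1 GB), the HBase region
    # server, and the per-JVM overhead beyond -Xmx, and keep the total under the
    # node's physical RAM so nothing gets pushed into swap while the batch job runs.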