You might try setting the TaskTracker's Linux nice level to, say, 5 or 10,
leaving DFS and HBase at 0.
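
To make that concrete, here is a rough sketch of launching a process at a
raised nice level from Python. The helper name run_niced and the probe
command are made up for illustration; on a real node you would substitute
whatever command starts your TaskTracker. Processes spawned by the niced
process inherit its nice value, so the map and reduce child tasks should be
deprioritised as well.

```python
import os
import subprocess
import sys


def run_niced(cmd, increment=10):
    """Run cmd as a child process whose nice value is raised by `increment`.

    Hypothetical helper for illustration only; not part of Hadoop.
    """
    # preexec_fn runs in the child just before exec (POSIX only), so only
    # the child and anything it spawns is deprioritised. The calling
    # process keeps its own nice value.
    return subprocess.run(
        cmd,
        preexec_fn=lambda: os.nice(increment),
        capture_output=True,
        text=True,
        check=True,
    )


if __name__ == "__main__":
    # Stand-in for the real TaskTracker start command (an assumption):
    # this probe just prints the nice value it is running at.
    probe = [sys.executable, "-c", "import os; print(os.nice(0))"]
    print("child nice value:", run_niced(probe, 10).stdout.strip())
```

Note that nice only affects CPU scheduling priority. It does not by itself
throttle disk or network I/O, so it may help less if the queries are mostly
waiting on I/O.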
Billy
"zsongbo" <zson...@gmail.com> wrote in message
news:fa03480d0905110549j7f09be13qd434ca41c9f84...@mail.gmail.com...
Hi all,
Suppose we have a large dataset to process with MapReduce. MapReduce will
take as many machine resources as it can, so while one such big job is
running, the cluster becomes very busy and can do almost nothing else.
For example, we have an HDFS+MapReduce+HBase cluster.
A large dataset in HDFS is processed by MapReduce periodically, and the
workload is CPU and I/O heavy. The cluster also provides other services for
queries (querying HBase and reading files in HDFS), so while the job is
running, query latency becomes very long.
Since the MapReduce job is not time sensitive, I want to limit the load it
puts on the cluster. Do you have any advice?
Thanks in advance.
Schubert