My table is about 8G and, according to the logs, has about 70 regions. I don't fully understand the logic, but scan.setCaching(1000) increased the number of map tasks from 2 to 65 and improved performance significantly. Thanks for the hint!
--- jdcry...@apache.org wrote:

From: Jean-Daniel Cryans <jdcry...@apache.org>
To: hbase-user@hadoop.apache.org, cryp...@mail.saturnfans.com
Subject: Re: set number of map tasks for HBase MR
Date: Sun, 11 Apr 2010 09:14:35 +0000

A map against an HBase table by default cannot have more tasks than the
number of regions in that table. Also, you want to enable scanner caching:
pass a Scan object to TableMapReduceUtil.initTableMapperJob that is
configured with scan.setCaching(some_value), where the value is the number
of rows to fetch every time we hit a region server with next(). On rows of
100-200 bytes, our jobs are usually configured with 1000 up to 10000.

Finally, is your job running in local mode or on a job tracker? Even if
HBase uses HDFS, it usually doesn't know about the job tracker unless you
configure HBase's classpath with Hadoop's conf.

J-D

On Sun, Apr 11, 2010 at 3:17 AM, Andriy Kolyadenko
<cryp...@mail.saturnfans.com> wrote:
> Hi,
>
> thanks for the quick response. I tried the following in the code:
>
> job.getConfiguration().setInt("mapred.map.tasks", 10000);
>
> but unfortunately got the same result.
>
> Any other ideas?
>
> --- ama...@gmail.com wrote:
>
> From: Amandeep Khurana <ama...@gmail.com>
> To: hbase-user@hadoop.apache.org, cryp...@mail.saturnfans.com
> Subject: Re: set number of map tasks for HBase MR
> Date: Sat, 10 Apr 2010 20:04:18 -0700
>
> You can set the number of map tasks in your job config to a big number (eg:
> 100000), and the library will automatically spawn one map task per region.
>
> -ak
>
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz
>
> On Sat, Apr 10, 2010 at 7:59 PM, Andriy Kolyadenko <
> cryp...@mail.saturnfans.com> wrote:
>
>> Hi guys,
>>
>> I have an HBase table of about 8G and I want to run an MR job against it.
>> It runs extremely slowly in my case. One thing I noticed is that the job
>> runs only 2 map tasks. Is there any way to set up a bigger number of map
>> tasks?
>> I saw some method in the mapred package, but can't find anything like it
>> in the new mapreduce package.
>>
>> I run my MR job on a single machine in cluster mode.
>>
>> _____________________________________________________________
>> Sign up for your free SaturnFans email account at
>> http://webmail.saturnfans.com/
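To make J-D's advice concrete, here is a minimal sketch of a job setup that passes a cached Scan to TableMapReduceUtil.initTableMapperJob. The table name "mytable" and the MyMapper class are made up for illustration; this assumes the org.apache.hadoop.hbase.mapreduce API discussed in the thread and needs the HBase jars on the classpath plus a running cluster to actually execute:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;

public class ScanCachingJob {

  // Hypothetical no-op mapper; one map task is spawned per region of the table.
  static class MyMapper extends TableMapper<ImmutableBytesWritable, Result> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context ctx) {
      // real per-row work goes here
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "scan-caching-example");
    job.setJarByClass(ScanCachingJob.class);

    Scan scan = new Scan();
    // Fetch 1000 rows per next() RPC to the region server instead of the
    // default; for 100-200 byte rows, values of 1000-10000 are suggested above.
    scan.setCaching(1000);
    // Full scans in MR jobs shouldn't pollute the region server block cache.
    scan.setCacheBlocks(false);

    TableMapReduceUtil.initTableMapperJob(
        "mytable", scan, MyMapper.class,
        null, null, // no mapper output key/value classes for a map-only sketch
        job);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Note that the per-region parallelism comes from the table's region count, not from mapred.map.tasks, which is why setting that property alone had no effect earlier in the thread.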