Regarding your higher number of mappers: you probably did something else
that corrected it (for example, you restarted MapReduce and it picked up
a configuration change you had made). The scanner caching is completely
unrelated to the number of map tasks.
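For reference, the scanner-caching setup from the earlier messages looks roughly like this. This is only a sketch against the HBase 0.20-era mapreduce API; the table name "mytable" and the mapper class MyMapper are hypothetical stand-ins for your own:

```java
// Sketch: an HBase table-scan MR job with scanner caching enabled.
// Assumes HBase 0.20-style API; "mytable" and MyMapper are placeholders.
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;

public class ScanJob {
  public static void main(String[] args) throws Exception {
    Job job = new Job(HBaseConfiguration.create(), "scan-mytable");
    job.setJarByClass(ScanJob.class);

    Scan scan = new Scan();
    scan.setCaching(1000);       // fetch 1000 rows per next() RPC to the region server
    scan.setCacheBlocks(false);  // avoid polluting the block cache on a full scan

    // The framework creates one map task per region; the Scan object
    // passed here carries the caching setting into every mapper.
    TableMapReduceUtil.initTableMapperJob(
        "mytable", scan, MyMapper.class,
        NullWritable.class, NullWritable.class, job);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

With the default caching of 1, every next() call is a separate round trip to the region server, which is what makes uncached scans so slow.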

Glad to hear it's much faster.

J-D

On Sun, Apr 11, 2010 at 11:21 PM, Andriy Kolyadenko
<cryp...@mail.saturnfans.com> wrote:
> My table is about 8g, and according to logs has about 70 regions.
>
> I don't understand the logic, but scan.setCaching(1000) increased the number
> of map tasks from 2 to 65 and improved performance significantly. Thanks for
> the hint!
>
>
>
> --- jdcry...@apache.org wrote:
>
> From: Jean-Daniel Cryans <jdcry...@apache.org>
> To: hbase-user@hadoop.apache.org, cryp...@mail.saturnfans.com
> Subject: Re: set number of map tasks for HBase MR
> Date: Sun, 11 Apr 2010 09:14:35 +0000
>
> A MapReduce job against an HBase table by default cannot have more map
> tasks than the number of regions in that table.
>
> Also, you want to enable scanner caching. Pass a Scan object, configured
> with scan.setCaching(some_value), to TableMapReduceUtil.initTableMapperJob.
> The value is the number of rows to fetch each time we hit a region server
> with next(). On rows of 100-200 bytes, our jobs are usually configured
> with 1000 up to 10000.
>
> Finally, is your job running in local mode or on a job tracker? Even
> though HBase uses HDFS, it usually doesn't know about the job tracker
> unless you configure HBase's classpath with Hadoop's conf directory.
>
> J-D
>
> On Sun, Apr 11, 2010 at 3:17 AM, Andriy Kolyadenko
> <cryp...@mail.saturnfans.com> wrote:
>> Hi,
>>
>> thanks for the quick response. I tried the following in the code:
>>
>> job.getConfiguration().setInt("mapred.map.tasks", 10000);
>>
>> but unfortunately got the same result.
>>
>> Any other ideas?
>>
>> --- ama...@gmail.com wrote:
>>
>> From: Amandeep Khurana <ama...@gmail.com>
>> To: hbase-user@hadoop.apache.org, cryp...@mail.saturnfans.com
>> Subject: Re: set number of map tasks for HBase MR
>> Date: Sat, 10 Apr 2010 20:04:18 -0700
>>
>> You can set the number of map tasks in your job config to a big number (e.g.
>> 100000), and the library will automatically spawn one map task per region.
>>
>> -ak
>>
>>
>> Amandeep Khurana
>> Computer Science Graduate Student
>> University of California, Santa Cruz
>>
>>
>> On Sat, Apr 10, 2010 at 7:59 PM, Andriy Kolyadenko <
>> cryp...@mail.saturnfans.com> wrote:
>>
>>> Hi guys,
>>>
>>> I have an HBase table of about 8 GB and I want to run an MR job against it.
>>> It runs extremely slowly in my case. One thing I noticed is that the job
>>> runs only 2 map tasks. Is there any way to set a bigger number of map
>>> tasks? I saw some method in the mapred package, but can't find anything
>>> like it in the new mapreduce package.
>>>
>>> I run my MR job on a single machine in cluster mode.
>>>
>>>
>>> _____________________________________________________________
>>> Sign up for your free SaturnFans email account at
>>> http://webmail.saturnfans.com/
>>>
>>
>>
>>
>>
>>
>
>
>
>
