WRT your higher number of mappers: you probably did something else that corrected it (for example, you restarted MapReduce and it picked up a change you had made to the configuration). The scanner caching is totally unrelated to MapReduce.
Glad to hear it's much faster.

J-D

On Sun, Apr 11, 2010 at 11:21 PM, Andriy Kolyadenko
<cryp...@mail.saturnfans.com> wrote:
> My table is about 8G and, according to the logs, has about 70 regions.
>
> I don't understand the logic, but scan.setCaching(1000) increased the number
> of map tasks from 2 to 65 and improved performance significantly. Thanks for
> the hint!
>
> --- jdcry...@apache.org wrote:
>
> From: Jean-Daniel Cryans <jdcry...@apache.org>
> To: hbase-user@hadoop.apache.org, cryp...@mail.saturnfans.com
> Subject: Re: set number of map tasks for HBase MR
> Date: Sun, 11 Apr 2010 09:14:35 +0000
>
> A map against an HBase table by default cannot have more tasks than the
> number of regions in that table.
>
> Also, you want to enable scanner caching. Pass a Scan object to
> TableMapReduceUtil.initTableMapperJob that is configured with
> scan.setCaching(some_value), where the value should be the number of
> rows to fetch every time we hit a region server with next(). On rows
> of 100-200 bytes, our jobs are usually configured with 1000 up to
> 10000.
>
> Finally, is your job running in local mode or on a job tracker? Even
> if HBase uses HDFS, it usually doesn't know of the job tracker unless
> you configure HBase's classpath with Hadoop's conf.
>
> J-D
>
> On Sun, Apr 11, 2010 at 3:17 AM, Andriy Kolyadenko
> <cryp...@mail.saturnfans.com> wrote:
>> Hi,
>>
>> thanks for the quick response. I tried the following in the code:
>>
>> job.getConfiguration().setInt("mapred.map.tasks", 10000);
>>
>> but unfortunately got the same result.
>>
>> Any other ideas?
>>
>> --- ama...@gmail.com wrote:
>>
>> From: Amandeep Khurana <ama...@gmail.com>
>> To: hbase-user@hadoop.apache.org, cryp...@mail.saturnfans.com
>> Subject: Re: set number of map tasks for HBase MR
>> Date: Sat, 10 Apr 2010 20:04:18 -0700
>>
>> You can set the number of map tasks in your job config to a big number (e.g.
>> 100000), and the library will automatically spawn one map task per region.
>>
>> -ak
>>
>> Amandeep Khurana
>> Computer Science Graduate Student
>> University of California, Santa Cruz
>>
>> On Sat, Apr 10, 2010 at 7:59 PM, Andriy Kolyadenko <
>> cryp...@mail.saturnfans.com> wrote:
>>
>>> Hi guys,
>>>
>>> I have an HBase table of about 8G and I want to run an MR job against it.
>>> It works extremely slowly in my case. One thing I noticed is that the job
>>> runs only 2 map tasks. Is there any way to set a bigger number of map
>>> tasks? I saw some method in the mapred package, but can't find anything
>>> like it in the new mapreduce package.
>>>
>>> I run my MR job on a single machine in cluster mode.
>>>
>>> _____________________________________________________________
>>> Sign up for your free SaturnFans email account at
>>> http://webmail.saturnfans.com/
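Putting the advice from this thread together, a minimal sketch of a map-only job with scanner caching enabled might look like the following. It uses the 0.20-era `org.apache.hadoop.hbase.mapreduce` API discussed above; the table name "mytable" and the `RowMapper`/`ScanCachingExample` class names are placeholders, not from the thread. It has not been run against a live cluster.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class ScanCachingExample {

    // Map-only job: the framework spawns one map task per region,
    // so task count is governed by the table, not mapred.map.tasks.
    static class RowMapper extends TableMapper<NullWritable, NullWritable> {
        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context ctx)
                throws IOException, InterruptedException {
            // process one row here
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "scan-caching-example");
        job.setJarByClass(ScanCachingExample.class);

        Scan scan = new Scan();
        // Fetch 1000 rows per next() call to a region server instead of
        // the default of 1; tune upward for small rows, as J-D suggests.
        scan.setCaching(1000);

        TableMapReduceUtil.initTableMapperJob(
            "mytable",            // input table name (placeholder)
            scan,                 // Scan carrying the caching setting
            RowMapper.class,
            NullWritable.class,   // mapper output key class
            NullWritable.class,   // mapper output value class
            job);

        job.setNumReduceTasks(0);
        job.setOutputFormatClass(NullOutputFormat.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Note that the `Scan` must be configured before it is passed to `initTableMapperJob`, since the utility serializes it into the job configuration at that point.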