https://issues.apache.org/jira/browse/HBASE-2434 has been logged.
On Sun, Apr 11, 2010 at 7:09 AM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
> Yes, an option could be added, along with a write buffer option for Import.
>
> J-D
>
> On Sun, Apr 11, 2010 at 3:30 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> > I noticed mapreduce.Export.createSubmittableJob() doesn't call setCaching()
> > in 0.20.3.
> >
> > Should a call to setCaching() be added?
> >
> > Thanks
> >
> > On Sun, Apr 11, 2010 at 2:14 AM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
> >
> >> A map against an HBase table by default cannot have more tasks than the
> >> number of regions in that table.
> >>
> >> Also, you want to enable scanner caching. Pass a Scan object to
> >> TableMapReduceUtil.initTableMapperJob that is configured with
> >> scan.setCaching(some_value), where the value should be the number of
> >> rows to fetch every time we hit a region server with next(). On rows
> >> of 100-200 bytes, our jobs are usually configured with 1000 up to
> >> 10000.
> >>
> >> Finally, is your job running in local mode or on a job tracker? Even
> >> though HBase uses HDFS, it usually doesn't know of the job tracker unless
> >> you configure HBase's classpath with Hadoop's conf.
> >>
> >> J-D
> >>
> >> On Sun, Apr 11, 2010 at 3:17 AM, Andriy Kolyadenko
> >> <cryp...@mail.saturnfans.com> wrote:
> >> > Hi,
> >> >
> >> > thanks for the quick response. I tried to do the following in the code:
> >> >
> >> > job.getConfiguration().setInt("mapred.map.tasks", 10000);
> >> >
> >> > but unfortunately got the same result.
> >> >
> >> > Any other ideas?
> >> >
> >> > --- ama...@gmail.com wrote:
> >> >
> >> > From: Amandeep Khurana <ama...@gmail.com>
> >> > To: hbase-user@hadoop.apache.org, cryp...@mail.saturnfans.com
> >> > Subject: Re: set number of map tasks for HBase MR
> >> > Date: Sat, 10 Apr 2010 20:04:18 -0700
> >> >
> >> > You can set the number of map tasks in your job config to a big number (e.g.
> >> > 100000), and the library will automatically spawn one map task per region.
> >> >
> >> > -ak
> >> >
> >> > Amandeep Khurana
> >> > Computer Science Graduate Student
> >> > University of California, Santa Cruz
> >> >
> >> > On Sat, Apr 10, 2010 at 7:59 PM, Andriy Kolyadenko <
> >> > cryp...@mail.saturnfans.com> wrote:
> >> >
> >> >> Hi guys,
> >> >>
> >> >> I have an ~8 GB HBase table and I want to run an MR job against it. It works
> >> >> extremely slowly in my case. One thing I noticed is that the job runs only 2
> >> >> map tasks. Is there any way to set a bigger number of map tasks? I saw some
> >> >> method in the mapred package, but can't find anything like it in the new
> >> >> mapreduce package.
> >> >>
> >> >> I run my MR job on a single machine in cluster mode.
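Pulling the thread's advice together, here is a minimal sketch of a table-mapper job that enables scanner caching via a configured Scan passed to TableMapReduceUtil.initTableMapperJob. This is not code from the thread: the table name, job name, and RowCountMapper class are hypothetical placeholders, and exact API details vary across HBase versions.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class CachingScanJob {

  // Hypothetical mapper: just counts rows to keep the example small.
  static class RowCountMapper
      extends TableMapper<ImmutableBytesWritable, Result> {
    @Override
    protected void map(ImmutableBytesWritable key, Result value,
        Context context) throws IOException, InterruptedException {
      context.getCounter("scan", "rows").increment(1);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "caching-scan-example");
    job.setJarByClass(CachingScanJob.class);

    Scan scan = new Scan();
    // Fetch 1000 rows per next() round trip to a region server;
    // for small rows (100-200 bytes) values up to 10000 are used
    // in the thread above.
    scan.setCaching(1000);

    TableMapReduceUtil.initTableMapperJob(
        "mytable",             // hypothetical table name
        scan,                  // the caching-configured Scan
        RowCountMapper.class,
        ImmutableBytesWritable.class,
        Result.class,
        job);

    // Map-only job; no reducer or output files needed.
    job.setNumReduceTasks(0);
    job.setOutputFormatClass(NullOutputFormat.class);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Note that even with this configuration the framework still creates one map task per region, so setting mapred.map.tasks has no effect on a table scan job; the knob that matters for throughput here is the caching value.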