Re: Custom Input Split

Stack Wed, 22 Apr 2009 09:07:22 -0700

If you run

./bin/hadoop jar hbase.jar rowcounter

It will print its usage. You are a smart fellow. I think you can take it from there.


Stack

On Apr 22, 2009, at 5:48, Rakhi Khatwani <[email protected]> wrote:

Hi Lars,
Thanks for the suggestion. I also figured out my problem using
TableInputFormatBase.
My table had only one region, but I still wanted to split the input into
4 maps,
so I am basically overriding the getSplits() method in
TableInputFormatBase.
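For a table with a single region, one way to get 4 maps is to split that region's key range at row keys that divide its rows evenly. The plain-Java sketch below assumes the sorted row keys are already available (for example from an earlier scan) and uses Strings instead of byte[] row keys for readability. KeyRangeSplitter and its split() method are hypothetical helpers, not part of the HBase API. Following HBase's convention, an empty key stands for the start or end of the table; in an overridden getSplits(), each range would presumably become one TableSplit.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical helper (not part of HBase): splits a sorted list of row keys
 * into at most numSplits contiguous [startRow, endRow) ranges whose row
 * counts differ by at most one.
 */
public class KeyRangeSplitter {

    /** A half-open row-key range; "" means the start or end of the table. */
    public static final class Range {
        public final String startRow;
        public final String endRow;

        public Range(String startRow, String endRow) {
            this.startRow = startRow;
            this.endRow = endRow;
        }

        @Override
        public String toString() {
            return "[" + startRow + ", " + endRow + ")";
        }
    }

    public static List<Range> split(List<String> sortedKeys, int numSplits) {
        if (numSplits <= 0) {
            throw new IllegalArgumentException("numSplits must be positive");
        }
        int n = sortedKeys.size();
        if (n == 0) {
            // No rows known: fall back to a single split covering the table
            return Collections.singletonList(new Range("", ""));
        }
        int splits = Math.min(numSplits, n); // never create empty splits
        int base = n / splits;               // rows every split gets
        int extra = n % splits;              // the first 'extra' splits get one more
        List<Range> ranges = new ArrayList<>();
        int pos = 0;
        for (int i = 0; i < splits; i++) {
            String start = (i == 0) ? "" : sortedKeys.get(pos);
            pos += base + (i < extra ? 1 : 0);
            String end = (i == splits - 1) ? "" : sortedKeys.get(pos);
            ranges.add(new Range(start, end));
        }
        return ranges;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            keys.add(String.format("row%02d", i));
        }
        // 10 rows into 4 maps: 3, 3, 2 and 2 rows per map
        for (Range r : split(keys, 4)) {
            System.out.println(r);
        }
    }
}
```

Running main() prints four ranges holding 3, 3, 2 and 2 of the ten sample rows; the first range starts and the last range ends with an empty key, so rows outside the sampled keys are still covered.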

One more question:
is there any method in the HBase API that can count the number of rows in a
table?
I tried googling it, and all I came across is a RowCounter class, which is a MapReduce job to count the number of rows, but I really don't know how to use
it. Any suggestions?

thanks,
Raakhi
On Wed, Apr 22, 2009 at 4:30 AM, Lars George <[email protected]> wrote:
Hi Rakhi,
This is all done in the TableInputFormatBase class, which you can extend
and then override the getSplits() function:


http://hadoop.apache.org/hbase/docs/r0.19.1/api/org/apache/hadoop/hbase/mapred/TableInputFormatBase.html
This is where you can then specify how many rows are assigned to each map. Really straightforward as I see it. I have used it to implement special "only use N regions" support, where I can run an MR
job against a sample subset. For example, only map 5 out of 8K regions of a table.
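Lars did not post his code, but the "only use N regions" idea can be sketched as picking a small number of regions spread across the table and creating splits only for those. The RegionSampler class below is hypothetical, and even spacing is an assumption (a random sample would serve equally well); regions are identified by their index in start-key order.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of HBase): picks a sample of regions. */
public class RegionSampler {

    /**
     * Returns the indices of sampleSize regions spread evenly across
     * totalRegions. If sampleSize >= totalRegions, every region is returned.
     */
    public static List<Integer> pick(int totalRegions, int sampleSize) {
        List<Integer> picked = new ArrayList<>();
        if (totalRegions <= 0 || sampleSize <= 0) {
            return picked;
        }
        int n = Math.min(sampleSize, totalRegions);
        for (int i = 0; i < n; i++) {
            // long arithmetic avoids int overflow for very large tables
            picked.add((int) ((long) i * totalRegions / n));
        }
        return picked;
    }

    public static void main(String[] args) {
        // 5 out of 8,000 regions
        System.out.println(pick(8000, 5)); // [0, 1600, 3200, 4800, 6400]
    }
}
```

Each picked index would then map to one region's start and end keys, so the job only scans those five regions.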

The default implementation will always split all regions into N maps. Hence the
recommendation to set the number of maps to the number of regions in a table. If you set it to something lower, then it will split the regions into a smaller number of maps, but with more rows per map, i.e. each map gets more than
one region to process.
Look into the source of the above class and it should be obvious, I hope.
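The default grouping described above can be modelled in plain Java as dividing the regions, in start-key order, into contiguous runs, one run per map. This is a simplified sketch, not the actual TableInputFormatBase code; RegionGrouper is a hypothetical name, and the sketch assumes, consistent with Rakhi needing to override getSplits() to get 4 maps from a one-region table, that the requested number of maps is capped at the number of regions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model (not HBase code) of grouping regions into maps. */
public class RegionGrouper {

    /**
     * Groups regionCount regions into contiguous runs, one run per map.
     * The map count is capped at the region count. Each entry holds the
     * [firstRegion, lastRegionExclusive) indices for one map.
     */
    public static List<int[]> group(int regionCount, int requestedMaps) {
        List<int[]> groups = new ArrayList<>();
        if (regionCount <= 0 || requestedMaps <= 0) {
            return groups;
        }
        int maps = Math.min(requestedMaps, regionCount);
        int perMap = regionCount / maps;    // regions every map gets
        int remainder = regionCount % maps; // the first 'remainder' maps get one extra
        int start = 0;
        for (int i = 0; i < maps; i++) {
            int end = start + perMap + (i < remainder ? 1 : 0);
            groups.add(new int[] {start, end});
            start = end;
        }
        return groups;
    }

    public static void main(String[] args) {
        // 10 regions, 4 maps requested: the maps get 3, 3, 2 and 2 regions
        for (int[] g : group(10, 4)) {
            System.out.println("regions " + g[0] + " to " + (g[1] - 1));
        }
        // A single-region table still yields only one map
        System.out.println(group(1, 4).size()); // 1
    }
}
```

With as many maps as regions, each map gets exactly one region; asking for fewer maps makes some maps process several neighbouring regions.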
Lars



Rakhi Khatwani wrote:
Hi,
I have a table with N records,
and now I want to run a MapReduce job with 4 maps and 0 reduces.
Is there a way I can create my own custom input split so that I can
send 'n' records to each map?
If there is a way, can I have a sample code snippet to gain a better
understanding?

Thanks
Raakhi.
