Hi Rakhi,

This is all done in the TableInputFormatBase class, which you can extend and then override the getSplits() function:

http://hadoop.apache.org/hbase/docs/r0.19.1/api/org/apache/hadoop/hbase/mapred/TableInputFormatBase.html

This is where you can specify how many rows are assigned to each map. It is really straightforward as I see it. I have used it to implement a special "only use N regions" mode where I can run a MR job against a sample subset of a table - for example, mapping only 5 out of 8K regions.
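To make the "only use N regions" idea concrete, here is a minimal sketch of the truncation it boils down to. This is not the actual HBase code: plain Strings stand in for the TableSplit objects that getSplits() would really return, and the class and method names are mine.

```java
import java.util.Arrays;

// Illustrative sketch: the "only use N regions" trick overrides
// getSplits() to return just a prefix of the full split list the
// base class produced. Strings stand in for HBase TableSplits here.
public class SplitSampler {

    // Hypothetical helper: keep at most maxSplits of the splits
    // the superclass would have handed back.
    public static String[] sample(String[] allSplits, int maxSplits) {
        if (allSplits.length <= maxSplits) {
            return allSplits;
        }
        return Arrays.copyOf(allSplits, maxSplits);
    }

    public static void main(String[] args) {
        String[] regionSplits = {"r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7"};
        // Map only 5 of the 8 regions - the rest are simply skipped.
        System.out.println(Arrays.toString(sample(regionSplits, 5)));
    }
}
```

In the real subclass you would call super.getSplits(...) and truncate its result the same way.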

The default implementation always splits all regions into N maps - hence the recommendation to set the number of maps to the number of regions in the table. If you set it to something lower, it will produce a smaller number of splits with more rows per map, i.e. each map gets more than one region to process.
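The grouping behavior described above can be sketched as follows. Again, this is not the HBase source - just a self-contained illustration of how R regions end up distributed over N map splits (names and the even-distribution arithmetic are my own; read TableInputFormatBase.getSplits() for the real logic).

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of TableInputFormatBase-style splitting: when the requested
// number of maps N is less than the region count R, each split covers
// a contiguous run of regions; when N >= R, it is one split per region.
public class RegionSplitter {

    // Returns, for each split, the list of region indices it will scan.
    public static List<List<Integer>> assignRegions(int numRegions, int numSplits) {
        int splits = Math.min(numSplits, numRegions);
        List<List<Integer>> result = new ArrayList<>();
        int region = 0;
        for (int i = 0; i < splits; i++) {
            // Spread the remaining regions evenly over the remaining splits.
            int take = (numRegions - region + (splits - i) - 1) / (splits - i);
            List<Integer> group = new ArrayList<>();
            for (int j = 0; j < take; j++) {
                group.add(region++);
            }
            result.add(group);
        }
        return result;
    }

    public static void main(String[] args) {
        // 8 regions, 4 maps: each map scans 2 contiguous regions.
        System.out.println(assignRegions(8, 4));
    }
}
```

Rakhi's case (N records, 4 maps) corresponds to assignRegions(R, 4) for an R-region table: each map simply scans more than one region's worth of rows.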

Look into the source of the above class and it should be obvious - I hope.

Lars


Rakhi Khatwani wrote:
Hi,
     I have a table with N records, and I want to run a map-reduce job
with 4 maps and 0 reduces.
     Is there a way I can create my own custom input split so that I can
send 'n' records to each map?
     If there is a way, can I have a sample code snippet to gain a better
understanding?

Thanks
Raakhi.
