Hi Rakhi,

This is all done in the TableInputFormatBase class, which you can extend and then override the getSplits() function:

http://hadoop.apache.org/hbase/docs/r0.19.1/api/org/apache/hadoop/hbase/mapred/TableInputFormatBase.html

This is where you can specify how many rows are assigned to each map. It is really straightforward as I see it. I have used it to implement a special "only use N regions" mode where I can run a MR job against a sample subset of a table - for example, mapping only 5 out of 8K regions.
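To make the "only use N regions" idea concrete, here is a minimal sketch of the truncation it boils down to. This is not the actual HBase code: plain Strings stand in for the TableSplit objects that getSplits() would really return, and the class and method names are mine.

```java
import java.util.Arrays;

// Illustrative sketch: the "only use N regions" trick overrides
// getSplits() to return just a prefix of the full split list the
// base class produced. Strings stand in for HBase TableSplits here.
public class SplitSampler {

    // Hypothetical helper: keep at most maxSplits of the splits
    // the superclass would have handed back.
    public static String[] sample(String[] allSplits, int maxSplits) {
        if (allSplits.length <= maxSplits) {
            return allSplits;
        }
        return Arrays.copyOf(allSplits, maxSplits);
    }

    public static void main(String[] args) {
        String[] regionSplits = {"r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7"};
        // Map only 5 of the 8 regions - the rest are simply skipped.
        System.out.println(Arrays.toString(sample(regionSplits, 5)));
    }
}
```

In the real subclass you would call super.getSplits(...) and truncate its result the same way.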

The default implementation always splits all regions into N maps - hence the recommendation to set the number of maps to the number of regions in the table. If you set it to something lower, it will produce a smaller number of splits with more rows per map, i.e. each map gets more than one region to process.
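The grouping behavior described above can be sketched as follows. Again, this is not the HBase source - just a self-contained illustration of how R regions end up distributed over N map splits (names and the even-distribution arithmetic are my own; read TableInputFormatBase.getSplits() for the real logic).

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of TableInputFormatBase-style splitting: when the requested
// number of maps N is less than the region count R, each split covers
// a contiguous run of regions; when N >= R, it is one split per region.
public class RegionSplitter {

    // Returns, for each split, the list of region indices it will scan.
    public static List<List<Integer>> assignRegions(int numRegions, int numSplits) {
        int splits = Math.min(numSplits, numRegions);
        List<List<Integer>> result = new ArrayList<>();
        int region = 0;
        for (int i = 0; i < splits; i++) {
            // Spread the remaining regions evenly over the remaining splits.
            int take = (numRegions - region + (splits - i) - 1) / (splits - i);
            List<Integer> group = new ArrayList<>();
            for (int j = 0; j < take; j++) {
                group.add(region++);
            }
            result.add(group);
        }
        return result;
    }

    public static void main(String[] args) {
        // 8 regions, 4 maps: each map scans 2 contiguous regions.
        System.out.println(assignRegions(8, 4));
    }
}
```

Rakhi's case (N records, 4 maps) corresponds to assignRegions(R, 4) for an R-region table: each map simply scans more than one region's worth of rows.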

Look into the source of the above class and it should be obvious - I hope.

Lars


Rakhi Khatwani wrote:
Hi,
     I have a table with N records, and I want to run a map-reduce job
with 4 maps and 0 reduces.
     Is there a way I can create my own custom input split so that I can
send 'n' records to each map?
     If there is a way, can I have a sample code snippet to gain a better
understanding?

Thanks
Raakhi.
