Cedric Ho wrote:
Hi all,

I am using 0.18.0 and have successfully used data from hbase table as
input to my map/reduce job.

I wonder how to specify a subset of records from a table instead of
taking all records as input.
Such as a range of the row keys or maybe by specific values of certain columns.

You'll have to subclass the TableInputFormat.

There is an example in the javadoc on subclassing TIF: http://hadoop.apache.org/hbase/docs/r0.18.0/api/org/apache/hadoop/hbase/mapred/TableInputFormatBase.html (Sorry, the example is mangled. Do a get of the html source to see non-garbled code).

The example shows you how to set a filter. Filters can filter on rows and values.

To work against a subset, you'd probably need to play with getSplits in your subclass. By default, it basically returns as many splits as there are regions in your table, so it's always the whole table. Filters could stop unwanted rows being returned, but maybe it's better if the rows weren't considered in the first place; hence the need to override getSplits.
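
To make the getSplits idea concrete, here is a rough, HBase-free sketch of the range logic only: given the sorted region start keys, keep the regions that overlap the wanted row-key range and clip them to it. The names RangeSplits, Range and splitsFor are made up for illustration and are not part of HBase. It also assumes plain String keys compared with compareTo, whereas HBase row keys are byte arrays, so treat it as a starting point rather than working input-format code.

```java
import java.util.ArrayList;
import java.util.List;

public class RangeSplits {
    /** Half-open row-key range [start, end); an empty end means "to the end of the table". */
    public static final class Range {
        public final String start;
        public final String end;

        public Range(String start, String end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + "," + end + ")";
        }
    }

    /**
     * regionStartKeys: sorted start keys, one per region (the first is usually "").
     * lo, hi: the wanted range [lo, hi); an empty hi means no upper bound.
     * Returns one clipped range per region that overlaps [lo, hi).
     */
    public static List<Range> splitsFor(String[] regionStartKeys, String lo, String hi) {
        List<Range> out = new ArrayList<>();
        for (int i = 0; i < regionStartKeys.length; i++) {
            String regionStart = regionStartKeys[i];
            // A region ends where the next one starts; the last region is unbounded.
            String regionEnd = (i + 1 < regionStartKeys.length) ? regionStartKeys[i + 1] : "";

            // Skip regions that end at or before the start of the wanted range.
            if (!regionEnd.isEmpty() && regionEnd.compareTo(lo) <= 0) {
                continue;
            }
            // Skip regions that start at or after the end of the wanted range.
            if (!hi.isEmpty() && regionStart.compareTo(hi) >= 0) {
                continue;
            }

            // Clip the region to the wanted range.
            String start = regionStart.compareTo(lo) > 0 ? regionStart : lo;
            String end;
            if (regionEnd.isEmpty()) {
                end = hi;
            } else if (hi.isEmpty()) {
                end = regionEnd;
            } else {
                end = regionEnd.compareTo(hi) < 0 ? regionEnd : hi;
            }
            out.add(new Range(start, end));
        }
        return out;
    }
}
```

In a real subclass, each clipped range would presumably become one split, so the map tasks never open the regions outside the range; a filter can then still be applied on top for column-value conditions.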

St.Ack
