Thanks for the solutions. I've tried overriding getSplits and it does what I need.
But for the RowFilter, I guess it would also need to scan through all records and do the filtering. So wouldn't it be the same if I did the filtering myself during the map phase?

Cedric

On Thu, Oct 9, 2008 at 5:13 AM, stack <[EMAIL PROTECTED]> wrote:
> Cedric Ho wrote:
>>
>> Hi all,
>>
>> I am using 0.18.0 and have successfully used data from an hbase table as
>> input to my map/reduce job.
>>
>> I wonder how to specify a subset of records from a table instead of
>> taking all records as input,
>> such as a range of the row keys or maybe by specific values of certain
>> columns.
>>
>
> You'll have to subclass the TableInputFormat.
>
> There is an example in the javadoc on subclassing TIF:
> http://hadoop.apache.org/hbase/docs/r0.18.0/api/org/apache/hadoop/hbase/mapred/TableInputFormatBase.html
> (Sorry, the example is mangled. Do a get of the html source to see
> non-garbled code.)
>
> The example shows you how to set a filter. Filters can filter on rows and
> values.
>
> To work against a subset, you'd probably need to play with getSplits in
> your subclass. By default, it basically returns as many splits as there are
> regions in your table, so it's the whole table always. Filters could stop
> unwanted rows being returned, but maybe it's better if the rows weren't
> considered in the first place; hence the need for getSplits subclassing.
>
> St.Ack
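Since overriding getSplits is what ended up working, here is a rough, self-contained sketch of the clipping logic such an override might perform. It deliberately does not touch the HBase API: the Split class, clipToRange, and the region boundary arrays are hypothetical stand-ins, and plain ASCII String keys stand in for HBase's byte[] row keys (which are compared as unsigned bytes). In a real TableInputFormatBase subclass, the region start/end keys would come from the table, and each clipped range would become one of the input format's splits.

```java
import java.util.ArrayList;
import java.util.List;

public class KeyRangeSplits {

    /** Hypothetical stand-in for a table split: rows in [startRow, endRow). An empty endRow means "end of table". */
    static final class Split {
        final String startRow;
        final String endRow;

        Split(String startRow, String endRow) {
            this.startRow = startRow;
            this.endRow = endRow;
        }

        @Override
        public String toString() {
            return "[" + startRow + ", " + (endRow.isEmpty() ? "END" : endRow) + ")";
        }
    }

    /**
     * Keeps only the regions that overlap [scanStart, scanStop) and clips each one to that range.
     * regionStarts/regionEnds are parallel arrays, one entry per region; "" as a start key means
     * the start of the table and "" as an end key means the end of the table.
     * Assumes scanStop is non-empty (an exclusive upper bound).
     */
    static List<Split> clipToRange(String[] regionStarts, String[] regionEnds,
                                   String scanStart, String scanStop) {
        List<Split> out = new ArrayList<>();
        for (int i = 0; i < regionStarts.length; i++) {
            String rStart = regionStarts[i];
            String rEnd = regionEnds[i];
            // Region ends at or before the wanted range begins: skip it.
            boolean endsBefore = !rEnd.isEmpty() && rEnd.compareTo(scanStart) <= 0;
            // Region starts at or after the wanted range ends: skip it.
            boolean startsAfter = rStart.compareTo(scanStop) >= 0;
            if (endsBefore || startsAfter) {
                continue;
            }
            // Clip the region's boundaries to the wanted range.
            String start = rStart.compareTo(scanStart) > 0 ? rStart : scanStart;
            String end = (rEnd.isEmpty() || rEnd.compareTo(scanStop) > 0) ? scanStop : rEnd;
            out.add(new Split(start, end));
        }
        return out;
    }

    public static void main(String[] args) {
        // Three regions: ["", "g"), ["g", "p"), ["p", end of table).
        String[] starts = {"", "g", "p"};
        String[] ends = {"g", "p", ""};
        // Only rows in ["c", "k") are wanted.
        System.out.println(clipToRange(starts, ends, "c", "k"));
    }
}
```

Run as-is, this prints `[[c, g), [g, k)]`: the third region is dropped entirely, so no map task is spent scanning it, which is stack's point about not considering unwanted rows in the first place.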
