Hi Cedric,

Could you share with me your version of getSplits that feeds only a subset of records to the job? I expect your method can select the subset based on row keys as well as on some column values.

Thank you.

Cedric Ho wrote:
>
> Thanks for the solutions, I've tried overriding getSplits and it does
> what I need.
>
> But for the RowFilter, I guess it would also need to scan through all
> records and do filtering. So wouldn't it be the same if I do the
> filtering myself during the map phase?
>
> Cedric
>
>
> On Thu, Oct 9, 2008 at 5:13 AM, stack <[EMAIL PROTECTED]> wrote:
>> Cedric Ho wrote:
>>>
>>> Hi all,
>>>
>>> I am using 0.18.0 and have successfully used data from an hbase table as
>>> input to my map/reduce job.
>>>
>>> I wonder how to specify a subset of records from a table instead of
>>> taking all records as input,
>>> such as a range of the row keys or maybe specific values of certain
>>> columns.
>>>
>>
>> You'll have to subclass the TableInputFormat.
>>
>> There is an example in the javadoc on subclassing TIF:
>> http://hadoop.apache.org/hbase/docs/r0.18.0/api/org/apache/hadoop/hbase/mapred/TableInputFormatBase.html
>> (Sorry, the example is mangled. Do a get of the html source to see
>> non-garbled code).
>>
>> The example shows you how to set a filter. Filters can filter on rows
>> and values.
>>
>> To work against a subset, you'd probably need to play with getSplits in
>> your subclass. By default, it basically returns as many splits as there
>> are regions in your table, so it's always the whole table. Filters could
>> stop unwanted rows being returned, but maybe it's better if the rows
>> weren't considered in the first place; hence the need to override
>> getSplits in a subclass.
>>
>> St.Ack
>>
>>
>
>

--
View this message in context: http://www.nabble.com/map-reduce-range-of-records-from-hbase-table-tp19873787p20948685.html
Sent from the HBase User mailing list archive at Nabble.com.
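
For anyone following the thread, the getSplits idea described above can be sketched roughly as below. This is not Cedric's code and it does not use the HBase API: RangeSplits, Split and splitsForRange are hypothetical names, row keys are modeled as plain Strings (HBase compares raw bytes), and the list of region start keys is assumed to be sorted with "" as the first entry. It only shows the pruning step, i.e. keeping the regions that overlap a [startRow, stopRow) range and clipping them to it. Selecting on column values cannot be done this way and would still need a filter or a check in the map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RangeSplits {

    /** Half-open row range [start, end); an empty end means "up to the end of the table". */
    public static final class Split {
        public final String start;
        public final String end;

        public Split(String start, String end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    /**
     * Returns one split per region that overlaps [startRow, stopRow),
     * clipped to that range. An empty stopRow means "no upper bound".
     * regionStartKeys is assumed sorted, with "" as the first start key.
     */
    public static List<Split> splitsForRange(List<String> regionStartKeys,
                                             String startRow, String stopRow) {
        List<Split> splits = new ArrayList<Split>();
        for (int i = 0; i < regionStartKeys.size(); i++) {
            String regionStart = regionStartKeys.get(i);
            // A region ends where the next one starts; the last region is open-ended.
            String regionEnd = (i + 1 < regionStartKeys.size())
                    ? regionStartKeys.get(i + 1) : "";

            // Region lies entirely before the requested range.
            if (!regionEnd.isEmpty() && regionEnd.compareTo(startRow) <= 0) {
                continue;
            }
            // Region lies entirely after the requested range.
            if (!stopRow.isEmpty() && regionStart.compareTo(stopRow) >= 0) {
                continue;
            }

            // Clip the region to the requested range.
            String from = regionStart.compareTo(startRow) > 0 ? regionStart : startRow;
            String to;
            if (regionEnd.isEmpty()) {
                to = stopRow;
            } else if (stopRow.isEmpty()) {
                to = regionEnd;
            } else {
                to = regionEnd.compareTo(stopRow) < 0 ? regionEnd : stopRow;
            }
            splits.add(new Split(from, to));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Four hypothetical regions: ["", g), [g, n), [n, t), [t, "").
        List<String> regions = Arrays.asList("", "g", "n", "t");
        System.out.println(splitsForRange(regions, "h", "p")); // prints [[h, n), [n, p)]
    }
}
```

In a real TableInputFormat subclass the region start keys would come from the table and each Split would become a table split for the job; how that maps onto the 0.18.0 classes is left as an assumption here, so check the javadoc linked above.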
