Re: map reduce range of records from hbase table

tigertail Wed, 10 Dec 2008 19:06:53 -0800

Hi Cedric,

Can you share with me your version of getSplits that feeds only a subset of
records to the job? I expect your method can select the subset based on row
keys as well as on some column values. Thank you.



Cedric Ho wrote:
> 
> Thanks for the solutions, I've tried overriding getSplits and it does
> what I need.
> 
> But for the RowFilter, I guess it would also need to scan through all
> records and do filtering. So wouldn't it be the same if I do the
> filtering myself during the map phase?
> 
> Cedric
> 
> 
> On Thu, Oct 9, 2008 at 5:13 AM, stack <[EMAIL PROTECTED]> wrote:
>> Cedric Ho wrote:
>>>
>>> Hi all,
>>>
>>> I am using 0.18.0 and have successfully used data from hbase table as
>>> input to my map/reduce job.
>>>
>>> I wonder how to specify a subset of records from a table instead of
>>> taking all records as input.
>>> Such as a range of the row keys or maybe by specific values of certain
>>> columns.
>>>
>>
>> You'll have to subclass the TableInputFormat.
>>
>> There is an example in the javadoc on subclassing TIF:
>> http://hadoop.apache.org/hbase/docs/r0.18.0/api/org/apache/hadoop/hbase/mapred/TableInputFormatBase.html
>> (Sorry, the example is mangled.  Do a get of the html source to see
>> non-garbled code).
>>
>> The example shows you how to set a filter.  Filters can filter on rows and
>> values.
>>
>> To work against a subset, you'd probably need to play with getSplits in
>> your subclass.  By default, it basically returns as many splits as there
>> are regions in your table, so it's always the whole table.  Filters could
>> stop unwanted rows being returned, but maybe it's better if the rows
>> weren't considered in the first place; hence the need to override
>> getSplits in a subclass.
>>
>> St.Ack
>>
>>
> 
> 
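
For reference, the idea stack describes (hand the job only the splits that
overlap the wanted row range, instead of one split per region) can be sketched
roughly as below. This is only a model of the logic, not HBase code: the class
RangeSplits, its Split type and the clip method are hypothetical names, row
keys are simplified to Strings (HBase uses byte arrays), and "" stands for
"start of table" or "end of table". In a real TableInputFormat subclass, each
resulting pair would presumably become one split returned from getSplits.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: models region-to-range clipping, independent of HBase.
public class RangeSplits {
    /** One split covering rows in [start, end); "" as end means "to end of table". */
    public static final class Split {
        public final String start;
        public final String end;
        Split(String start, String end) { this.start = start; this.end = end; }
        @Override public String toString() { return "[" + start + "," + end + ")"; }
    }

    /**
     * regionStartKeys: sorted start keys of the regions; the first one is "".
     * rangeStart: first row wanted ("" = from the start of the table).
     * rangeStop: first row NOT wanted ("" = up to the end of the table).
     */
    public static List<Split> clip(String[] regionStartKeys,
                                   String rangeStart, String rangeStop) {
        List<Split> splits = new ArrayList<>();
        for (int i = 0; i < regionStartKeys.length; i++) {
            String regionStart = regionStartKeys[i];
            // A region ends where the next one starts; the last region is open-ended.
            String regionEnd = (i + 1 < regionStartKeys.length) ? regionStartKeys[i + 1] : "";
            // Skip regions that end at or before the wanted range begins.
            if (!regionEnd.isEmpty() && regionEnd.compareTo(rangeStart) <= 0) continue;
            // Skip regions that start at or after the wanted range ends.
            if (!rangeStop.isEmpty() && regionStart.compareTo(rangeStop) >= 0) continue;
            // Clip the region boundaries to the wanted range.
            String start = regionStart.compareTo(rangeStart) > 0 ? regionStart : rangeStart;
            String end;
            if (regionEnd.isEmpty()) {
                end = rangeStop;
            } else if (rangeStop.isEmpty()) {
                end = regionEnd;
            } else {
                end = regionEnd.compareTo(rangeStop) < 0 ? regionEnd : rangeStop;
            }
            splits.add(new Split(start, end));
        }
        return splits;
    }
}
```

Regions that fall entirely outside the range never produce a split, so no map
task is started for them. Selecting on column values would still need a filter
or a check in the map phase, since region boundaries only describe row keys.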

-- 
View this message in context: 
http://www.nabble.com/map-reduce-range-of-records-from-hbase-table-tp19873787p20948685.html
Sent from the HBase User mailing list archive at Nabble.com.
