Re: map reduce range of records from hbase table

Cedric Ho Wed, 08 Oct 2008 22:51:02 -0700

Thanks for the solutions. I've tried overriding getSplits and it does
what I need.
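
For the archive, here is roughly the idea as a plain-Java sketch. It is
only a model of the pruning logic, not the HBase API: the class
RangeSplitSketch, the Split type and the prune method are made-up names,
and row keys are modelled as Strings (real row keys are byte arrays, so
the ordering only matches for plain ASCII keys). An empty string stands
for an open-ended boundary. In a real subclass the same pruning would be
applied to whatever the default getSplits returns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of "one split per region, pruned to a key range".
// None of these names come from HBase.
class RangeSplitSketch {

    // One split covering rows in [startRow, endRow).
    // An empty string means "unbounded" on that side.
    static final class Split {
        final String startRow;
        final String endRow;

        Split(String startRow, String endRow) {
            this.startRow = startRow;
            this.endRow = endRow;
        }

        @Override
        public String toString() {
            return "[" + startRow + ", " + endRow + ")";
        }
    }

    // Smaller of two end keys, treating "" as "no upper bound".
    static String minEnd(String a, String b) {
        if (a.isEmpty()) return b;
        if (b.isEmpty()) return a;
        return a.compareTo(b) <= 0 ? a : b;
    }

    // Keep only the splits that overlap [wantStart, wantEnd) and clip
    // each surviving split to that range.
    static List<Split> prune(List<Split> regionSplits,
                             String wantStart, String wantEnd) {
        List<Split> out = new ArrayList<>();
        for (Split s : regionSplits) {
            // The region ends at or before the first wanted row.
            boolean endsBefore =
                !s.endRow.isEmpty() && s.endRow.compareTo(wantStart) <= 0;
            // The region starts at or after the end of the wanted range.
            boolean startsAfter =
                !wantEnd.isEmpty() && s.startRow.compareTo(wantEnd) >= 0;
            if (endsBefore || startsAfter) {
                continue; // no overlap, so no map task for this region
            }
            // "" sorts before every other key, so a plain comparison
            // is enough to pick the later of the two start keys.
            String start =
                s.startRow.compareTo(wantStart) >= 0 ? s.startRow : wantStart;
            out.add(new Split(start, minEnd(s.endRow, wantEnd)));
        }
        return out;
    }

    public static void main(String[] args) {
        // Four made-up regions covering the whole key space.
        List<Split> regions = Arrays.asList(
            new Split("", "d"), new Split("d", "m"),
            new Split("m", "t"), new Split("t", ""));
        System.out.println(prune(regions, "f", "p"));
    }
}
```

With the four made-up regions above and a wanted range of f to p, only
the two middle regions survive, clipped to [f, m) and [m, p), so the
first and last regions never get a map task at all.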


But for the RowFilter, I guess it would still need to scan through all
the records and filter them. So wouldn't it be the same if I did the
filtering myself during the map phase?

Cedric


On Thu, Oct 9, 2008 at 5:13 AM, stack <[EMAIL PROTECTED]> wrote:
> Cedric Ho wrote:
>>
>> Hi all,
>>
>> I am using 0.18.0 and have successfully used data from an hbase table
>> as input to my map/reduce job.
>>
>> I wonder how to specify a subset of records from a table instead of
>> taking all records as input, such as a range of row keys or maybe
>> specific values of certain columns.
>>
>
> You'll have to subclass the TableInputFormat.
>
> There is an example in the javadoc on subclassing TIF:
> http://hadoop.apache.org/hbase/docs/r0.18.0/api/org/apache/hadoop/hbase/mapred/TableInputFormatBase.html
> (Sorry, the example is mangled.  Do a get of the html source to see
> non-garbled code).
>
> The example shows you how to set a filter.  Filters can filter on rows and
> values.
>
> To work against a subset, you'd probably need to play with getSplits in
> your subclass.  By default, it basically returns as many splits as there
> are regions in your table, so it's always the whole table.  Filters could
> stop unwanted rows being returned, but maybe it's better if the rows
> weren't considered in the first place; hence the need for overriding
> getSplits.
>
> St.Ack
>
>
