I am a newbie, but...

I think it will boil down to something looking at the column and
applying a filter.  I don't think you would get around this without
reworking the model or adding some kind of index.

Why not set a RowFilter on the TableInputFormat so the rows are
filtered before they reach your map?  I presume this would be more
efficient than shuffling all the data through the task tracking of
Hadoop MR.
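Something along these lines, as a rough sketch (not tested; this assumes the newer Scan/Filter-based "mapreduce" API rather than whatever version you are on, and the table name "mytable", family "cf", qualifier "a", and MyMapper are placeholder names, not real ones from your setup):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class FilteredTableScan {

  // Every Result this mapper sees already has cf:a == "xxx",
  // so no per-row check is needed here any more.
  static class MyMapper
      extends TableMapper<ImmutableBytesWritable, NullWritable> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value,
                       Context context)
        throws IOException, InterruptedException {
      context.write(row, NullWritable.get());
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "filtered-table-scan");
    job.setJarByClass(FilteredTableScan.class);

    // Keep only rows whose cf:a column equals "xxx"; rows that
    // lack the column entirely are dropped as well.
    SingleColumnValueFilter filter = new SingleColumnValueFilter(
        Bytes.toBytes("cf"), Bytes.toBytes("a"),
        CompareOp.EQUAL, Bytes.toBytes("xxx"));
    filter.setFilterIfMissing(true);

    Scan scan = new Scan();
    scan.setFilter(filter);

    // TableMapReduceUtil wires the Scan (filter included) into
    // the TableInputFormat, so the filtering happens on the
    // region servers before rows ever reach the mapper.
    TableMapReduceUtil.initTableMapperJob(
        "mytable", scan, MyMapper.class,
        ImmutableBytesWritable.class, NullWritable.class, job);

    job.setOutputFormatClass(NullOutputFormat.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The point is that you pass the Scan (carrying the filter) to initTableMapperJob, so you never have to touch TableInputFormatBase directly.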

Cheers

Tim



On Tue, Apr 7, 2009 at 11:26 AM, Rakhi Khatwani
<[email protected]> wrote:
> Hi,
>     i have a map reduce program with which i read from a hbase table.
> In my map program i check if the column value of a is xxx, if yes then
> continue with processing else skip it.
> however if my table is really big, most of my time in the map gets wasted
> for processing unwanted rows.
> is there any way through which we could send a subset of rows (based on the
> value of a particular column family) to the map???
>
> I have also gone through TableInputFormatBase but cannot figure out
> how to set the input format when using the TableMapReduceUtil class to
> initialize table map jobs. Or is there some other way I could use it?
>
> Thanks in Advance,
> Raakhi.
>
