I am a newbie, but... I think it will boil down to something that looks at the column value and applies a filter. I don't think you can get around that without reworking the model or adding some kind of index.
Why not set a RowFilter on the TableInputFormat so the rows are filtered before they reach your map? I presume this would be more efficient than shuffling all the data through the Hadoop MR TaskTrackers.

Cheers
Tim

On Tue, Apr 7, 2009 at 11:26 AM, Rakhi Khatwani <[email protected]> wrote:
> Hi,
> I have a map reduce program with which I read from an HBase table.
> In my map program I check if the column value of a is xxx; if yes, I
> continue with processing, else I skip the row.
> However, if my table is really big, most of my time in the map gets wasted
> processing unwanted rows.
> Is there any way we could send only a subset of rows (based on the
> value of a particular column family) to the map?
>
> I have also gone through TableInputFormatBase but am not able to figure out
> how to set the input format if we are using the TableMapReduceUtil class to
> initialize table map jobs, or whether there is any other way I could use it.
>
> Thanks in advance,
> Raakhi.
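For reference, here is a minimal sketch of the filter-pushdown idea using the Scan-based API that later HBase releases (0.20+) expose through TableMapReduceUtil.initTableMapperJob; I believe the 0.19-era API in this thread would instead pass a RowFilterInterface-based filter to TableInputFormat, but the shape is the same. The table name "mytable", the family "family", the qualifier "a", and the value "xxx" are placeholders taken from the question, not real names. A SingleColumnValueFilter is attached to the Scan, so non-matching rows are dropped on the region servers and never reach the mapper.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class FilteredTableScanJob {

  // The mapper only ever sees rows that passed the server-side filter.
  static class MyMapper extends TableMapper<Text, Text> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
        throws IOException, InterruptedException {
      // Process the matching row; no need to re-check the column value here.
      context.write(new Text(row.get()), new Text("matched"));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "filtered-table-scan");
    job.setJarByClass(FilteredTableScanJob.class);

    // Keep only rows where family:a equals "xxx"; skip rows missing the column.
    SingleColumnValueFilter filter = new SingleColumnValueFilter(
        Bytes.toBytes("family"), Bytes.toBytes("a"),
        CompareFilter.CompareOp.EQUAL, Bytes.toBytes("xxx"));
    filter.setFilterIfMissing(true);

    Scan scan = new Scan();
    scan.addColumn(Bytes.toBytes("family"), Bytes.toBytes("a"));
    scan.setCaching(500);        // fetch rows from the region server in batches
    scan.setCacheBlocks(false);  // don't fill the block cache during a full scan
    scan.setFilter(filter);

    // The Scan (including its filter) is serialized into the job configuration,
    // so TableInputFormat applies it on the region servers, not in the mapper.
    TableMapReduceUtil.initTableMapperJob(
        "mytable", scan, MyMapper.class, Text.class, Text.class, job);

    job.setNumReduceTasks(0);                        // map-only job
    job.setOutputFormatClass(NullOutputFormat.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Because the filter is evaluated server-side, the per-row check inside the map becomes unnecessary: only rows whose column value matched are handed to the mapper at all.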
