There is a server-side mechanism to filter rows; it's found in the org.apache.hadoop.hbase.filter package. I'm not sure how this interoperates with the TableInputFormat exactly.
Setting a filter to reduce the number of rows returned is pretty much exactly what you want. Something along the lines of the sketch below the quoted message should work.

On Tue, Apr 7, 2009 at 2:26 AM, Rakhi Khatwani <[email protected]> wrote:

> Hi,
> I have a map reduce program with which I read from an HBase table.
> In my map program I check if the column value of a is xxx; if yes, I
> continue with processing, else I skip the row.
> However, if my table is really big, most of my time in the map gets wasted
> processing unwanted rows.
> Is there any way we could send only a subset of rows (based on the
> value of a particular column family) to the map?
>
> I have also gone through TableInputFormatBase but am not able to figure out
> how to set the input format if we are using the TableMapReduceUtil class to
> initialize table map jobs. Or is there any other way I could use it?
>
> Thanks in advance,
> Raakhi.
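For what it's worth, here is a rough, untested sketch of how you might wire a server-side filter into the job. It is written against the newer org.apache.hadoop.hbase.mapreduce API, where initTableMapperJob takes a Scan; the table name "mytable", family "a", qualifier "col", and value "xxx" are just placeholders, and the exact signatures may differ depending on which HBase release you are on:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class FilteredScanJob {

  // Rows that reach this mapper have already passed the filter,
  // so no per-row value check is needed here.
  public static class MyMapper extends TableMapper<ImmutableBytesWritable, Text> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
        throws IOException, InterruptedException {
      // process the row
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "filtered-table-scan");
    job.setJarByClass(FilteredScanJob.class);

    // Keep only rows whose a:col value equals "xxx"; the comparison runs
    // on the region servers, so other rows never reach the map tasks.
    SingleColumnValueFilter filter = new SingleColumnValueFilter(
        Bytes.toBytes("a"), Bytes.toBytes("col"),
        CompareFilter.CompareOp.EQUAL, Bytes.toBytes("xxx"));
    filter.setFilterIfMissing(true);  // also skip rows that lack the column

    Scan scan = new Scan();
    scan.setFilter(filter);

    // TableMapReduceUtil sets up TableInputFormat and attaches the Scan
    // (including its filter) to the job configuration.
    TableMapReduceUtil.initTableMapperJob(
        "mytable", scan, MyMapper.class,
        ImmutableBytesWritable.class, Text.class, job);

    job.setNumReduceTasks(0);
    job.setOutputFormatClass(NullOutputFormat.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The nice part is that the filtering happens server-side, so the unwanted rows are dropped before they are ever shipped to your mapper.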
