Hi,
I did try the filter, but using ColumnValueFilter. I set it up in a
TableInputFormat subclass as follows:
public class TableInputFilter extends TableInputFormat
    implements JobConfigurable {
  public void configure(final JobConf jobConf) {
    setHtable(tablename);
    setInputColumns(columnName);
    final RowFilterInterface colFilter =
        new ColumnValueFilter("Status:".getBytes(),
            ColumnValueFilter.CompareOp.EQUAL, "UNCOLLECTED".getBytes());
    setRowFilter(colFilter);
  }
}
Then I use my class as the input format for my map function.
In my map function, I log the value of my Status column family.
When I run my MapReduce job, it displays "Status:: Uncollected"
for some rows and Status = "Collected" for the rest of the rows.
But what I want is to send to the map only those records whose 'Status:' is
uncollected.
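A side note on the comparison itself, in case it matters here: the filter above is built with "UNCOLLECTED", while the log shows "Uncollected". If the stored value is compared byte for byte (an assumption about the filter, not something confirmed in this thread), those two values would not be equal. A minimal plain-Java sketch, with no HBase classes involved; StatusBytesCheck and sameBytes are hypothetical names used only for illustration:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class StatusBytesCheck {
    // Hypothetical helper: exact, case-sensitive, byte-for-byte comparison.
    // An explicit charset avoids depending on the platform default,
    // which is what a bare getBytes() call uses.
    static boolean sameBytes(String stored, String expected) {
        return Arrays.equals(stored.getBytes(StandardCharsets.UTF_8),
                             expected.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(sameBytes("UNCOLLECTED", "UNCOLLECTED"));  // true
        System.out.println(sameBytes("Uncollected", "UNCOLLECTED"));  // false: case differs
        System.out.println(sameBytes(" Uncollected", "Uncollected")); // false: leading space
    }
}
```

This does not explain by itself why "Collected" rows still reach the map, but it may be worth checking the exact bytes stored in the table before looking further.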
I even considered using the filterRow method, described by the API as
follows:
  boolean filterRow(SortedMap<byte[],Cell> columns)
      Filter on the fully assembled row.
(http://hadoop.apache.org/hbase/docs/current/api/org/apache/hadoop/hbase/filter/ColumnValueFilter.html#filterRow%28java.util.SortedMap%29)
But as soon as I type colFilter followed by a '.', Eclipse hangs.
It's really weird. I have tried it on 3 different machines (2 on Linux
running Eclipse Ganymede 3.4 and one on Windows using MyEclipse).
I don't know if I am going wrong somewhere.
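For what it is worth, the row-level check that filterRow describes can be modeled without HBase (and without Eclipse completion). The sketch below is plain Java and only an illustration: it uses String column names and byte[] values instead of byte[] keys and Cell values, and it assumes that returning true means the row is dropped. AssembledRowCheck, this filterRow, and row are hypothetical names, not the HBase API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.SortedMap;
import java.util.TreeMap;

class AssembledRowCheck {
    static final String STATUS_COLUMN = "Status:";
    static final byte[] WANTED = "UNCOLLECTED".getBytes(StandardCharsets.UTF_8);

    // Assumed convention: true means "filter this row out".
    // A row with no Status column at all is also dropped here.
    static boolean filterRow(SortedMap<String, byte[]> columns) {
        byte[] status = columns.get(STATUS_COLUMN);
        return status == null || !Arrays.equals(status, WANTED);
    }

    // Builds a one-column row for the demonstration.
    static SortedMap<String, byte[]> row(String column, String value) {
        SortedMap<String, byte[]> columns = new TreeMap<>();
        columns.put(column, value.getBytes(StandardCharsets.UTF_8));
        return columns;
    }

    public static void main(String[] args) {
        System.out.println(filterRow(row("Status:", "UNCOLLECTED"))); // false: row is kept
        System.out.println(filterRow(row("Status:", "COLLECTED")));   // true: row is dropped
        System.out.println(filterRow(row("Other:", "UNCOLLECTED")));  // true: no Status column
    }
}
```

The same check could also run inside the map as a fallback while the server-side filter is being sorted out, at the cost of still shipping every row to the mapper.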
Thanks,
Raakhi
On Tue, Apr 7, 2009 at 7:18 PM, Lars George <[email protected]> wrote:
> Hi Rakhi,
>
> The way the filters work is that you either use the supplied filters or
> create your own subclasses - but then you will have to deploy that class to
> all RegionServers while adding it to their respective hbase-env.sh (in the
> "export HBASE_CLASSPATH" variable). We are discussing currently if this
> could be done dynamically (
> https://issues.apache.org/jira/browse/HBASE-1288).
>
> Once you have done that, or if you use one of the supplied filters, you can
> assign the filter by overriding TableInputFormat's configure() method,
> like so:
>
> public void configure(JobConf job) {
>   RegExpRowFilter filter = new RegExpRowFilter("ABC.*");
>   setRowFilter(filter);
> }
>
> As Tim points out, setting the whole thing up is done in your main M/R tool
> based application, similar to:
>
> JobConf job = new JobConf(...);
> TableMapReduceUtil.initTableMapJob("<table-name>", "<columns>",
>     IdentityTableMap.class,
>     ImmutableBytesWritable.class, RowResult.class, job);
> job.setReducerClass(MyTableReduce.class);
> job.setInputFormat(MyTableInputFormat.class);
> job.setOutputFormat(MyTableOutputFormat.class);
>
> Of course, the details depend on which classes you want to replace and on
> whether this is a Reduce-oriented job (meaning a default identity + filter
> map, with all the work done in the Reduce phase) or the other way around.
> But the principles and filtering are the same.
>
> HTH,
> Lars
>
>
>
> Rakhi Khatwani wrote:
>
>> Thanks Ryan, i will try that
>>
>> On Tue, Apr 7, 2009 at 3:05 PM, Ryan Rawson <[email protected]> wrote:
>>
>>
>>
>>> There is a server-side mechanism to filter rows; it's found in the
>>> org.apache.hadoop.hbase.filter package. I'm not sure how this interops
>>> with the TableInputFormat exactly.
>>>
>>> setting a filter to reduce the # of rows returned is pretty much exactly
>>> what you want.
>>>
>>> On Tue, Apr 7, 2009 at 2:26 AM, Rakhi Khatwani <[email protected]> wrote:
>>>
>>>> Hi,
>>>> i have a map reduce program with which i read from a hbase table.
>>>> In my map program i check if the column value of a is xxx, if yes then
>>>> continue with processing else skip it.
>>>> however if my table is really big, most of my time in the map gets
>>>> wasted for processing unwanted rows.
>>>> is there any way through which we could send a subset of rows (based on
>>>> the value of a particular column family) to the map???
>>>>
>>>> i have also gone through TableInputFormatBase but am not able to figure
>>>> out how do we set the input format if we are using TableMapReduceUtil
>>>> class to initialize table map jobs. or is there any other way i could
>>>> use it.
>>>>
>>>> Thanks in Advance,
>>>> Raakhi.
>>>>
>>>>
>>>>
>>>
>>
>>
>