[ https://issues.apache.org/jira/browse/HBASE-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell resolved HBASE-3477.
-----------------------------------

    Resolution: Cannot Reproduce

Reopen or file a new issue if this is still relevant with modern HBase versions.

> Filter for deprecated mapred APIs doesn't work when the table has few rows
> --------------------------------------------------------------------------
>
>                 Key: HBASE-3477
>                 URL: https://issues.apache.org/jira/browse/HBASE-3477
>             Project: HBase
>          Issue Type: Bug
>          Components: Filters
>    Affects Versions: 0.90.0
>         Environment: Linux (Debian), master 1, slaves 2
>            Reporter: Yifeng Jiang
>
> It seems that filters are not invoked when the table contains only a few 
> rows.
> I added some logging to org.apache.hadoop.hbase.filter.PrefixFilter, and I have 
> a MyInputFormat that extends hbase.mapred.TableInputFormat (the deprecated mapred 
> APIs).
> The logging added to PrefixFilter:
> {noformat} 
>   public boolean filterRowKey(byte[] buffer, int offset, int length) {
>     log.info("TODO: filterRowKey invoked");
>     if (buffer == null || this.prefix == null) {
>       log.info("TODO: #1 of filter");
>       return true;
>     }
>     if (length < prefix.length) {
>    ...
>   }
> {noformat} 
> This is the code in my InputFormat's configure method.
> {noformat} 
> byte[] prefix = Bytes.toBytes("001");
> Filter filter = new PrefixFilter(prefix);
> setRowFilter(filter);
> {noformat} 
> And the job setup code.
> {noformat} 
> job.setInputFormat(MyInputFormat.class);
> FileInputFormat.addInputPaths(job, "my_table_in_hbase");
> job.set(TableInputFormat.COLUMN_LIST, "data:");
> {noformat} 
> When I put a lot of data (> 500,000 rows) in the table, the filter works well, but 
> when I put only a little data (< 100 rows) in the table, the filter does not seem 
> to be invoked, and the logging in the filter produces no output either.
> This is the log output when there is a lot of data in the table:
> {noformat} 
> 2011-01-25 16:43:59,568 INFO org.apache.hadoop.hbase.filter.PrefixFilter: TODO: default constructor
> 2011-01-25 16:44:01,728 INFO org.apache.hadoop.hbase.filter.PrefixFilter: TODO: filterRowKey invoked
> 2011-01-25 16:44:01,728 INFO org.apache.hadoop.hbase.filter.PrefixFilter: TODO: #3 of filter
> 2011-01-25 16:44:01,728 INFO org.apache.hadoop.hbase.filter.PrefixFilter: TODO: filterAllRemaining invoked
> 2011-01-25 16:44:01,729 INFO org.apache.hadoop.hbase.filter.PrefixFilter: TODO: filterAllRemaining invoked
> 2011-01-25 16:44:01,729 INFO org.apache.hadoop.hbase.filter.PrefixFilter: TODO: filterAllRemaining invoked
> {noformat} 
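
For anyone who wants to check the expected filter outcome for a handful of row keys without a cluster, the kind of row-key check a prefix filter performs can be sketched in plain Java. This is a standalone illustration only, not the HBase PrefixFilter source and not a reconstruction of the elided code in the report: the class name PrefixMatchSketch, the extra prefix parameter, and the byte-by-byte comparison are all assumptions. It follows the convention visible in the report, where returning true means the row is filtered out.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a prefix row-key check; not HBase code.
final class PrefixMatchSketch {

    /**
     * Returns true when the row should be filtered OUT, false when it passes.
     * The null and length checks mirror the ones shown in the report; the
     * comparison loop is an assumed, generic prefix comparison.
     */
    static boolean filterRowKey(byte[] prefix, byte[] buffer, int offset, int length) {
        if (buffer == null || prefix == null) {
            return true;
        }
        if (length < prefix.length) {
            // The key is shorter than the prefix, so it cannot match.
            return true;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (buffer[offset + i] != prefix[i]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        byte[] prefix = "001".getBytes(StandardCharsets.UTF_8);
        // Sample row keys are made up for illustration.
        for (String row : new String[] {"001-a", "002-b", "00"}) {
            byte[] key = row.getBytes(StandardCharsets.UTF_8);
            System.out.println(row + " filtered out: "
                + filterRowKey(prefix, key, 0, key.length));
        }
    }
}
```

Comparing these expected results with the rows a job actually receives from a small table is one way to tell whether the filter was applied at all, independent of whether the added log lines show up.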



--
This message was sent by Atlassian JIRA
(v6.2#6252)
