[ 
https://issues.apache.org/jira/browse/HBASE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15672429#comment-15672429
 ] 

Duo Zhang commented on HBASE-3562:
----------------------------------

Do you mean we should commit the unit tests from this patch?

On master we now call columns.checkColumn before evaluating the filter, so I 
think the problem described here is gone. But in general, I think we should 
also count versions before evaluating filters. The current implementation 
(filter, then count versions) may return different results on the same data 
set depending on whether a major compaction has run.

Consider this: you set maxVersions to 3 and there are 4 versions. Your filter 
filters out the 3 newer versions, so you get the oldest version when doing a 
get or scan. Then a major compaction runs and reclaims the oldest version. 
After that, the same get or scan returns nothing.
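
To make the scenario concrete, here is a rough plain-Java model of it. It does 
not use any HBase classes; the class, the method names and the integer 
"timestamps" are all made up for illustration, and the real scanner logic is 
more involved than this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Simplified model only: versions are integer timestamps, newest first.
// Every name in this class is hypothetical, nothing here is an HBase API.
class VersionOrderSketch {

    // Order described as current: filter first, then count versions.
    static List<Integer> filterThenCount(List<Integer> newestFirst,
                                         Predicate<Integer> filter,
                                         int maxVersions) {
        List<Integer> out = new ArrayList<>();
        for (Integer ts : newestFirst) {
            if (out.size() == maxVersions) break;  // only accepted cells count
            if (filter.test(ts)) out.add(ts);
        }
        return out;
    }

    // Proposed order: count versions first, then filter.
    static List<Integer> countThenFilter(List<Integer> newestFirst,
                                         Predicate<Integer> filter,
                                         int maxVersions) {
        List<Integer> out = new ArrayList<>();
        int seen = 0;
        for (Integer ts : newestFirst) {
            if (seen == maxVersions) break;        // every cell counts
            seen++;
            if (filter.test(ts)) out.add(ts);
        }
        return out;
    }

    // Major compaction modelled as dropping everything beyond maxVersions.
    static List<Integer> majorCompact(List<Integer> newestFirst, int maxVersions) {
        int keep = Math.min(maxVersions, newestFirst.size());
        return new ArrayList<>(newestFirst.subList(0, keep));
    }

    public static void main(String[] args) {
        List<Integer> versions = Arrays.asList(4, 3, 2, 1); // 4 versions stored
        Predicate<Integer> onlyOldest = ts -> ts == 1;      // rejects the 3 newer ones
        List<Integer> compacted = majorCompact(versions, 3);

        System.out.println(filterThenCount(versions, onlyOldest, 3));  // [1]
        System.out.println(filterThenCount(compacted, onlyOldest, 3)); // []
        System.out.println(countThenFilter(versions, onlyOldest, 3));  // []
        System.out.println(countThenFilter(compacted, onlyOldest, 3)); // []
    }
}
```

In this model, filter-then-count gives a different answer before and after 
compaction, while count-then-filter gives the same (empty) answer both times.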

I think we need to fix this, even though it is an 'incompatible change'.

Thanks.

> ValueFilter is being evaluated before performing the column match
> -----------------------------------------------------------------
>
>                 Key: HBASE-3562
>                 URL: https://issues.apache.org/jira/browse/HBASE-3562
>             Project: HBase
>          Issue Type: Bug
>          Components: Filters
>    Affects Versions: 0.90.0, 0.94.7
>            Reporter: Evert Arckens
>         Attachments: HBASE-3562.patch
>
>
> When performing a Get operation where both a column and a ValueFilter are 
> specified, the ValueFilter is evaluated before making the column match, 
> contrary to what the javadoc of Get.setFilter() indicates: " {@link 
> Filter#filterKeyValue(KeyValue)} is called AFTER all tests for ttl, column 
> match, deletes and max versions have been run. "
> This is shown in the little test below, which uses a TestComparator extending 
> WritableByteArrayComparable.
> public void testFilter() throws Exception {
>       byte[] cf = Bytes.toBytes("cf");
>       byte[] row = Bytes.toBytes("row");
>       byte[] col1 = Bytes.toBytes("col1");
>       byte[] col2 = Bytes.toBytes("col2");
>       Put put = new Put(row);
>       put.add(cf, col1, new byte[]{(byte)1});
>       put.add(cf, col2, new byte[]{(byte)2});
>       table.put(put);
>       Get get = new Get(row);
>       get.addColumn(cf, col2); // We only want to retrieve col2
>       TestComparator testComparator = new TestComparator();
>       Filter filter = new ValueFilter(CompareOp.EQUAL, testComparator);
>       get.setFilter(filter);
>       Result result = table.get(get);
> }
> public class TestComparator extends WritableByteArrayComparable {
>     /**
>      * Nullary constructor, for Writable
>      */
>     public TestComparator() {
>         super();
>     }
>     
>     @Override
>     public int compareTo(byte[] theirValue) {
>         if (theirValue[0] == (byte)1) {
>             // If the column match was done before evaluating the filter,
>             // we should never get here.
>             throw new RuntimeException(
>                 "I only expect (byte)2 in col2, not (byte)1 from col1");
>         }
>         if (theirValue[0] == (byte)2) {
>             return 0;
>         }
>         else return 1;
>     }
> }
> When only one column should be retrieved, this can be worked around by using 
> a SingleColumnValueFilter instead of the ValueFilter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
