SingleColumnValueFilter should be able to find the column value even when it's 
not specifically added as input on the scan.
---------------------------------------------------------------------------------------------------------------------------

                 Key: HBASE-2198
                 URL: https://issues.apache.org/jira/browse/HBASE-2198
             Project: Hadoop HBase
          Issue Type: Improvement
          Components: filters
    Affects Versions: 0.20.3
            Reporter: Ferdy


Whenever a SingleColumnValueFilter is applied to a Scan that has specific columns 
as its input (but not the column checked by the filter), the filter won't be 
able to find the value that it should be checking.

For example, let's say we want to do a scan, but we only need the COLUMN_2 
column. Furthermore, we only want rows that have a specific value in COLUMN_1. 
The following code won't do the trick:
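import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;

Scan scan = new Scan();
scan.addColumn(FAMILY, COLUMN_2);   // only COLUMN_2 is requested as input
SingleColumnValueFilter filter = new SingleColumnValueFilter(FAMILY, COLUMN_1,
    CompareOp.EQUAL, TEST_VALUE);   // tests COLUMN_1, which is not requested
filter.setFilterIfMissing(true);    // exclude rows where COLUMN_1 is not found
scan.setFilter(filter);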

However, it does work if we also explicitly add the tested column as an input 
column:
scan.addColumn(FAMILY, COLUMN_1);
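The behaviour described above, and why the workaround helps, can be illustrated with a small toy model. This is not HBase code: the class FilterVisibilityDemo and its rowPasses method are hypothetical, columns are plain strings, and the model assumes (as this issue reports) that only the columns requested on the Scan ever reach the filter, with filterIfMissing set to true.

```java
import java.util.Map;
import java.util.Set;

public class FilterVisibilityDemo {
    // Hypothetical toy model of one row passing through a scan, NOT HBase code.
    // Assumption: only columns in 'requested' reach the filter, as this issue reports.
    // Models an EQUAL test on 'testedColumn' with filterIfMissing = true.
    static boolean rowPasses(Map<String, String> row, Set<String> requested,
                             String testedColumn, String testValue) {
        // An empty set models a scan without addColumn() calls: every column is read.
        boolean visible = requested.isEmpty() || requested.contains(testedColumn);
        String seen = visible ? row.get(testedColumn) : null;
        // filterIfMissing = true: a value the filter never saw excludes the row.
        return testValue.equals(seen);
    }

    public static void main(String[] args) {
        Map<String, String> row = Map.of("COLUMN_1", "TEST_VALUE", "COLUMN_2", "payload");
        // Scan restricted to COLUMN_2: the matching row is dropped.
        System.out.println(rowPasses(row, Set.of("COLUMN_2"), "COLUMN_1", "TEST_VALUE"));
        // Workaround: also request COLUMN_1, and the row is kept.
        System.out.println(rowPasses(row, Set.of("COLUMN_1", "COLUMN_2"), "COLUMN_1", "TEST_VALUE"));
        // No specific input columns at all: the filter sees COLUMN_1 and keeps the row.
        System.out.println(rowPasses(row, Set.of(), "COLUMN_1", "TEST_VALUE"));
    }
}
```

Running main prints false, true, true: the same row is rejected only when the scan is restricted to columns that exclude the tested one.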

Is this by design? Personally, I think a user who adds a filter that tests a 
column should not have to make sure that the column is also part of the scan's 
input. Requiring this is error-prone.

I suggest one of three solutions:
A) Update the Javadoc of Filter / SingleColumnValueFilter / possibly other 
affected Filters to document this behaviour.
B) Fix the problem client-side (i.e. before a Scan object is used, check that 
the columns the filters need are also set as inputs, but only if the user has 
configured specific input columns in the first place). This is perhaps 
inefficient performance-wise, because unnecessary input columns (columns that 
are only needed for filtering) are returned to the user.
C) Fix the problem server-side. This would be the most efficient option, because 
the input column would be read only for filtering at the regionserver and would 
not be returned to the user.
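To compare options B and C, here is a rough sketch in the same toy-model style. It is not how HBase implements either option: widenInput, scanRow, optionB and optionC are hypothetical names, rows are string maps, an empty column set stands for a scan with no addColumn() calls, and the filter is modelled as an EQUAL test with filterIfMissing set to true. Both options read the tested column; the difference is only in what is returned to the client.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class FilterColumnOptions {
    // Empty column set = scan without addColumn() calls (all columns).
    static boolean includes(Set<String> columns, String column) {
        return columns.isEmpty() || columns.contains(column);
    }

    // Hypothetical helper: add the tested column to the input, but only
    // when the user restricted the scan to specific columns.
    static Set<String> widenInput(Set<String> requested, String testedColumn) {
        Set<String> input = new HashSet<>(requested);
        if (!input.isEmpty()) {
            input.add(testedColumn);
        }
        return input;
    }

    // Toy region server: the filter sees columns in 'read', the client gets
    // columns in 'returned'. Returns null when the row is excluded.
    static Map<String, String> scanRow(Map<String, String> row, Set<String> read,
                                       Set<String> returned, String testedColumn,
                                       String testValue) {
        String seen = includes(read, testedColumn) ? row.get(testedColumn) : null;
        if (!testValue.equals(seen)) {
            return null; // missing or mismatching value: row is filtered out
        }
        Map<String, String> result = new TreeMap<>(); // sorted for readable output
        for (Map.Entry<String, String> cell : row.entrySet()) {
            if (includes(returned, cell.getKey())) {
                result.put(cell.getKey(), cell.getValue());
            }
        }
        return result;
    }

    // Option B: the client widens the input, so the tested column is also returned.
    static Map<String, String> optionB(Map<String, String> row, Set<String> requested,
                                       String testedColumn, String testValue) {
        Set<String> input = widenInput(requested, testedColumn);
        return scanRow(row, input, input, testedColumn, testValue);
    }

    // Option C: the server reads the tested column for filtering only.
    static Map<String, String> optionC(Map<String, String> row, Set<String> requested,
                                       String testedColumn, String testValue) {
        return scanRow(row, widenInput(requested, testedColumn), requested,
                testedColumn, testValue);
    }

    public static void main(String[] args) {
        Map<String, String> row = Map.of("COLUMN_1", "TEST_VALUE", "COLUMN_2", "payload");
        Set<String> requested = Set.of("COLUMN_2");
        // prints {COLUMN_1=TEST_VALUE, COLUMN_2=payload}
        System.out.println(optionB(row, requested, "COLUMN_1", "TEST_VALUE"));
        // prints {COLUMN_2=payload}
        System.out.println(optionC(row, requested, "COLUMN_1", "TEST_VALUE"));
    }
}
```

With option B the caller receives COLUMN_1 even though it only asked for COLUMN_2, which is the extra traffic mentioned under B; option C keeps the result limited to the requested columns.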

What do you think?

-- 
This message is automatically generated by JIRA.