[ https://issues.apache.org/jira/browse/HBASE-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831990#action_12831990 ]
Ferdy commented on HBASE-2198:
------------------------------
Created a new issue for the new Filter:
https://issues.apache.org/jira/browse/HBASE-2211
> SingleColumnValueFilter should be able to find the column value even when
> it's not specifically added as input on the scan.
> ---------------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-2198
> URL: https://issues.apache.org/jira/browse/HBASE-2198
> Project: Hadoop HBase
> Issue Type: Improvement
> Components: filters
> Affects Versions: 0.20.3
> Reporter: Ferdy
> Fix For: 0.20.4, 0.21.0
>
> Attachments: HBASE-2198.patch
>
>
> Whenever a SingleColumnValueFilter is applied to a Scan that has specific
> columns as its input (but not the column checked by the Filter), the
> Filter cannot find the value it is supposed to check.
> For example, suppose we want to scan only the COLUMN_2 column, but only
> for rows that have a specific value in COLUMN_1. The following code does
> not work:
> Scan scan = new Scan();
> scan.addColumn(FAMILY, COLUMN_2);
> SingleColumnValueFilter filter = new SingleColumnValueFilter(FAMILY,
> COLUMN_1, CompareOp.EQUAL, TEST_VALUE);
> filter.setFilterIfMissing(true);
> scan.setFilter(filter);
> However, it does work if we also explicitly add the tested column as an
> input column:
> scan.addColumn(FAMILY, COLUMN_1);
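A simplified, hypothetical model of the behavior described above, in plain Java rather than HBase code: a row is a map from column name to value, the scan's input columns decide which cells are read, and the filter only sees those cells. The class and method names are illustrative assumptions, and `Set.of` assumes Java 9 or later.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the reported behavior; not actual HBase code.
final class ScanBehaviorModel {

    /**
     * Returns the cells the scan would emit for `row`, or null if the row is filtered out.
     * An empty `inputColumns` set means no specific columns were configured (read everything).
     */
    static Map<String, String> scanRow(Map<String, String> row, Set<String> inputColumns,
                                       String testedColumn, String expectedValue,
                                       boolean filterIfMissing) {
        // Step 1: only the requested input columns are read.
        Map<String, String> visible = new LinkedHashMap<>();
        for (Map.Entry<String, String> cell : row.entrySet()) {
            if (inputColumns.isEmpty() || inputColumns.contains(cell.getKey())) {
                visible.put(cell.getKey(), cell.getValue());
            }
        }
        // Step 2: the filter looks for the tested column among the visible cells only.
        String actual = visible.get(testedColumn);
        if (actual == null) {
            // Column "missing" (possibly just not read): drop or keep per filterIfMissing.
            return filterIfMissing ? null : visible;
        }
        return actual.equals(expectedValue) ? visible : null;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("COLUMN_1", "TEST_VALUE");
        row.put("COLUMN_2", "payload");

        // Only COLUMN_2 requested: the filter never sees COLUMN_1, so the matching row is dropped.
        System.out.println(scanRow(row, Set.of("COLUMN_2"), "COLUMN_1", "TEST_VALUE", true));
        // prints: null

        // Workaround: also request COLUMN_1, and the row comes back.
        System.out.println(scanRow(row, Set.of("COLUMN_1", "COLUMN_2"), "COLUMN_1", "TEST_VALUE", true));
        // prints: {COLUMN_1=TEST_VALUE, COLUMN_2=payload}
    }
}
```

In this model, a matching row is silently dropped whenever the tested column is missing from the input set and `filterIfMissing` is true, which mirrors the symptom reported above.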
> Is this by design? Personally, I think a filter that tests a column should
> not require the user to also add that column to the scan's input. The
> current behaviour is prone to bugs.
> I suggest one of three solutions:
> A) Update the Javadoc of Filter / SingleColumnValueFilter / possibly other
> affected Filters to document this behaviour.
> B) Fix the problem client-side (i.e. before a Scan object is used, check
> that the columns the filters need are also set as inputs, but only if the
> user has configured specific input columns in the first place). This may
> be less efficient, because unnecessary input columns, i.e. columns needed
> only for filtering, are returned to the user.
> C) Fix the problem server-side. This would be the most efficient option,
> because the tested column would only be read at the regionserver to do the
> filtering.
> What do you think?
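To make the trade-off between options B and C concrete, here is a hypothetical sketch in plain Java (none of these names are HBase APIs). Rows are modeled as maps from column name to value, an empty input set means no specific columns were configured, and `Set.of` assumes Java 9 or later. Option B patches the input columns on the client, so the tested column is also returned. Option C reads the tested column only for the filter and returns just the requested columns.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical comparison of options B and C; not HBase code.
final class FilterColumnFixes {

    // Filter check in the spirit of SingleColumnValueFilter with setFilterIfMissing.
    static boolean passes(Map<String, String> cells, String testedColumn,
                          String expectedValue, boolean filterIfMissing) {
        String actual = cells.get(testedColumn);
        return actual == null ? !filterIfMissing : actual.equals(expectedValue);
    }

    // Keeps only cells whose column is in `columns` (empty set = all columns).
    static Map<String, String> project(Map<String, String> row, Set<String> columns) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> cell : row.entrySet()) {
            if (columns.isEmpty() || columns.contains(cell.getKey())) {
                out.put(cell.getKey(), cell.getValue());
            }
        }
        return out;
    }

    // Option B: the client adds the tested column to the inputs, but only if specific
    // input columns were configured. The extra column is then returned to the user too.
    static Map<String, String> optionB(Map<String, String> row, Set<String> input,
                                       String tested, String expected, boolean filterIfMissing) {
        Set<String> patched = new LinkedHashSet<>(input);
        if (!patched.isEmpty()) {
            patched.add(tested);
        }
        Map<String, String> visible = project(row, patched);
        return passes(visible, tested, expected, filterIfMissing) ? visible : null;
    }

    // Option C: the server reads the tested column just for the filter, then
    // returns only the columns the user asked for.
    static Map<String, String> optionC(Map<String, String> row, Set<String> input,
                                       String tested, String expected, boolean filterIfMissing) {
        Map<String, String> forFilter = project(row, Set.of(tested));
        if (!passes(forFilter, tested, expected, filterIfMissing)) {
            return null;
        }
        return project(row, input);
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("COLUMN_1", "TEST_VALUE");
        row.put("COLUMN_2", "payload");
        Set<String> input = Set.of("COLUMN_2");

        System.out.println(optionB(row, input, "COLUMN_1", "TEST_VALUE", true));
        // prints: {COLUMN_1=TEST_VALUE, COLUMN_2=payload}
        System.out.println(optionC(row, input, "COLUMN_1", "TEST_VALUE", true));
        // prints: {COLUMN_2=payload}
    }
}
```

Both options find the matching row. The difference is only in what reaches the client: option B ships COLUMN_1 along with the requested data, while option C keeps it on the server side.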