[
https://issues.apache.org/jira/browse/HBASE-9747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793144#comment-13793144
]
Aditya Kishore commented on HBASE-9747:
---------------------------------------
Actually, I am surprised by the third scan result:
{quote}
hbase(main):002:0> scan 't1', {FILTER => "SingleColumnValueFilter('f1', 'q1', =, 'binary:113')"}
ROW COLUMN+CELL
c1 column=f1:q1, timestamp=1381469178679, value=113
1 row(s) in 0.0140 seconds
{quote}
This should have returned two rows:
{noformat}
a1 column=f1:q2, timestamp=1381468905492, value=111
c1 column=f1:q1, timestamp=1381468905549, value=113
{noformat}
By default, {{SingleColumnValueFilter}} does not filter out rows in which the
specified column does not exist ('a1', in your case), so that row is returned
by the scan. For the same scan, I get this result:
{noformat}
hbase(main):010:0> scan 't1', {FILTER => "SingleColumnValueFilter('f1', 'q1', =, 'binary:113')"}
ROW COLUMN+CELL
a1 column=f1:q2, timestamp=1381528316466, value=111
c1 column=f1:q1, timestamp=1381528324693, value=113
2 row(s) in 0.0210 seconds
{noformat}
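The default behaviour described above can be illustrated with a small Python sketch. It is only a toy model of the row-level decision, not the HBase client API: the names {{ROWS}}, {{single_column_value_filter}} and {{scan}} are made up for illustration, and the data is the three rows from the repro in this issue.

```python
# Toy model of SingleColumnValueFilter's row-level decision.
# NOT the HBase API: all names here are hypothetical.

# The three rows from the issue's repro: row key -> {(family, qualifier): value}
ROWS = {
    "a1": {("f1", "q2"): "111"},
    "b1": {("f1", "q1"): "112"},
    "c1": {("f1", "q1"): "113"},
}


def single_column_value_filter(cells, family, qualifier, value,
                               filter_if_missing=False):
    """Return True if the row should be kept."""
    key = (family, qualifier)
    if key not in cells:
        # Column absent: the row is kept unless filter_if_missing is set.
        return not filter_if_missing
    # Column present: keep the row only if the value matches.
    return cells[key] == value


def scan(rows, keep):
    """Return the row keys (in key order) for which keep(cells) is True."""
    return [row_key for row_key in sorted(rows) if keep(rows[row_key])]


default_result = scan(
    ROWS, lambda cells: single_column_value_filter(cells, "f1", "q1", "113"))
print(default_result)  # ['a1', 'c1']
```

Row 'a1' has no f1:q1 cell, so under the default setting it passes the filter, which matches the two-row result above.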
If you want to drop the rows that do not include the column, you need to
call {{SingleColumnValueFilter.setFilterIfMissing(true)}}. From the shell, you
can invoke it this way:
{noformat}
scan 't1', {FILTER => "SingleColumnValueFilter('f1', 'q1', =, 'binary:113', true, false)"}
{noformat}
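To see how this setting interacts with the OR condition from the original report, here is a rough Python sketch. It is a simplified model under the assumption that an OR of two filters keeps a row when either filter would keep it; it does not reproduce HBase's actual filter evaluation, and the names ({{ROWS}}, {{prefix_filter}}, {{scan_or}}) are hypothetical.

```python
# Toy model of "PrefixFilter OR SingleColumnValueFilter" on the repro data.
# NOT the HBase API: all names here are hypothetical.

ROWS = {
    "a1": {("f1", "q2"): "111"},
    "b1": {("f1", "q1"): "112"},
    "c1": {("f1", "q1"): "113"},
}


def prefix_filter(row_key, prefix):
    """Keep rows whose key starts with the given prefix."""
    return row_key.startswith(prefix)


def single_column_value_filter(cells, family, qualifier, value,
                               filter_if_missing=False):
    """Keep rows whose column equals value; a missing column passes
    unless filter_if_missing is True."""
    key = (family, qualifier)
    if key not in cells:
        return not filter_if_missing
    return cells[key] == value


def scan_or(rows, prefix, family, qualifier, value, filter_if_missing):
    """Assumed OR semantics: keep a row if either filter keeps it."""
    return [
        row_key for row_key in sorted(rows)
        if prefix_filter(row_key, prefix)
        or single_column_value_filter(rows[row_key], family, qualifier,
                                      value, filter_if_missing)
    ]


or_default = scan_or(ROWS, "b", "f1", "q1", "113", filter_if_missing=False)
or_strict = scan_or(ROWS, "b", "f1", "q1", "113", filter_if_missing=True)
print(or_default)  # ['a1', 'b1', 'c1']
print(or_strict)   # ['b1', 'c1']
```

In this model the three-row result in the report comes from 'a1' passing the value filter by default; with the missing-column check enabled, only 'b1' and 'c1' remain.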
> PrefixFilter with OR condition gives wrong results
> --------------------------------------------------
>
> Key: HBASE-9747
> URL: https://issues.apache.org/jira/browse/HBASE-9747
> Project: HBase
> Issue Type: Bug
> Components: Filters
> Affects Versions: 0.94.9
> Reporter: Deepa Remesh
>
> PrefixFilter, when used with a SingleColumnValueFilter in an OR condition,
> gives wrong results. In the example below, each filter evaluated separately
> returns 1 row. The OR condition with the two filters returns 3 rows instead
> of 2. Repro below:
> create 't1', 'f1'
> put 't1','a1','f1:q2','111'
> put 't1','b1','f1:q1','112'
> put 't1','c1','f1:q1','113'
> hbase(main):020:0> scan 't1', {FILTER => "PrefixFilter ('b') OR SingleColumnValueFilter('f1', 'q1', =, 'binary:113')"}
> ROW COLUMN+CELL
> a1 column=f1:q2, timestamp=1381468905492, value=111
> b1 column=f1:q1, timestamp=1381468905518, value=112
> c1 column=f1:q1, timestamp=1381468905549, value=113
> 3 row(s) in 0.1020 seconds
> hbase(main):021:0> scan 't1', {FILTER => "PrefixFilter ('b')"}
> ROW COLUMN+CELL
> b1 column=f1:q1, timestamp=1381468905518, value=112
> 1 row(s) in 0.0150 seconds
> hbase(main):002:0> scan 't1', {FILTER => "SingleColumnValueFilter('f1', 'q1', =, 'binary:113')"}
> ROW COLUMN+CELL
> c1 column=f1:q1, timestamp=1381469178679, value=113
> 1 row(s) in 0.0140 seconds