[ 
https://issues.apache.org/jira/browse/HBASE-20565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16546584#comment-16546584
 ] 

Zheng Hu edited comment on HBASE-20565 at 7/17/18 1:19 PM:
-----------------------------------------------------------

Upload the patch.v1, and  Let me explain the core idea: 

Assume that filterList = filter-A   AND   filter-B  AND filter-C  AND ....,    
if  a cell has been filtered out by filter-A,  then no need to 
pass the cell to filter-B and filter-C,  only the included cell set of filter-A 
should be passed to filter-B, and only the included cell set of filter-A & 
filter-B should be passed to filter-C ....  

The max rule can still working,  but only the include* return code should be 
merged into a max return code. 

The problem is the order of filters may result in diff cells...so we need to 
tell the user explicitly to place the count-related filters at the last 
position.  In SQL syntax,  we accept the sql :
{code}
 select * from table where xxx and yyy  limit 1, 100,
{code}
the limit is at the end of the statement,

SQL such as: 
{code}
select * from table where xxx limit 1, 1000 and yyy
{code}
will not be accepted. 


was (Author: openinx):
Upload the patch.v1, and  pasted the discuss with [~anoop.hbase] ... 

> What if the order of filters be opposite way in FL?
A good question,   I think we need to tell the user explicitly to place the 
count-related filters at the last position.  In SQL syntax,  we accept the sql 
: select * from table where xxx and xxx  limit 1, 100, the limit is at the end 
of the statement,  sql such as: select * from table where xxx limit 1, 1000 and 
xx will not be accepted.  
I think it's meaningful to require the count-related filters put at the end of 
sub-filters. 


On Fri, May 25, 2018 at 6:25 PM, Anoop John <anoop.hb...@gmail.com> wrote:
> if  a cell has been filtered out by filter-A,  then no need to
pass the cell to filter-B and filter-C,  only the included cell set of
filter-A should be passed to filter-B, and only the included cell set
of filter-A & filter-B should be passed to filter-C ...

U mean u propose such a change now?  Then the order of filters matters
right?  Say the count based filter is coming second and the other
(which can filter out some cells) come as 1st, it will work. What if
the order of filters be opposite way in FL?

-Anoop-

On Fri, May 25, 2018 at 12:29 PM, OpenInx <open...@gmail.com> wrote:
> I have to admit that my previous solution was one-sided...
> Not only the ColumnPaginationFilter has the problem, other counter-related
> filters also has the problem too.
>
>> We have 2 filters in a FL. We pass cell 1 and 2. First filter select cell1
>> but been filtered out by F2.  Now we need to tell both filters that we
>> have excludes this cell.  This will be useful for filters which work on
>> counting  basis.  It can reduce the counter which it would have advanced.
>> Pls see the possibility.
>
> Assume that FilterList =  filter-A  AND ColumnCountGetFilter ,  if cell x
> has been filtered out by filter-A,  then what the expected return code do
> the ColumnCountGetFilter#filterKeyValue shoud return ?
> In theory, the count in ColumnCountGetFilter  should not increment when
> checking the cell x .  So what is the purpose of passing the cell  x to
> ColumnCountGetFilter#filterKeyValue ?
> To get the return code from ColumnCountGetFilter for max the forward step ?
>
> Now, I'm thinking that the implementation in branch-1.2  is more reasonable,
> Assume that filterList = filter-A   AND   filter-B  AND filter-C  AND ....,
> if  a cell has been filtered out by filter-A,  then no need to
> pass the cell to filter-B and filter-C,  only the included cell set of
> filter-A should be passed to filter-B, and only the included cell set of
> filter-A & filter-B should be passed to filter-C ....
>
> The max rule can still working,  but only the include* return code should be
> merged into a max return code.
>
> I think the semantic is more reasonable.
>
>
> On Thu, May 24, 2018 at 4:31 PM, Anoop John <anoop.hb...@gmail.com> wrote:
>>
>> The offset is the cell offset in  a row na.  This says we already fetched
>> till there. So ya of there is another filter also along with this pagination
>> filter, it must be hard for the pagination filter to decide the column
>> offset for the next request.  So ya ideally the column offset might work
>> there.
>> But the issue is we can not really generalize this. It depends on the way
>> the col offset and column value offset is been implemented in pagination
>> filter.
>>
>> I kind of thinking that we need a generic framework change now. If we pass
>> all cells to all filters ( which is correct also) then there should be a way
>> later with which we say all filters that we decided later that this cell is
>> not included in result.
>>
>> We have 2 filters in a FL. We pass cell 1 and 2. First filter select cell1
>> but been filtered out by F2.  Now we need to tell both filters that we have
>> excludes this cell.  This will be useful for filters which work on counting
>> basis.  It can reduce the counter which it would have advanced.  Pls see the
>> possibility.
>>
>> I think previously the issue was the order of filters in FL mattered as we
>> wont pass all cells to all filters.  Now that is not an issue. But the later
>> filters possibly filtering out cells still an issue.   WDYT?
>>
>> Anoop
>>
>>
>> On Wednesday, May 23, 2018, OpenInx <open...@gmail.com> wrote:
>>>
>>> > That previously if we have A and
>>> > B in FilterList with AND and if A is not including a cell, we were not
>>> > passing that to B?   (In 1.2  I mean)  and in later versions we start
>>> > passing it?
>>>
>>> Yes,  you can see the code in branch-1.2 [1], and in master branch [2].
>>>
>>> > I think it is a bug ..
>>> Sure, it's a bug.
>>>
>>> > Say if we have another
>>> > filter which might filter out some in between cells, then also we need
>>> > to have 5 cells to be included.
>>> If so ,  the offset is meaningless now, only the limit is  meaningful.
>>> why not just remove the offset in ColumnPaginationFilter.
>>>
>>> In [3], I said, this scene can be completely replaced by
>>> ColumnPaginationFilter(byte[], limit) ...   Does phoenix (or other project )
>>> depend on the offset in ColumnPaginationFilter ?   I'm not sure...
>>>
>>>
>>> 1.
>>> https://github.com/apache/hbase/blob/branch-1.2/hbase-client/src/main/java/org/apache/hadoop/hbase/filter/FilterList.java#L272
>>> 2.
>>> https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/filter/FilterListWithAND.java#L171
>>> 3.
>>> https://issues.apache.org/jira/browse/HBASE-20565?focusedCommentId=16480064&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16480064
>>>
>>> On Wed, May 23, 2018 at 12:04 PM, Anoop John <anoop.hb...@gmail.com>
>>> wrote:
>>>>
>>>> So what change made this issue now.  That previously if we have A and
>>>> B in FilterList with AND and if A is not including a cell, we were not
>>>> passing that to B?   (In 1.2  I mean)  and in later versions we start
>>>> passing it?
>>>>
>>>> U said
>>>> >Oh, change my mind now, it's not bug for branch-1.4, see below:
>>>>
>>>> Now u changed ur mind back la?  :-)
>>>>
>>>>
>>>> I think it is a bug only
>>>> The Pagination filter says from offset 0 (means 1st cell in row)  we
>>>> need 5 cells.  Need not be consecutive cells.   Say if we have another
>>>> filter which might filter out some in between cells, then also we need
>>>> to have 5 cells to be included.
>>>>
>>>> -Anoop-
>>>>
>>>> On Tue, May 22, 2018 at 5:26 PM, OpenInx <open...@gmail.com> wrote:
>>>> > The offset is a trouble for FilterList.
>>>> >
>>>> > IMO, we can still keep the max rule  (or min rule) for
>>>> > FilterListWithAND
>>>> > (or FilterListWithOR),  but the offset in ColumnPaginationFilter  need
>>>> > to
>>>> > be removed (Only this filter has offset )
>>>> >
>>>> > .. Will change the public API ...
>>>> >
>>>> > On Tue, May 22, 2018 at 7:49 PM, Anoop John <anoop.hb...@gmail.com>
>>>> > wrote:
>>>> >>
>>>> >> Sorry for the delay.. In fact today morning I took that jira.  :-)
>>>> >> Read the discussion there and trying to read the code in 2 filters.
>>>> >> Seems we messed up with some of the individual filters!     BTW it is
>>>> >> strange that am not getting any notify mail from jira when some one
>>>> >> refer me using @..
>>>> >>
>>>> >> -Anoop-
>>>> >>
>>>> >> On Tue, May 22, 2018 at 4:07 PM, OpenInx <open...@gmail.com> wrote:
>>>> >> > Hi anoop:
>>>> >> >
>>>> >> >
>>>> >> > I guess you are busy those days ...
>>>> >> >
>>>> >> > If you have any time,  please take a look at HBASE-20565 .
>>>> >> >
>>>> >> >
>>>> >> > Really appreciate  :-)
>>>> >> >
>>>> >> >

> ColumnRangeFilter combined with ColumnPaginationFilter can produce incorrect 
> result since 1.4
> ---------------------------------------------------------------------------------------------
>
>                 Key: HBASE-20565
>                 URL: https://issues.apache.org/jira/browse/HBASE-20565
>             Project: HBase
>          Issue Type: Bug
>          Components: Filters
>    Affects Versions: 2.1.0, 1.4.4, 2.0.0
>            Reporter: Jerry He
>            Assignee: Zheng Hu
>            Priority: Major
>             Fix For: 2.1.0, 1.5.0, 1.4.6, 2.0.2
>
>         Attachments: HBASE-20565.v1.patch, debug.diff, debug.log, 
> test-branch-1.4.patch
>
>
> When ColumnPaginationFilter is combined with ColumnRangeFilter, we may see 
> incorrect result.
> Here is a simple example.
> One row with 10 columns c0, c1, c2, .., c9.  I have a ColumnRangeFilter for 
> range c2 to c9.  Then I have a ColumnPaginationFilter with limit 5 and offset 
> 0.  FileterList is FilterList(Operator.MUST_PASS_ALL, ColumnRangeFilter, 
> ColumnPaginationFilter).
> We expect 5 columns being returned.  But in HBase 1.4 and after, 4 columns 
> are returned.
> In 1.2.x, the correct 5 columns are returned.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to