[ 
https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177405#comment-13177405
 ] 

Kannan Muthukkaruppan commented on HBASE-5104:
----------------------------------------------

Additional note: The use of ColumnPaginationFilter AND ColumnPrefixFilter for 
the intended use case (i.e. to get next set of 5 thread ids for tag2) probably 
will not work even after this bug is fixed. I think once the bug is fixed for 
the above data set, the program should return 0 kvs.

Here's why:
  (select * from Tab where filter1 and filter2)
should be the same as:
  (select * from Tab where filter1) INTERSECT (select * from Tab where filter2)

When separately applied, filter1 will return rows with tag1 prefix and filter2 
will return rows with tag0 prefix (for Jiakai's example above) and the 
INTERSECTION will be the empty set.
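
To make the intersection argument concrete, here is a minimal sketch in plain
Java. It does not use the HBase API. The data set (ten "tag0" columns followed
by ten "tag1" columns) and all class and method names are assumptions made up
for illustration, since the original data set is only partially listed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class FilterIntersectionSketch {
    // Hypothetical sorted column list: tag0:001..tag0:010, then tag1:001..tag1:010.
    static List<String> sampleColumns() {
        List<String> cols = new ArrayList<>();
        for (String tag : new String[] {"tag0", "tag1"}) {
            for (int i = 1; i <= 10; i++) {
                cols.add(String.format("%s:%03d:thread%d", tag, i, i));
            }
        }
        return cols;
    }

    // Models filter1 applied on its own: keep columns with the given prefix.
    static List<String> byPrefix(List<String> cols, String prefix) {
        return cols.stream()
                .filter(c -> c.startsWith(prefix))
                .collect(Collectors.toList());
    }

    // Models filter2 applied on its own: skip `offset` columns, keep `limit`.
    static List<String> byPage(List<String> cols, int limit, int offset) {
        return cols.stream().skip(offset).limit(limit).collect(Collectors.toList());
    }

    // Models "filter1 AND filter2" as a set intersection of the two results.
    static List<String> intersect(List<String> a, List<String> b) {
        List<String> out = new ArrayList<>(a);
        out.retainAll(b);
        return out;
    }

    public static void main(String[] args) {
        List<String> cols = sampleColumns();
        List<String> r1 = byPrefix(cols, "tag1"); // only tag1 columns
        List<String> r2 = byPage(cols, 5, 5);     // columns 6..10, all tag0 here
        System.out.println("filter1: " + r1);
        System.out.println("filter2: " + r2);
        System.out.println("intersection: " + intersect(r1, r2));
    }
}
```

With this assumed data, the pagination result contains only tag0 columns, so
the intersection with the tag1 prefix result is empty.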

The real confusion here seems to come from using a filter for pagination, 
which seems odd. In normal SQL, for example, pagination is not part of the 
WHERE clause but a separate clause (as if it were applied to the results of 
a sub-query).

 (select * from Tab 
  where  column  LIKE 'tag1%'
  LIMIT 5 OFFSET 5)

Possible ways of supporting this use case:

1) Don't use AND (via FilterList), but enhance ColumnPrefixFilter to support 
another constructor which supports limit/offset.
2) Support pagination (limit/offset) at the Scan/Get API level (rather than as 
a filter) [Like SQL].
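
Both options amount to applying limit/offset to the columns that already
matched the prefix, rather than to all columns. A minimal sketch of that
ordering in plain Java follows. It does not use the HBase API, and the class
name, method names, and data set are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class PrefixThenPageSketch {
    // Hypothetical sorted column list: tag0:001..tag0:010, then tag1:001..tag1:010.
    static List<String> sampleColumns() {
        List<String> cols = new ArrayList<>();
        for (String tag : new String[] {"tag0", "tag1"}) {
            for (int i = 1; i <= 10; i++) {
                cols.add(String.format("%s:%03d:thread%d", tag, i, i));
            }
        }
        return cols;
    }

    // Select by prefix first, then apply offset and limit to the matches only,
    // like "WHERE column LIKE 'prefix%' LIMIT limit OFFSET offset" in SQL.
    static List<String> prefixThenPage(List<String> cols, String prefix,
                                       int limit, int offset) {
        return cols.stream()
                .filter(c -> c.startsWith(prefix))
                .skip(offset)
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Prints the five tag1 columns at positions [5, 10) among the matches.
        System.out.println(prefixThenPage(sampleColumns(), "tag1", 5, 5));
    }
}
```

With this assumed data, the call returns the tag1 entries from tag1:006 through
tag1:010, which is the result the original program was trying to get.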

Thoughts?


> FilterList doesn't work right with filters (such as ColumnPrefixFilter) which 
> use the SEEK_NEXT_USING_HINT
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5104
>                 URL: https://issues.apache.org/jira/browse/HBASE-5104
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Madhuwanti Vaidya
>         Attachments: testFilterList.rb
>
>
> Thanks Jiakai Liu for reporting this issue and doing the initial 
> investigation. Email from Jiakai below:
> Assuming that we have an index column family with the following entries:
> "tag0:001:thread1"
> ...
> "tag1:001:thread1"
> "tag1:002:thread2"
> ...
> "tag1:010:thread10"
> ...
> "tag2:001:thread1"
> "tag2:005:thread5"
> ...
> To get threads with "tag1" in range [5, 10), I tried the following code:
>     ColumnPrefixFilter filter1 = new ColumnPrefixFilter(Bytes.toBytes("tag1"));
>     ColumnPaginationFilter filter2 = new ColumnPaginationFilter(5 /* limit */, 5 /* offset */);
>     FilterList filters = new FilterList(Operator.MUST_PASS_ALL);
>     filters.addFilter(filter1);
>     filters.addFilter(filter2);
>     Get get = new Get(USER);
>     get.addFamily(COLUMN_FAMILY);
>     get.setMaxVersions(1);
>     get.setFilter(filters);
> Somehow it didn't work as expected. It returned the entries as if filter1 
> were not set.
> It turns out that ColumnPrefixFilter returns SEEK_NEXT_USING_HINT in some 
> cases. FilterList does not handle this return code properly (it treats it 
> as INCLUDE).
