[ https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180201#comment-13180201 ]
Lars Hofhansl commented on HBASE-5104:
--------------------------------------
Thanks Stack. I see, so ColumnPaginationFilter cannot work if we want to cross
rows. I think what Kannan has in mind is pagination within a given row (from
the description).
Having more precise control of scanner start and stop cell might be nice
anyway. I just had a discussion today about how it would be nice if one could
start a scanner at a certain column prefix within a certain row and also set a
stop column prefix within a row (i.e., without using a filter). It seems this
would be generally applicable and would also solve Kannan's use case. Correct, Kannan?
Something like Scan.setStartRow(byte[] rowkey, byte[] column), which would
request to seek the scanner to that exact column prefix (while honoring all
version settings for the scanner)... Same for setStopRow(byte[] rowkey, byte[]
column).
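A minimal sketch of what such a start/stop column prefix would select within one
row, in plain Java with no HBase dependency. It assumes a row can be modelled as
a sorted map of column qualifier to value, and it ignores versions.
ColumnRangeSketch and columnsBetween are hypothetical names for illustration,
not HBase API; HBase compares qualifiers as byte arrays, which matches String
ordering only for ASCII data like this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical illustration only: one row modelled as sorted column qualifiers.
class ColumnRangeSketch {

    // Returns the qualifiers in [startPrefix, stopPrefix), in sorted order.
    // The start is inclusive and the stop exclusive, mirroring start/stop rows.
    static List<String> columnsBetween(NavigableMap<String, String> row,
                                       String startPrefix, String stopPrefix) {
        return new ArrayList<>(
            row.subMap(startPrefix, true, stopPrefix, false).keySet());
    }

    public static void main(String[] args) {
        NavigableMap<String, String> row = new TreeMap<>();
        row.put("tag0:001:thread1", "");
        row.put("tag1:001:thread1", "");
        row.put("tag1:002:thread2", "");
        row.put("tag2:001:thread1", "");

        // Prints [tag1:001:thread1, tag1:002:thread2]
        System.out.println(columnsBetween(row, "tag1", "tag2"));
    }
}
```

Because the range is resolved by position in the sorted order rather than by
testing every cell, no per-cell filter state is involved.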
> Provide a reliable pagination mechanism
> ---------------------------------------
>
> Key: HBASE-5104
> URL: https://issues.apache.org/jira/browse/HBASE-5104
> Project: HBase
> Issue Type: Bug
> Reporter: Kannan Muthukkaruppan
> Assignee: Madhuwanti Vaidya
> Attachments: testFilterList.rb
>
>
> Addendum:
> Doing pagination (retrieving at most "limit" number of KVs at a particular
> "offset") is currently supported via the ColumnPaginationFilter. However, it
> is not a very clean way of supporting pagination. Some of the problems with
> it are:
> * Normally, one would expect a query with (Filter(A) AND Filter(B)) to have
> the same results as (query with Filter(A)) INTERSECT (query with Filter(B)). This
> is not the case for ColumnPaginationFilter as its internal state gets updated
> depending on whether or not Filter(A) returns TRUE/FALSE for a particular
> cell.
> * When this Filter is used in combination with other filters (e.g., doing AND
> with another filter using FilterList), the behavior of the query depends on
> the order of filters in the FilterList. This is not ideal.
> * ColumnPaginationFilter is a stateful filter which ends up counting multiple
> versions of the cell as separate values even if another filter upstream or
> the ScanQueryMatcher is going to reject the value for other reasons.
> Seems like we need a reliable way to do pagination. The particular use case
> that prompted this JIRA is pagination within the same rowKey. For example,
> for a given row key R, get columns with prefix P, starting at offset X (among
> columns which have prefix P) and limit Y. Some possible fixes might be:
> 1) enhance ColumnPrefixFilter to support another constructor which supports
> limit/offset.
> 2) Support pagination (limit/offset) at the Scan/Get API level (rather than
> as a filter) [Like SQL].
> Original Post:
> Thanks Jiakai Liu for reporting this issue and doing the initial
> investigation. Email from Jiakai below:
> Assuming that we have an index column family with the following entries:
> "tag0:001:thread1"
> ...
> "tag1:001:thread1"
> "tag1:002:thread2"
> ...
> "tag1:010:thread10"
> ...
> "tag2:001:thread1"
> "tag2:005:thread5"
> ...
> To get threads with "tag1" in range [5, 10), I tried the following code:
> ColumnPrefixFilter filter1 = new ColumnPrefixFilter(Bytes.toBytes("tag1"));
> ColumnPaginationFilter filter2 = new ColumnPaginationFilter(5 /* limit */, 5 /* offset */);
> FilterList filters = new FilterList(Operator.MUST_PASS_ALL);
> filters.addFilter(filter1);
> filters.addFilter(filter2);
> Get get = new Get(USER);
> get.addFamily(COLUMN_FAMILY);
> get.setMaxVersions(1);
> get.setFilter(filters);
> Somehow it didn't work as expected. It returned the entries as if filter1
> were not set.
> It turns out that ColumnPrefixFilter returns SEEK_NEXT_USING_HINT in some cases.
> FilterList does not handle this return code properly (it treats it as
> INCLUDE).
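
The order dependence described in the quoted issue can be reproduced with a
simplified model in plain Java. This is a sketch under stated assumptions, not
HBase's FilterList code: filters are modelled as predicates, the AND list stops
at the first filter that rejects a cell, and the pagination filter counts every
cell it is asked about. PaginationOrderSketch, page, prefix and scan are
hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of a stateful pagination filter inside an AND list.
class PaginationOrderSketch {

    // Stateful: counts every cell it is shown and accepts only those whose
    // running index falls in [offset, offset + limit).
    static Predicate<String> page(int limit, int offset) {
        int[] seen = {0};
        return column -> {
            int index = seen[0]++;
            return index >= offset && index < offset + limit;
        };
    }

    // Stateless: accepts columns that start with the given prefix.
    static Predicate<String> prefix(String p) {
        return column -> column.startsWith(p);
    }

    // AND combination that stops at the first rejecting filter, so filters
    // later in the list never see (or count) that cell.
    static List<String> scan(List<String> columns, List<Predicate<String>> filters) {
        List<String> kept = new ArrayList<>();
        for (String column : columns) {
            boolean keep = true;
            for (Predicate<String> filter : filters) {
                if (!filter.test(column)) {
                    keep = false;
                    break;
                }
            }
            if (keep) {
                kept.add(column);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> columns = List.of(
            "tag0:001", "tag0:002", "tag1:001", "tag1:002", "tag1:003", "tag2:001");

        // Prefix first: pagination counts only tag1 columns.
        // Prints [tag1:002, tag1:003]
        System.out.println(scan(columns, List.of(prefix("tag1"), page(2, 1))));

        // Pagination first: it counts tag0 columns too.
        // Prints [tag1:001]
        System.out.println(scan(columns, List.of(page(2, 1), prefix("tag1"))));
    }
}
```

The same two filters give different pages depending only on their order, which
is why doing limit/offset outside the filter chain looks attractive.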