[ https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Phabricator updated HBASE-5104:
-------------------------------
    Attachment: D2799.3.patch

mbautin updated the revision "[jira] [HBASE-5104] Provide a reliable intra-row pagination mechanism".
Reviewers: madhuvaidya, lhofhansl, Kannan, tedyu, stack, todd, JIRA

  Fixing a bug with StoreScanner not resetting the current offset within the row/column family (countPerRow) to zero when transitioning to the next row. This is a porting error that is not present in Madhu's original fix for 89-fb. Also adding an independent test based on HRegion API that helped catch this bug (TestIntraRowPagination). I will post additional test results later.

REVISION DETAIL
  https://reviews.facebook.net/D2799

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/client/Get.java
  src/main/java/org/apache/hadoop/hbase/client/Result.java
  src/main/java/org/apache/hadoop/hbase/client/Scan.java
  src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
  src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java
  src/main/java/org/apache/hadoop/hbase/protobuf/generated/ClientProtos.java
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
  src/main/protobuf/Client.proto
  src/test/java/org/apache/hadoop/hbase/HTestConst.java
  src/test/java/org/apache/hadoop/hbase/client/TestIntraRowPagination.java
  src/test/java/org/apache/hadoop/hbase/client/TestScannersFromClientSide.java

> Provide a reliable intra-row pagination mechanism
> -------------------------------------------------
>
>                 Key: HBASE-5104
>                 URL: https://issues.apache.org/jira/browse/HBASE-5104
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Madhuwanti Vaidya
>         Attachments: D2799.1.patch, D2799.2.patch, D2799.3.patch, testFilterList.rb
>
>
> Addendum:
> Doing pagination (retrieving at most "limit" number of KVs at a particular
> "offset") is currently supported via the ColumnPaginationFilter. However, it
> is not a very clean way of supporting pagination.
> Some of the problems with it are:
> * Normally, one would expect a query with (Filter(A) AND Filter(B)) to have
> the same results as (query with Filter(A)) INTERSECT (query with Filter(B)).
> This is not the case for ColumnPaginationFilter, as its internal state gets
> updated depending on whether Filter(A) returns TRUE or FALSE for a particular
> cell.
> * When this filter is used in combination with other filters (e.g., doing AND
> with another filter using FilterList), the behavior of the query depends on
> the order of filters in the FilterList. This is not ideal.
> * ColumnPaginationFilter is a stateful filter which ends up counting multiple
> versions of the cell as separate values even if another filter upstream or
> the ScanQueryMatcher is going to reject the value for other reasons.
> Seems like we need a reliable way to do pagination. The particular use case
> that prompted this JIRA is pagination within the same rowKey. For example,
> for a given row key R, get columns with prefix P, starting at offset X (among
> columns which have prefix P) and limit Y. Some possible fixes might be:
> 1) Enhance ColumnPrefixFilter to support another constructor which supports
> limit/offset.
> 2) Support pagination (limit/offset) at the Scan/Get API level (rather than
> as a filter) [like SQL].
> Original Post:
> Thanks Jiakai Liu for reporting this issue and doing the initial
> investigation. Email from Jiakai below:
> Assuming that we have an index column family with the following entries:
> "tag0:001:thread1"
> ...
> "tag1:001:thread1"
> "tag1:002:thread2"
> ...
> "tag1:010:thread10"
> ...
> "tag2:001:thread1"
> "tag2:005:thread5"
> ...
> To get threads with "tag1" in range [5, 10), I tried the following code:
>     ColumnPrefixFilter filter1 = new ColumnPrefixFilter(Bytes.toBytes("tag1"));
>     ColumnPaginationFilter filter2 = new ColumnPaginationFilter(5 /* limit */, 5 /* offset */);
>     FilterList filters = new FilterList(Operator.MUST_PASS_ALL);
>     filters.addFilter(filter1);
>     filters.addFilter(filter2);
>     Get get = new Get(USER);
>     get.addFamily(COLUMN_FAMILY);
>     get.setMaxVersions(1);
>     get.setFilter(filters);
> Somehow it didn't work as expected. It returned the entries as if filter1
> were not set.
> Turns out the ColumnPrefixFilter returns SEEK_NEXT_USING_HINT in some cases.
> The FilterList filter does not handle this return code properly (it treats
> it as INCLUDE).
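
For readers following the thread, the intended semantics (offset and limit counted only over cells that survive filtering, with the per-row counter reset at each row transition) can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the patch itself: the class IntraRowPaginationSketch and its paginate method are hypothetical names and use no HBase API, rows are modelled as an ordered map from row key to column qualifiers, and a simple prefix check stands in for ColumnPrefixFilter. Only the counter name countPerRow is borrowed from the revision summary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names are HBase APIs.
final class IntraRowPaginationSketch {

    /**
     * For each row, returns at most `limit` qualifiers that start with `prefix`,
     * skipping the first `offset` matching qualifiers of that row.
     */
    static Map<String, List<String>> paginate(
            Map<String, List<String>> rows, String prefix, int offset, int limit) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> row : rows.entrySet()) {
            // Reset the per-row counter on every row transition. Carrying the
            // count over from the previous row is the kind of bug the
            // revision summary describes.
            int countPerRow = 0;
            List<String> page = new ArrayList<>();
            for (String qualifier : row.getValue()) {
                if (!qualifier.startsWith(prefix)) {
                    continue; // rejected cells never advance the offset
                }
                if (countPerRow >= offset && page.size() < limit) {
                    page.add(qualifier);
                }
                countPerRow++;
            }
            result.put(row.getKey(), page);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> rows = new LinkedHashMap<>();
        rows.put("user1", Arrays.asList("tag0:001:thread1", "tag1:001:thread1",
                "tag1:002:thread2", "tag1:003:thread3", "tag2:001:thread1"));
        rows.put("user2", Arrays.asList("tag1:001:thread1", "tag1:002:thread2"));
        // Offset 1, limit 2, counted only among "tag1" columns of each row.
        System.out.println(paginate(rows, "tag1", 1, 2));
    }
}
```

Because the counter only advances for cells that pass the prefix check, the result does not depend on where the pagination step sits relative to other filters, which is the property the ColumnPaginationFilter approach lacks.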