[ 
https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269115#comment-13269115
 ] 

Phabricator commented on HBASE-5104:
------------------------------------

mbautin has commented on the revision "[jira] [HBASE-5104] Provide a reliable 
intra-row pagination mechanism".

  Michael, Jimmy: thanks for reviewing! See my responses inline.

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java:386 Done.
  src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java:387 Done.
  src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java:931 Done.
  src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java:932 Done.
  src/main/java/org/apache/hadoop/hbase/client/Scan.java:638 Done.
  src/main/java/org/apache/hadoop/hbase/client/Get.java:471 Done.
  src/main/protobuf/Client.proto:49 Done.
  src/main/protobuf/Client.proto:50 Done.
  src/main/protobuf/Client.proto:199 Done.
  src/main/protobuf/Client.proto:200 Done.
  src/test/java/org/apache/hadoop/hbase/HTestConst.java:18 This is not a test, 
this is a collection of constants used in tests.

  I tried to save some typing, because the intended usage pattern is 
HTestConst.DEFAULT_{TABLE,CF,ROW,etc}... However, if you feel strongly about 
it, I can rename it to HTestConstants.

  src/test/java/org/apache/hadoop/hbase/client/TestIntraRowPagination.java:60 
Added region.close(). I am assuming that takes care of closing the HLog 
(correct me if I'm wrong).
  src/main/java/org/apache/hadoop/hbase/client/Get.java:212 Yes, this offset is 
only within a particular (row, CF) combination. It gets reset back to zero when 
we move to the next row/CF. Added this to javadoc.
  src/main/java/org/apache/hadoop/hbase/client/Result.java:177 Got rid of this 
method.
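
  The offset semantics described for Get.java:212 above can be sketched in 
plain Java. This is only an illustration, not code from the patch: the class 
IntraRowPaging and the method page are hypothetical names, and a row is 
modelled as a map from column family to an ordered list of column names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the paging logic; not HBase code.
class IntraRowPaging {
    /**
     * Applies offset and limit independently to each column family of one row.
     * The position counter starts from zero for every family, mirroring the
     * "reset when we move to the next row/CF" behaviour described above.
     */
    static Map<String, List<String>> page(Map<String, List<String>> row, int offset, int limit) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> cf : row.entrySet()) {
            List<String> cols = cf.getValue();
            int from = Math.min(offset, cols.size());
            // Use long arithmetic so a very large limit cannot overflow.
            int to = (int) Math.min((long) from + limit, cols.size());
            out.put(cf.getKey(), new ArrayList<>(cols.subList(from, to)));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> row = new LinkedHashMap<>();
        row.put("cf1", Arrays.asList("a", "b", "c", "d"));
        row.put("cf2", Arrays.asList("x", "y"));
        // Offset 1, limit 2 is applied separately to cf1 and cf2.
        System.out.println(page(row, 1, 2)); // prints {cf1=[b, c], cf2=[y]}
    }
}
```

  Note that cf2 contributes "y" even though cf1 already filled its limit: the 
limit and offset in this sketch are per family, not per row.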

REVISION DETAIL
  https://reviews.facebook.net/D2799

                
> Provide a reliable intra-row pagination mechanism
> -------------------------------------------------
>
>                 Key: HBASE-5104
>                 URL: https://issues.apache.org/jira/browse/HBASE-5104
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Madhuwanti Vaidya
>         Attachments: D2799.1.patch, D2799.2.patch, D2799.3.patch, 
> D2799.4.patch, D2799.5.patch, 
> jira-HBASE-5104-Provide-a-reliable-intra-row-paginat-2012-04-16_12_39_42.patch,
>  testFilterList.rb
>
>
> Addendum:
> Doing pagination (retrieving at most "limit" number of KVs at a particular 
> "offset") is currently supported via the ColumnPaginationFilter. However, it 
> is not a very clean way of supporting pagination.  Some of the problems with 
> it are:
> * Normally, one would expect a query with (Filter(A) AND Filter(B)) to have 
> the same results as (query with Filter(A)) INTERSECT (query with Filter(B)). 
> This is not the case for ColumnPaginationFilter, as its internal state gets 
> updated depending on whether Filter(A) returns TRUE or FALSE for a particular 
> cell.
> * When this Filter is used in combination with other filters (e.g., doing AND 
> with another filter using FilterList), the behavior of the query depends on 
> the order of filters in the FilterList. This is not ideal.
> * ColumnPaginationFilter is a stateful filter which ends up counting multiple 
> versions of the cell as separate values even if another filter upstream or 
> the ScanQueryMatcher is going to reject the value for other reasons.
> Seems like we need a reliable way to do pagination. The particular use case 
> that prompted this JIRA is pagination within the same rowKey. For example, 
> for a given row key R, get columns with prefix P, starting at offset X (among 
> columns which have prefix P) and limit Y. Some possible fixes might be:
> 1) enhance ColumnPrefixFilter to support another constructor which supports 
> limit/offset.
> 2) Support pagination (limit/offset) at the Scan/Get API level (rather than 
> as a filter) [Like SQL].
> Original Post:
> Thanks Jiakai Liu for reporting this issue and doing the initial 
> investigation. Email from Jiakai below:
> Assuming that we have an index column family with the following entries:
> "tag0:001:thread1"
> ...
> "tag1:001:thread1"
> "tag1:002:thread2"
> ...
> "tag1:010:thread10"
> ...
> "tag2:001:thread1"
> "tag2:005:thread5"
> ...
> To get threads with "tag1" in range [5, 10), I tried the following code:
>     ColumnPrefixFilter filter1 = new ColumnPrefixFilter(Bytes.toBytes("tag1"));
>     ColumnPaginationFilter filter2 = new ColumnPaginationFilter(5 /* limit */, 5 /* offset */);
>     FilterList filters = new FilterList(Operator.MUST_PASS_ALL);
>     filters.addFilter(filter1);
>     filters.addFilter(filter2);
>     Get get = new Get(USER);
>     get.addFamily(COLUMN_FAMILY);
>     get.setMaxVersions(1);
>     get.setFilter(filters);
> Somehow it didn't work as expected. It returned the entries as if filter1 
> were not set.
> It turns out that ColumnPrefixFilter returns SEEK_NEXT_USING_HINT in some 
> cases. FilterList does not handle this return code properly (it treats it as 
> INCLUDE).
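
The order dependence described in the issue text above can be reproduced with 
a toy model in plain Java. This is a sketch under stated assumptions, not 
HBase code: FilterOrderDemo, prefix, pager and scan are hypothetical names, 
filters are reduced to boolean predicates, and the AND combination is assumed 
to short-circuit, so a later filter only sees columns that earlier filters 
accepted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Toy model only; the real filter classes return richer codes than true/false.
class FilterOrderDemo {
    /** Stateless prefix check, standing in for a column prefix filter. */
    static Predicate<String> prefix(String p) {
        return c -> c.startsWith(p);
    }

    /** Stateful pager: counts every column it is shown, then applies offset/limit to that count. */
    static Predicate<String> pager(int limit, int offset) {
        int[] seen = {0};
        return c -> {
            int idx = seen[0]++;
            return idx >= offset && idx < offset + limit;
        };
    }

    /** Short-circuiting AND over the filters, evaluated in list order for each column. */
    static List<String> scan(List<String> columns, List<Predicate<String>> filters) {
        List<String> out = new ArrayList<>();
        for (String c : columns) {
            boolean keep = true;
            for (Predicate<String> f : filters) {
                if (!f.test(c)) { keep = false; break; }
            }
            if (keep) out.add(c);
        }
        return out;
    }

    static final List<String> COLUMNS =
        Arrays.asList("tag0:001", "tag1:001", "tag1:002", "tag1:003", "tag2:001");

    static List<String> prefixThenPager() {
        return scan(COLUMNS, Arrays.asList(prefix("tag1"), pager(2, 1)));
    }

    static List<String> pagerThenPrefix() {
        return scan(COLUMNS, Arrays.asList(pager(2, 1), prefix("tag1")));
    }

    public static void main(String[] args) {
        System.out.println(prefixThenPager()); // prints [tag1:002, tag1:003]
        System.out.println(pagerThenPrefix()); // prints [tag1:001, tag1:002]
    }
}
```

With the prefix check first, the pager counts only "tag1" columns. With the 
pager first, it also counts "tag0:001", so the page shifts by one column even 
though both queries express the same logical AND.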

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
