[ 
https://issues.apache.org/jira/browse/HBASE-23370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junegunn Choi updated HBASE-23370:
----------------------------------
    Attachment: HBASE-23370-v2.patch

> PageFilter returns extra records even when page is filled within a region
> -------------------------------------------------------------------------
>
>                 Key: HBASE-23370
>                 URL: https://issues.apache.org/jira/browse/HBASE-23370
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>            Reporter: Junegunn Choi
>            Assignee: Junegunn Choi
>            Priority: Minor
>         Attachments: HBASE-23370-v2.patch, HBASE-23370.patch
>
>
> I'm aware that the latest version of HBase has {{Scan#setLimit}} and it 
> should nicely replace PageFilter in most use cases. However, I'd like to 
> point out that the filter behaves strangely in the following scenario.
> Let's say we have a table with 10 regions, and each region holds 100 records.
> {code:ruby}
> create 'page-filter', 'd', SPLITS => (1..9).map(&:to_s)
> 1000.times.each { |i| put 'page-filter', format('%04d', i).reverse, 'd:foo', 'bar' }
> {code}
> If I scan the table with {{PageFilter(30)}}, I'd expect to see only 30 
> records. While {{PageFilter}} does not guarantee that the number of 
> returned records stays within the specified size, the first region alone 
> holds more than 30 records, so the page is filled there and the filter 
> should immediately terminate the scan.
> {code:ruby}
> scan 'page-filter', FILTER => 'PageFilter(30)'
> {code}
> However, this returns 300 records: 30 records from the beginning of every 
> region. The client keeps advancing to the next region when it shouldn't, 
> because of the {{results.isEmpty()}} condition in the following code:
> [https://github.com/apache/hbase/blob/12c19a6e5105d898e93e385e0cded5eabceb8a40/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java#L3552-L3558]
> I can confirm that removing the condition fixes the issue. Is the comment 
> "_This is used to keep compatible with the old scan implementation_" still 
> valid?
> I'll upload a patch to see how it affects the existing test cases.
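
The behaviour reported above can be reproduced with a toy model. The sketch below is not HBase code and makes two assumptions: each region applies its own copy of the page filter, and the server tells the client to stop the whole scan only when the filter is done and, optionally, the returned batch is empty. The names scan_region, scan_table and require_empty_results are hypothetical and exist only for illustration.

```ruby
# Toy model only: not HBase code. Each region applies its own copy of the
# page filter, so the row count restarts at every region boundary.
PAGE_SIZE = 30

# Build 10 regions of 100 row keys each, mirroring the table in the report:
# zero-padded numbers, reversed, grouped by their first character.
REGIONS = (0...1000).map { |i| format('%04d', i).reverse }
                    .sort
                    .group_by { |key| key[0] }
                    .values

# Scan one region. Returns the rows that pass the filter and a flag telling
# the client whether the whole scan should stop.
# require_empty_results is a hypothetical switch standing in for the
# results.isEmpty() condition discussed in the report.
def scan_region(rows, require_empty_results)
  results = rows.first(PAGE_SIZE)
  filter_done = results.size >= PAGE_SIZE
  stop_scan = filter_done && (!require_empty_results || results.empty?)
  [results, stop_scan]
end

# Walk the regions in order, as a client would, until told to stop.
def scan_table(require_empty_results)
  collected = []
  REGIONS.each do |rows|
    results, stop_scan = scan_region(rows, require_empty_results)
    collected.concat(results)
    break if stop_scan
  end
  collected
end

puts scan_table(true).size   # 300: the page fills in every region
puts scan_table(false).size  # 30: the scan stops after the first region
```

With the emptiness requirement in place the stop flag is never raised, because a batch that fills the page is never empty, so the model visits all 10 regions. Dropping the requirement ends the scan as soon as the first region fills the page.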



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
