Junegunn Choi created HBASE-23370:
-------------------------------------

             Summary: PageFilter returns extra records even when page is filled within a region
                 Key: HBASE-23370
                 URL: https://issues.apache.org/jira/browse/HBASE-23370
             Project: HBase
          Issue Type: Bug
    Affects Versions: 3.0.0
            Reporter: Junegunn Choi


I'm aware that the latest version of HBase has {{Scan#setLimit}} and it should 
nicely replace PageFilter in most use cases. However, I'd like to point out 
that the filter behaves strangely in the following scenario.

Let's say we have a table with 10 regions, and each region holds 100 records.
{code:ruby}
create 'page-filter', 'd', SPLITS => (1..9).map(&:to_s)
1000.times.each { |i| put 'page-filter', format('%04d', i).reverse, 'd:foo', 'bar' }
{code}
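As a sanity check on the setup, the key distribution can be reproduced in plain Ruby without a cluster. This is only a sketch: {{split_points}}, {{keys}} and {{counts}} are names made up for illustration, and it assumes a row belongs to the region whose start key is the largest split point not greater than the row key.

```ruby
# Plain Ruby, no HBase required. All names here are illustrative.
split_points = (1..9).map(&:to_s)
keys = 1000.times.map { |i| format('%04d', i).reverse }

# Region index = number of split points that sort at or before the key
# (assumes start keys are inclusive and end keys exclusive).
counts = keys
         .group_by { |k| split_points.count { |s| s <= k } }
         .transform_values(&:size)

counts.sort.each { |region, n| puts "region #{region}: #{n} rows" }
```

Because the keys are reversed, the leading character is the last digit of {{i}}, which spreads the rows evenly over the split points.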
If I scan the table with {{PageFilter(30)}}, I'd expect to see only 30 
records. While {{PageFilter}} does not guarantee that the number of 
returned records stays within the specified size, the first region alone 
holds more than 30 records, so the page will be filled there and the filter 
should immediately terminate the scan.
{code:ruby}
scan 'page-filter', FILTER => 'PageFilter(30)'
{code}
However, this returns 300 records: the first 30 records of each region. The 
client keeps advancing to the next region when it shouldn't, and it's 
because of the {{results.isEmpty()}} condition in the following code:

[https://github.com/apache/hbase/blob/12c19a6e5105d898e93e385e0cded5eabceb8a40/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java#L3552-L3558]

I can confirm that removing the condition fixes the issue. Is the comment 
"_This is used to keep compatible with the old scan implementation_" still 
valid?
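
To make the difference concrete, here is a toy model of the behavior in plain Ruby. It is not HBase code: {{scan_region}}, {{scan_table}} and the {{require_empty}} flag are hypothetical names, and it assumes that each region applies its own copy of the filter and that the client moves on to the next region unless the server tells it there are no more results.

```ruby
# Toy model only. None of these names are HBase APIs.
PAGE_SIZE = 30
REGIONS = Array.new(10) { |r| Array.new(100) { |i| "#{r}-#{format('%03d', i)}" } }

# Scans one region with a fresh page filter.
# Returns [rows, more_results]. With require_empty: true, the scan is only
# stopped when the filter is done AND the batch is empty, which mimics the
# results.isEmpty() condition in question.
def scan_region(rows, require_empty:)
  results = rows.first(PAGE_SIZE)
  filter_done = results.size >= PAGE_SIZE
  stop_scan = require_empty ? (filter_done && results.empty?) : filter_done
  [results, !stop_scan]
end

# Client loop: keep advancing to the next region while more_results is true.
def scan_table(require_empty:)
  returned = []
  REGIONS.each do |rows|
    results, more_results = scan_region(rows, require_empty: require_empty)
    returned.concat(results)
    break unless more_results
  end
  returned
end

puts scan_table(require_empty: true).size   # with the condition
puts scan_table(require_empty: false).size  # without the condition
```

In this model the first call yields 300 rows and the second yields 30, which matches what I observe before and after removing the condition.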

I'll upload a patch to see how it affects the existing test cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
