[
https://issues.apache.org/jira/browse/HBASE-23370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Junegunn Choi updated HBASE-23370:
----------------------------------
Description:
I'm aware that the latest version of HBase has {{Scan#setLimit}} and it should
nicely replace PageFilter in most use cases. However, I'd like to point out
that the filter behaves strangely in the following scenario.
Let's say we have a table with 10 regions, and each region holds 100 records.
{code:ruby}
create 'page-filter', 'd', SPLITS => (1..9).map(&:to_s)
1000.times.each { |i| put 'page-filter', format('%04d', i).reverse, 'd:foo',
'bar' }
{code}
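To double-check the "10 regions, 100 records each" layout, here is a minimal plain-Ruby sketch that needs no HBase shell. It assumes a region covers the half-open key range between two adjacent split points and that keys compare as plain strings; {{region_index}} is a hypothetical helper introduced only for this illustration.

```ruby
# Split points as in the create statement above: '1'..'9'.
splits = (1..9).map(&:to_s)

# Row keys as written by the put loop: zero-padded index, reversed.
keys = 1000.times.map { |i| format('%04d', i).reverse }

# Hypothetical helper: index of the region whose key range contains the key.
# Region n covers [splits[n - 1], splits[n]); region 0 starts at the empty key.
def region_index(splits, key)
  splits.count { |split| key >= split }
end

# Count how many row keys land in each region.
counts = keys.group_by { |key| region_index(splits, key) }
             .transform_values(&:size)

puts counts.sort.map { |region, n| "region #{region}: #{n} rows" }
```

Reversing the zero-padded index puts the fastest-changing digit first, which is why the rows spread evenly over the ten regions.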
And if I scan the table with {{PageFilter(30)}}, I'd expect to see only 30
records. While {{PageFilter}} does not guarantee that the number of returned
records stays within the specified page size, the first region alone holds
more than 30 records, so the page will be filled there and the filter should
immediately terminate the scan.
{code:ruby}
scan 'page-filter', FILTER => 'PageFilter(30)'
{code}
However, this returns 300 records; 30 records from the beginning of every
region. The client keeps advancing to the next region when it shouldn't,
because of the {{results.isEmpty()}} condition in the following code:
[https://github.com/apache/hbase/blob/12c19a6e5105d898e93e385e0cded5eabceb8a40/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java#L3552-L3558]
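The difference between the observed and the expected result can be modeled in plain Ruby. This is only a sketch of the behavior described above, not the HBase implementation: it assumes each region applies the filter with a fresh row counter, and the method names ({{scan_region}}, {{scan_all_regions}}, {{scan_until_page_filled}}) are hypothetical.

```ruby
PAGE_SIZE = 30

# Ten regions of 100 sorted row keys each, matching the table above.
regions = 1000.times.map { |i| format('%04d', i).reverse }
              .sort
              .each_slice(100)
              .to_a

# Model of a per-region filter (assumption): every region starts with a
# fresh row counter, so each one can emit up to page_size rows.
def scan_region(rows, page_size)
  rows.take(page_size)
end

# Observed behavior: the client moves on to the next region regardless.
def scan_all_regions(regions, page_size)
  regions.flat_map { |rows| scan_region(rows, page_size) }
end

# Expected behavior: stop as soon as one region has filled the page.
def scan_until_page_filled(regions, page_size)
  results = []
  regions.each do |rows|
    page = scan_region(rows, page_size)
    results.concat(page)
    break if page.size >= page_size
  end
  results
end

puts scan_all_regions(regions, PAGE_SIZE).size        # 300
puts scan_until_page_filled(regions, PAGE_SIZE).size  # 30
```

In this model the second variant still continues into the next region when a region returns fewer rows than the page size, which matches the caveat that {{PageFilter}} alone cannot bound the total across regions.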
I can confirm that removing the condition fixes the issue. Is the comment
"_This is used to keep compatible with the old scan implementation_" still
valid?
I'll upload a patch to see how it affects the existing test cases.
was:
I'm aware that the latest version of HBase has {{Scan#setLimit}} and it should
nicely replace PageFilter in most use cases. However, I'd like to point out
that the filter behaves strangely in the following scenario.
Let's say we have a table with 10 regions, and each region holds 100 records.
{code:ruby}
create 'page-filter', 'd', SPLITS => (1..9).map(&:to_s)
1000.times.each { |i| put 'page-filter', format('%04d', i).reverse, 'd:foo',
'bar' }
{code}
And if I scan the table with {{PageFilter(30)}}, I'd expect to see only 30
records. While {{PageFilter}} does not guarantee that the number of the
returned records is smaller than the specified size, we have more than 30
records in the first region, so the page will be filled and the filter should
immediately terminate the scan.
{code:ruby}
scan 'page-filter', FILTER => 'PageFilter(30)'
{code}
However, this returns 300 records, 30 records from the beginning of each
region. The client keeps advancing to the next region when it shouldn't, and
it's because of {{results.isEmpty()}} condition in the following code:
[https://github.com/apache/hbase/blob/12c19a6e5105d898e93e385e0cded5eabceb8a40/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java#L3552-L3558]
I can confirm that removing the condition fixes the issue. Is the comment
"_This is used to keep compatible with the old scan implementation_" still
valid?
I'll upload a patch to see how it affects the existing test cases.
> PageFilter returns extra records even when page is filled within a region
> -------------------------------------------------------------------------
>
> Key: HBASE-23370
> URL: https://issues.apache.org/jira/browse/HBASE-23370
> Project: HBase
> Issue Type: Bug
> Affects Versions: 3.0.0
> Reporter: Junegunn Choi
> Assignee: Junegunn Choi
> Priority: Minor
> Attachments: HBASE-23370.patch
>