[
https://issues.apache.org/jira/browse/HBASE-21332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661820#comment-16661820
]
pddNick commented on HBASE-21332:
---------------------------------
Thank you so much, it helps me a lot.
'+PageFilter isn't a global filter which consider the cross regions case, it
only consider the region level+'. This is the thing that matters.
But {color:#d04437}it causes a problem only when the scanner crosses regions.{color}
Say the scanner is in [111, 222) and there are more than PAGE_SIZE rows left
to retrieve. The scanner gets PAGE_SIZE rows and returns them to the client,
and everything works just fine.
But when fewer than PAGE_SIZE rows are left in [111, 222), the scanner
combines the rows from the parallel scanners in regions [222, 333) and
[333, 444) and returns them all to the client. That means the client gets
(RowsLeftInFirstRegion + PAGE_SIZE + PAGE_SIZE) rows. Because the client
resumes from the last row key it received, which is in [333, 444), it skips
the rest of region [222, 333). I added some logging to the UT and the output
is exactly what I expected.
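
To make the arithmetic concrete, here is a toy model in plain Java. It is not HBase code and makes no claim about HBase internals: the class `RegionPageModel` and its methods are hypothetical names, and the rule it encodes (one page counter per region, with an early stop only when the region holding the start row fills a page by itself) is just my reading of the behaviour described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/** Toy model (NOT HBase code) of a page limit that is counted per region. */
final class RegionPageModel {

    /** Rows of one region at or after startRow, capped at pageSize. */
    static List<String> scanRegion(TreeSet<String> region, String startRow, int pageSize) {
        List<String> out = new ArrayList<>();
        for (String row : region.tailSet(startRow, true)) {
            if (out.size() == pageSize) {
                break; // this region's own counter is full
            }
            out.add(row);
        }
        return out;
    }

    /** One client batch, following the accounting described in the comment (an assumption). */
    static List<String> scanOnce(List<TreeSet<String>> regions, String startRow, int pageSize) {
        List<String> batch = new ArrayList<>();
        boolean firstRegion = true;
        for (TreeSet<String> region : regions) {
            List<String> part = scanRegion(region, startRow, pageSize);
            if (firstRegion && part.isEmpty()) {
                continue; // region lies entirely before startRow
            }
            batch.addAll(part);
            if (firstRegion && part.size() == pageSize) {
                return batch; // normal case: exactly one page
            }
            firstRegion = false; // later regions each add up to pageSize rows
        }
        return batch;
    }

    /** Builds a region with `count` keys such as "2-0007". */
    static TreeSet<String> region(String prefix, int count) {
        TreeSet<String> rows = new TreeSet<>();
        for (int i = 0; i < count; i++) {
            rows.add(String.format("%s-%04d", prefix, i));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<TreeSet<String>> regions =
                Arrays.asList(region("1", 10), region("2", 10), region("3", 10));
        // Only 2 rows are left in the first region, and the page size is 3.
        List<String> batch = scanOnce(regions, "1-0008", 3);
        // prints "8 rows, last = 3-0002", i.e. 2 + 3 + 3
        System.out.println(batch.size() + " rows, last = " + batch.get(batch.size() - 1));
    }
}
```

In this model a client that resumes from "3-0002" never sees rows "2-0003" to "2-0009", which is the same pattern as the 2405-row batch in the log.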
So the solution is either to take only PAGE_SIZE rows ourselves every time
we do a scan, or to limit the scan range to a single region (this is what I
do in my real-life project).
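
The first option can be sketched like this, again as a toy in plain Java rather than real HBase client code. The class `ClientSidePaging` is a hypothetical name, a `TreeSet` stands in for the table, and its iterator stands in for the scanner. The point is only that the client stops after PAGE_SIZE rows no matter how many the scanner could return, then resumes just after the last row it kept.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

/** Toy sketch (NOT HBase code): the client enforces the page size itself. */
final class ClientSidePaging {

    /** Takes at most pageSize rows, however many the scanner could return. */
    static List<String> nextPage(Iterator<String> scanner, int pageSize) {
        List<String> page = new ArrayList<>();
        while (page.size() < pageSize && scanner.hasNext()) {
            page.add(scanner.next());
        }
        return page;
    }

    /** Reads [startRow, stopRow) page by page, resuming after the last row kept. */
    static List<List<String>> readAll(TreeSet<String> table, String startRow,
                                      String stopRow, int pageSize) {
        List<List<String>> pages = new ArrayList<>();
        String from = startRow;
        boolean inclusive = true; // only the very first scan includes its start row
        while (true) {
            // Hypothetical stand-in for opening a new scan from `from` to stopRow.
            Iterator<String> scanner = table.subSet(from, inclusive, stopRow, false).iterator();
            List<String> page = nextPage(scanner, pageSize);
            if (page.isEmpty()) {
                break; // nothing left before stopRow
            }
            pages.add(page);
            from = page.get(page.size() - 1);
            inclusive = false; // do not return the last row twice
        }
        return pages;
    }

    public static void main(String[] args) {
        TreeSet<String> table = new TreeSet<>();
        for (int i = 0; i < 10; i++) {
            table.add(String.format("row-%02d", i));
        }
        for (List<String> page : readAll(table, "row-00", "row-99", 4)) {
            System.out.println(page.size() + " rows: " + page);
        }
    }
}
```

Because the resume key is always the last row the client actually kept, no rows between two pages can be skipped, whichever region they live in.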
> HBase scan with PageFilter cannot get all rows, non-edge region skiped
> ----------------------------------------------------------------------
>
> Key: HBASE-21332
> URL: https://issues.apache.org/jira/browse/HBASE-21332
> Project: HBase
> Issue Type: Bug
> Components: regionserver, scan
> Affects Versions: 1.1.2
> Environment: * Server version:1.1.2.2.6.5.0-292,
> revision=897822d4dd5956ca186974c10382e9094683fa29
> * 2 region servers
> * 4 regions
> * HBase client:1.3.1
>
> Reporter: pddNick
> Assignee: Zheng Hu
> Priority: Minor
> Attachments: HBaseTest.java, image-2018-10-17-21-14-25-354.png,
> image-2018-10-17-21-15-23-439.png, image-2018-10-23-17-37-22-028.png
>
>
> When using a scan with PageFilter to get data from HBase, the scanner
> skips {color:#ff0000}'non-edge'{color} regions. The code I use comes from the
> book _HBase: The Definitive Guide, Example 4.8, PageFilter example._ The
> difference is that I use a scan with startRow and stopRow.
> Say I have regions with start and end keys like \{'111', '222', '333',
> '444'}, which means I have 3 regions \{111, 222}, \{222, 333}, \{333, 444},
> and they are on different region servers. When scanning with startRow '111'
> and stopRow '444', most data in region \{222, 333} is skipped and is not
> returned by the ResultScanner. Regions \{111,222} and \{333,444} work just
> fine. Because region \{222,333} contains neither the startRow key nor the
> stopRow key, I call it a non-edge region.
> Below is an explanation with logs:
>
> {code:java}
> // Here the scanner works just fine in region {111,222}: it gets exactly
> {pageSize} rows each time, which is 1000
> ...
> 2018-10-17 21:25:57.810 INFO 213872 [ main] c.p.s.c.HBaseTest : Test: results
> from [2139718600001069] to [2179067497952422], sum [1000 : 64000], cost:
> [77ms]
> 2018-10-17 21:25:57.885 INFO 213872 [ main] c.p.s.c.HBaseTest : Test: results
> from [2179098921079755] to [21c2879280113661], sum [1000 : 65000], cost:
> [75ms]
> 2018-10-17 21:25:57.962 INFO 213872 [ main] c.p.s.c.HBaseTest : Test: results
> from [21c2899018774688] to [2203180876471552], sum [1000 : 66000], cost:
> [77ms]
> // Here the scanner goes from region {111,222} to {222,333}. As you can see, the
> scanner gets 2405 rows, up to row '3373621463365126'. The scanner moves to
> region {333,444} too early and most data in {222,333} is skipped.
> 2018-10-17 21:25:58.321 INFO 213872 [ main] c.p.s.c.HBaseTest : Test: results
> from [2203223414254308] to [3373621463365126], sum [2405 : 68405], cost:
> [359ms]
> // Now the scanner is in region {333,444}, everything works just fine
> 2018-10-17 21:25:58.396 INFO 213872 [ main] c.p.s.c.HBaseTest : Test: results
> from [3373764408525604] to [33b3849714659525], sum [1000 : 69405], cost:
> [74ms]
> 2018-10-17 21:25:58.467 INFO 213872 [ main] c.p.s.c.HBaseTest : Test: results
> from [33b3882378177107] to [33f5221377695765], sum [1000 : 70405], cost:
> [71ms]
> ...{code}
>