[ 
https://issues.apache.org/jira/browse/HBASE-21332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661826#comment-16661826
 ] 

pddNick commented on HBASE-21332:
---------------------------------

Conclusion:

When a scan crosses regions, the scanner with a page filter combines the
partial results of all regions covered by the start key and end key, so a
single scan can return more rows than the page size. To avoid the skipped
'non-edge' region problem, fetch each page manually: count the rows on the
client side and stop once a page's worth has been read.
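
A minimal sketch of the manual paging suggested above, written as a plain-Java
simulation rather than HBase client code. It assumes a simplified model in
which the page limit is enforced independently in each region; real
client/server behavior depends on the versions involved. All class and method
names, and the sample row keys, are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Plain-Java simulation; every name here is hypothetical, not HBase API.
class ManualPagingSketch {

    // Three toy regions {111,222}, {222,333}, {333,444} with five rows each.
    static List<TreeSet<String>> sampleRegions() {
        List<TreeSet<String>> regions = new ArrayList<>();
        regions.add(new TreeSet<>(Arrays.asList("111", "130", "150", "170", "190")));
        regions.add(new TreeSet<>(Arrays.asList("222", "240", "260", "280", "300")));
        regions.add(new TreeSet<>(Arrays.asList("333", "350", "370", "390", "410")));
        return regions;
    }

    // Assumed model of a page-filtered scan: the limit is applied per region,
    // so every region in [startRow, stopRow) contributes up to pageSize rows.
    static List<String> filteredScan(List<TreeSet<String>> regions,
                                     String startRow, String stopRow, int pageSize) {
        List<String> rows = new ArrayList<>();
        for (TreeSet<String> region : regions) {
            int taken = 0;
            for (String row : region.subSet(startRow, true, stopRow, false)) {
                if (taken == pageSize) break;
                rows.add(row);
                taken++;
            }
        }
        return rows;
    }

    // Manual paging: the client counts rows itself and stops at pageSize,
    // no matter how many regions the rows came from.
    static List<String> manualPage(List<TreeSet<String>> regions,
                                   String startRow, String stopRow, int pageSize) {
        List<String> page = new ArrayList<>();
        for (TreeSet<String> region : regions) {
            for (String row : region.subSet(startRow, true, stopRow, false)) {
                if (page.size() == pageSize) return page;
                page.add(row);
            }
        }
        return page;
    }

    // Reads the whole range page by page, resuming just after the last row seen.
    static List<String> readAllManually(List<TreeSet<String>> regions,
                                        String startRow, String stopRow, int pageSize) {
        List<String> all = new ArrayList<>();
        String start = startRow;
        while (true) {
            List<String> page = manualPage(regions, start, stopRow, pageSize);
            if (page.isEmpty()) return all;
            all.addAll(page);
            // Appending a zero char gives the smallest key after the last row.
            start = page.get(page.size() - 1) + "\0";
        }
    }

    public static void main(String[] args) {
        List<TreeSet<String>> regions = sampleRegions();
        System.out.println("per-region limit: " + filteredScan(regions, "170", "444", 2));
        System.out.println("manual page:      " + manualPage(regions, "170", "444", 2));
        System.out.println("rows read manually: "
                + readAllManually(regions, "111", "444", 2).size());
    }
}
```

With this sample data and a page size of 2, the per-region-limited scan
starting at '170' returns six rows ending at '350', so a client that resumes
after the last returned row never sees '260', '280' or '300' in the middle
region. The manually counted pages never exceed two rows, and paging through
the range this way reads all 15 rows.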

[~openinx]

> HBase scan with PageFilter cannot get all rows, non-edge region skiped
> ----------------------------------------------------------------------
>
>                 Key: HBASE-21332
>                 URL: https://issues.apache.org/jira/browse/HBASE-21332
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, scan
>    Affects Versions: 1.1.2
>         Environment: * Server version:1.1.2.2.6.5.0-292, 
> revision=897822d4dd5956ca186974c10382e9094683fa29
>  * 2 region servers
>  * 4 regions
>  * HBase client:1.3.1
>  
>            Reporter: pddNick
>            Assignee: Zheng Hu
>            Priority: Minor
>         Attachments: HBaseTest.java, image-2018-10-17-21-14-25-354.png, 
> image-2018-10-17-21-15-23-439.png, image-2018-10-23-17-37-22-028.png
>
>
> When using a scan with PageFilter to get data from HBase, the scanner
> skips 'non-edge' regions. The code I use comes from the book _HBase: The
> Definitive Guide_, Example 4.8, "PageFilter example". The difference is
> that I use a scan with startRow and stopRow.
> Say I have regions with start and end keys like {'111', '222', '333',
> '444'}, which means I have 3 regions {111, 222}, {222, 333}, {333, 444},
> and they are on different region servers. When scanning with startRow '111'
> and stopRow '444', most data in region {222, 333} is skipped and is not
> returned by the ResultScanner. Regions {111, 222} and {333, 444} work just
> fine. Because region {222, 333} contains neither the startRow key nor the
> stopRow key, I call it a non-edge region.
> Below is some explanation with the log:
>  
> {code:java}
> // Here the scanner works just fine in region {111,222}: it gets exactly
> // {pageSize} rows each time, which is 1000
> ...
> 2018-10-17 21:25:57.810 INFO 213872 [ main] c.p.s.c.HBaseTest : Test: results 
> from [2139718600001069] to [2179067497952422], sum [1000 : 64000], cost: 
> [77ms]
> 2018-10-17 21:25:57.885 INFO 213872 [ main] c.p.s.c.HBaseTest : Test: results 
> from [2179098921079755] to [21c2879280113661], sum [1000 : 65000], cost: 
> [75ms]
> 2018-10-17 21:25:57.962 INFO 213872 [ main] c.p.s.c.HBaseTest : Test: results 
> from [21c2899018774688] to [2203180876471552], sum [1000 : 66000], cost: 
> [77ms]
> // Here the scanner goes from region {111,222} to {222,333}. As you can see,
> // it gets 2405 rows, ending at row '3373621463365126'. The scanner moves to
> // region {333,444} too early and most data in {222,333} is skipped.
> 2018-10-17 21:25:58.321 INFO 213872 [ main] c.p.s.c.HBaseTest : Test: results 
> from [2203223414254308] to [3373621463365126], sum [2405 : 68405], cost: 
> [359ms]
> // Now the scanner is in region {333,444} and everything works just fine
> 2018-10-17 21:25:58.396 INFO 213872 [ main] c.p.s.c.HBaseTest : Test: results 
> from [3373764408525604] to [33b3849714659525], sum [1000 : 69405], cost: 
> [74ms]
> 2018-10-17 21:25:58.467 INFO 213872 [ main] c.p.s.c.HBaseTest : Test: results 
> from [33b3882378177107] to [33f5221377695765], sum [1000 : 70405], cost: 
> [71ms]
> ...{code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
