[ 
https://issues.apache.org/jira/browse/HBASE-21332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661913#comment-16661913
 ] 

Zheng Hu edited comment on HBASE-21332 at 10/24/18 8:15 AM:
------------------------------------------------------------

bq. While the region scanners in HBase server are parallel, I think there is no 'switch' during scan.
What do you mean by parallel? If a scan spans multiple regions, the client has to switch between them; see ClientScanner#moveToNextRegion.
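A rough plain-Java model of that switching follows. It is not ClientScanner's real code. The client opens a scanner in the region that holds startRow. Each time that region is exhausted, it moves to the region that starts at the previous region's end key, and it stops once a region's end key reaches stopRow. `RegionWalk`, `Region`, `locate` and `regionsForScan` are hypothetical names. The split points 111, 222 and 333, which give four regions, are assumed from the example in the description.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of how a client walks regions during one scan; not HBase code.
class RegionWalk {

    // A region holds rows in [startKey, endKey); "" means unbounded on that side.
    record Region(String startKey, String endKey) {}

    // Finds the region that holds the given row.
    static Region locate(List<Region> regions, String row) {
        for (Region r : regions) {
            boolean atOrAfterStart = r.startKey().compareTo(row) <= 0;
            boolean beforeEnd = r.endKey().isEmpty() || row.compareTo(r.endKey()) < 0;
            if (atOrAfterStart && beforeEnd) {
                return r;
            }
        }
        throw new IllegalArgumentException("no region holds row " + row);
    }

    // Lists the regions one scan over [startRow, stopRow) visits, in order.
    // Every switch means a new server-side scanner in the next region.
    static List<Region> regionsForScan(List<Region> regions, String startRow, String stopRow) {
        List<Region> visited = new ArrayList<>();
        String nextStart = startRow;
        while (true) {
            Region current = locate(regions, nextStart);
            visited.add(current);
            boolean lastRegion = current.endKey().isEmpty();
            boolean reachedStop = !stopRow.isEmpty() && current.endKey().compareTo(stopRow) >= 0;
            if (lastRegion || reachedStop) {
                return visited;
            }
            nextStart = current.endKey(); // the next region starts where this one ends
        }
    }

    public static void main(String[] args) {
        // Assumed layout: split points 111, 222 and 333 give four regions.
        List<Region> regions = List.of(
            new Region("", "111"), new Region("111", "222"),
            new Region("222", "333"), new Region("333", ""));
        for (Region r : regionsForScan(regions, "111", "444")) {
            System.out.println("[" + r.startKey() + ", " + r.endKey() + ")");
        }
    }
}
```

For a scan from '111' to '444', this model visits three regions, [111, 222), [222, 333) and [333, ). Each of those is a separate server-side scanner.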

bq. Why would the HBase page filter act like this?
Because a filter is designed to work within a single region. HBase has no way to maintain one filter instance across the whole cluster: once the scan switches to a new region, every filter is a brand-new instance. So filters that need to maintain global state (such as PageFilter's rowsAccepted counter) behave in ways that are confusing for users. As you said, PageFilter is indeed weird... In fact, I would recommend not using PageFilter at all.
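The following toy simulation in plain Java, with no HBase dependency, shows why rowsAccepted effectively resets. `PageFilterModel` is a hypothetical stand-in that mimics PageFilter's idea: it counts accepted rows and reports filterAllRemaining() once the count reaches pageSize. The other names are illustrative too. A new instance is created for each region, so one scan over three regions with pageSize = 2 returns six rows instead of two.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation, not HBase code: a fresh filter instance is created
// whenever the scan enters a new region, so its row counter starts at zero again.
class PerRegionPageFilter {

    // Minimal stand-in for a page filter: accepts rows until pageSize is reached.
    static final class PageFilterModel {
        private final int pageSize;
        private int rowsAccepted = 0;

        PageFilterModel(int pageSize) { this.pageSize = pageSize; }

        boolean filterAllRemaining() { return rowsAccepted >= pageSize; }

        void rowAccepted() { rowsAccepted++; }
    }

    // One client-side scan across regions; each region gets its own filter.
    static List<String> scan(List<List<String>> regions, int pageSize) {
        List<String> results = new ArrayList<>();
        for (List<String> regionRows : regions) {
            PageFilterModel filter = new PageFilterModel(pageSize); // state reset here
            for (String row : regionRows) {
                if (filter.filterAllRemaining()) {
                    break; // this region is done; the client moves to the next region
                }
                filter.rowAccepted();
                results.add(row);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<List<String>> regions = List.of(
            List.of("a1", "a2", "a3"),
            List.of("b1", "b2", "b3"),
            List.of("c1", "c2", "c3"));
        // With pageSize = 2, each of the 3 regions contributes 2 rows: 6 in total.
        System.out.println(scan(regions, 2));
    }
}
```

In this model the page size is enforced per region, not per scan. If an exact page size matters, counting rows on the client side is the safer approach.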
 



> HBase scan with PageFilter cannot get all rows, non-edge region skiped
> ----------------------------------------------------------------------
>
>                 Key: HBASE-21332
>                 URL: https://issues.apache.org/jira/browse/HBASE-21332
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, scan
>    Affects Versions: 1.1.2
>         Environment: * Server version: 1.1.2.2.6.5.0-292, revision=897822d4dd5956ca186974c10382e9094683fa29
>  * 2 region servers
>  * 4 regions
>  * HBase client: 1.3.1
>  
>            Reporter: pddNick
>            Assignee: Zheng Hu
>            Priority: Minor
>         Attachments: HBaseTest.java, image-2018-10-17-21-14-25-354.png, 
> image-2018-10-17-21-15-23-439.png, image-2018-10-23-17-37-22-028.png
>
>
> When using a scan with PageFilter to get data from HBase, the scanner 
> skips {color:#ff0000}'non-edge'{color} regions. The code I use comes from the 
> book _HBase: The Definitive Guide_, Example 4.8 (PageFilter example). The only 
> difference is that I use a scan with startRow and stopRow.
> Say I have regions with start and end keys like \{'111', '222', '333', '444'}, 
> which means I have 3 regions \{111, 222}, \{222, 333}, \{333, 444}, and they 
> are on different region servers. When scanning with startRow '111' and 
> stopRow '444', most data in region \{222, 333} is skipped and is not returned 
> by the ResultScanner. Regions \{111, 222} and \{333, 444} work fine. Because 
> region \{222, 333} contains neither the start row key nor the stop row key, 
> I call it a non-edge region.
> Below is the log, annotated with explanations:
>  
> {code:java}
> // Here the scanner works fine in region {111,222}: it gets exactly {pageSize} rows (1000) each time
> ...
> 2018-10-17 21:25:57.810 INFO 213872 [ main] c.p.s.c.HBaseTest : Test: results from [2139718600001069] to [2179067497952422], sum [1000 : 64000], cost: [77ms]
> 2018-10-17 21:25:57.885 INFO 213872 [ main] c.p.s.c.HBaseTest : Test: results from [2179098921079755] to [21c2879280113661], sum [1000 : 65000], cost: [75ms]
> 2018-10-17 21:25:57.962 INFO 213872 [ main] c.p.s.c.HBaseTest : Test: results from [21c2899018774688] to [2203180876471552], sum [1000 : 66000], cost: [77ms]
> // Here the scanner goes from region {111,222} to {222,333}. As you can see, this page
> // returns 2405 rows, ending at row '3373621463365126': the scanner moves on to
> // region {333,444} too early, and most data in {222,333} is skipped.
> 2018-10-17 21:25:58.321 INFO 213872 [ main] c.p.s.c.HBaseTest : Test: results from [2203223414254308] to [3373621463365126], sum [2405 : 68405], cost: [359ms]
> // Now the scanner is in region {333,444}; everything works fine again
> 2018-10-17 21:25:58.396 INFO 213872 [ main] c.p.s.c.HBaseTest : Test: results from [3373764408525604] to [33b3849714659525], sum [1000 : 69405], cost: [74ms]
> 2018-10-17 21:25:58.467 INFO 213872 [ main] c.p.s.c.HBaseTest : Test: results from [33b3882378177107] to [33f5221377695765], sum [1000 : 70405], cost: [71ms]
> ...{code}
>  
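The skipped rows come from combining that per-region behavior with the book's paging loop, which starts each new page just after the last row returned. Below is a simplified plain-Java model with hypothetical names; it is not HBase code. It assumes each region applies its own fresh page counter and the client then continues into the next region. The real RPC-level interaction between the region server and ClientScanner is more involved, so the exact rows lost will differ from the log above. The second mode sketches a client-side alternative: read at most pageSize rows from the scanner, close it, and restart after the last row actually consumed.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of the paging loop from the issue; not HBase code.
// Assumption: each region applies a fresh page counter, then the client moves on to
// the next region, so one "page" can hold up to pageSize rows per region.
class PagingSkipDemo {

    // One scan starting at startRow (inclusive; null = beginning of the table).
    // clientLimit > 0 makes the client stop reading after that many rows.
    static List<String> scanPage(List<List<String>> regions, String startRow,
                                 int pageSize, int clientLimit) {
        List<String> page = new ArrayList<>();
        for (List<String> region : regions) {
            int rowsAccepted = 0; // a new filter instance per region
            for (String row : region) {
                if (startRow != null && row.compareTo(startRow) < 0) {
                    continue; // before the scan's start row
                }
                if (rowsAccepted >= pageSize) {
                    break; // this region's filter is done; go to the next region
                }
                if (clientLimit > 0 && page.size() >= clientLimit) {
                    return page; // client-side limit: stop reading, close the scanner
                }
                rowsAccepted++;
                page.add(row);
            }
        }
        return page;
    }

    // Book-style paging loop: each new page starts just after the last row seen.
    static List<String> readAll(List<List<String>> regions, int pageSize, boolean limitOnClient) {
        List<String> all = new ArrayList<>();
        String lastRow = null;
        while (true) {
            // Appending "\0" gives the smallest key strictly greater than lastRow.
            String startRow = (lastRow == null) ? null : lastRow + "\0";
            List<String> page = scanPage(regions, startRow, pageSize, limitOnClient ? pageSize : 0);
            if (page.isEmpty()) {
                break;
            }
            all.addAll(page);
            lastRow = page.get(page.size() - 1);
        }
        return all;
    }

    public static void main(String[] args) {
        List<List<String>> regions = List.of(
            List.of("a1", "a2", "a3"),
            List.of("b1", "b2", "b3", "b4"),
            List.of("c1", "c2", "c3"));
        System.out.println("PageFilter only:   " + readAll(regions, 2, false));
        System.out.println("client-side limit: " + readAll(regions, 2, true));
    }
}
```

Under these assumptions, the PageFilter-only loop returns 7 of the 10 rows, and a3, b3 and b4 are never read. The first page already ends at c2, and the next page starts after it. The client-side limit returns all 10 rows. Newer HBase releases (2.0+) also offer Scan#setLimit for a client-enforced row limit, which may be worth checking.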



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
