[ https://issues.apache.org/jira/browse/HBASE-21332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16662030#comment-16662030 ]
pddNick commented on HBASE-21332:
---------------------------------

By "parallel" I mean on the server side: the scan is done by multiple StoreFileScanners and the MemstoreScanner in parallel. Looks like I mixed up the server-side scan and the client-side ResultScanner :(

BTW, there is no _scan.setLimit_ method in HBase v1.1.2. With MapReduce not allowed, there seems to be no better way than PageFilter if I need to do a large-scale scan or a full table scan.

> HBase scan with PageFilter cannot get all rows, non-edge region skiped
> ----------------------------------------------------------------------
>
>                 Key: HBASE-21332
>                 URL: https://issues.apache.org/jira/browse/HBASE-21332
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, scan
>    Affects Versions: 1.1.2
>         Environment: * Server version: 1.1.2.2.6.5.0-292, revision=897822d4dd5956ca186974c10382e9094683fa29
> * 2 region servers
> * 4 regions
> * HBase client: 1.3.1
>            Reporter: pddNick
>            Assignee: Zheng Hu
>            Priority: Minor
>         Attachments: HBaseTest.java, image-2018-10-17-21-14-25-354.png, image-2018-10-17-21-15-23-439.png, image-2018-10-23-17-37-22-028.png
>
>
> When using a scan with PageFilter to get data from HBase, the scanner skips {color:#ff0000}'non-edge'{color} regions. The code I use comes from the book _HBase: The Definitive Guide_, Example 4.8 (PageFilter example). The difference is that I use a scan with startRow and stopRow.
> Say I have regions with start and end keys like \{'111', '222', '333', '444'}, which means I have 3 regions \{111, 222}, \{222, 333}, \{333, 444}, and they are on different region servers. When scanning with startRow '111' and stopRow '444', most data in region \{222, 333} is skipped and is not returned by the ResultScanner. Regions \{111, 222} and \{333, 444} work just fine. Because region \{222, 333} contains neither the startRow key nor the stopRow key, I call it a non-edge region.
> Below is an explanation with the log:
>
> {code:java}
> // Here the scanner works just fine in region {111,222}: it gets exactly {pageSize} rows each time, which is 1000
> ...
> 2018-10-17 21:25:57.810 INFO 213872 [ main] c.p.s.c.HBaseTest : Test: results from [2139718600001069] to [2179067497952422], sum [1000 : 64000], cost: [77ms]
> 2018-10-17 21:25:57.885 INFO 213872 [ main] c.p.s.c.HBaseTest : Test: results from [2179098921079755] to [21c2879280113661], sum [1000 : 65000], cost: [75ms]
> 2018-10-17 21:25:57.962 INFO 213872 [ main] c.p.s.c.HBaseTest : Test: results from [21c2899018774688] to [2203180876471552], sum [1000 : 66000], cost: [77ms]
> // Here the scanner goes from region {111,222} to {222,333}. As you can see, this scan returns 2405 rows, the last one being '3373621463365126'. The scanner moves on to region {333,444} too early, and most data in {222,333} is skipped.
> 2018-10-17 21:25:58.321 INFO 213872 [ main] c.p.s.c.HBaseTest : Test: results from [2203223414254308] to [3373621463365126], sum [2405 : 68405], cost: [359ms]
> // Now the scanner is in region {333,444}; everything works just fine
> 2018-10-17 21:25:58.396 INFO 213872 [ main] c.p.s.c.HBaseTest : Test: results from [3373764408525604] to [33b3849714659525], sum [1000 : 69405], cost: [74ms]
> 2018-10-17 21:25:58.467 INFO 213872 [ main] c.p.s.c.HBaseTest : Test: results from [33b3882378177107] to [33f5221377695765], sum [1000 : 70405], cost: [71ms]
> ...
> {code}

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
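The PageFilter javadoc states that the filter cannot guarantee that a client receives at most the page size, because it is applied separately on each region server. One reading consistent with the log: the scan starting at 2203223414254308 got the rest of \{111,222} plus up to a page from each later region, so its last row (3373621463365126) already lay in \{333,444}, and the next page started there. The sketch below is a simplified, hypothetical model of that effect, not HBase code. It assumes each region enforces its own page limit and the client keeps reading into later regions. It ignores RPC and caching details. All class and method names (PageFilterSim, scan, naivePaging, cappedPaging) are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of paging with a per-region page limit.
 * Each inner list is one region holding sorted String row keys; regions are
 * ordered by key range. Nothing here calls the HBase client API.
 */
public class PageFilterSim {

    /** Builds regionCount regions of rowsPerRegion zero-padded keys each. */
    static List<List<String>> buildRegions(int regionCount, int rowsPerRegion) {
        List<List<String>> regions = new ArrayList<>();
        int key = 0;
        for (int r = 0; r < regionCount; r++) {
            List<String> region = new ArrayList<>();
            for (int i = 0; i < rowsPerRegion; i++) {
                region.add(String.format("%06d", key++));
            }
            regions.add(region);
        }
        return regions;
    }

    /**
     * One scan from startRow (inclusive). The page limit is counted separately
     * in every region, so the combined result can hold up to pageSize rows
     * per region touched.
     */
    static List<String> scan(List<List<String>> regions, String startRow, int pageSize) {
        List<String> out = new ArrayList<>();
        for (List<String> region : regions) {
            int emitted = 0; // each region starts its own count at zero
            for (String row : region) {
                if (row.compareTo(startRow) < 0) continue;
                if (emitted == pageSize) break;
                out.add(row);
                emitted++;
            }
        }
        return out;
    }

    /** Paging loop that keeps every row a scan returns. */
    static List<String> naivePaging(List<List<String>> regions, int pageSize) {
        return page(regions, pageSize, false);
    }

    /** Same loop, but the client keeps only the first pageSize rows of each scan. */
    static List<String> cappedPaging(List<List<String>> regions, int pageSize) {
        return page(regions, pageSize, true);
    }

    private static List<String> page(List<List<String>> regions, int pageSize, boolean capOnClient) {
        List<String> all = new ArrayList<>();
        String start = ""; // the empty key sorts before every row
        while (true) {
            List<String> rows = scan(regions, start, pageSize);
            if (rows.isEmpty()) break;
            if (capOnClient && rows.size() > pageSize) {
                rows = rows.subList(0, pageSize);
            }
            all.addAll(rows);
            // Smallest key strictly greater than the last row seen.
            start = rows.get(rows.size() - 1) + "\0";
        }
        return all;
    }

    public static void main(String[] args) {
        List<List<String>> regions = buildRegions(3, 30); // 90 rows in total
        System.out.println("naive:  " + naivePaging(regions, 10).size());
        System.out.println("capped: " + cappedPaging(regions, 10).size());
    }
}
```

Running main prints `naive:  50` and `capped: 90`. The naive loop loses the rows between each middle region's first page and the next scan's start row, while the client-side cap visits all 90 rows exactly once. In a real HBase 1.x loop, the analogous change would presumably be to stop reading from the ResultScanner once pageSize rows have been consumed and close it before starting the next page. This is a suggestion based on the model, not something confirmed in the issue.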