pddNick created HBASE-21332:
-------------------------------

             Summary: HBase scan with PageFilter cannot get all rows, non-edge 
region skipped
                 Key: HBASE-21332
                 URL: https://issues.apache.org/jira/browse/HBASE-21332
             Project: HBase
          Issue Type: Bug
          Components: regionserver, scan
    Affects Versions: 1.1.2
         Environment: * Server version:1.1.2.2.6.5.0-292, 
revision=897822d4dd5956ca186974c10382e9094683fa29
 * 2 region servers
 * 4 regions
 * HBase client:1.3.1

 
            Reporter: pddNick
         Attachments: HBaseTest.java, image-2018-10-17-21-14-25-354.png, 
image-2018-10-17-21-15-23-439.png

When scanning with a PageFilter to read data from HBase, the scanner skips 
{color:#FF0000}'non-edge'{color} regions. The code I use comes from the book 
_HBase: The Definitive Guide_, Example 4.8 (PageFilter example). The only 
difference is that I set startRow and stopRow on the scan.
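
For reference, the paging behaviour I expect can be modelled without any HBase 
calls. This is only a sketch under assumptions: an in-memory sorted map stands 
in for the table, the class and method names are hypothetical, and the loop is 
assumed to restart each page just after the last row returned by appending a 
zero byte to that row key. The real test uses Scan, PageFilter and 
ResultScanner instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of a paged scan; it makes no HBase client calls.
public class PagedScanModel {

    /** Reads rows in [startRow, stopRow) page by page and returns each page's size. */
    static List<Integer> pageSizes(NavigableMap<String, String> table,
                                   String startRow, String stopRow, int pageSize) {
        List<Integer> sizes = new ArrayList<>();
        String nextStart = startRow;
        while (true) {
            int count = 0;
            String lastRow = null;
            // One "page": at most pageSize rows from nextStart (inclusive) to stopRow (exclusive).
            for (String row : table.subMap(nextStart, true, stopRow, false).keySet()) {
                if (count == pageSize) {
                    break;
                }
                lastRow = row;
                count++;
            }
            if (count == 0) {
                break; // nothing left in the range
            }
            sizes.add(count);
            // The smallest key strictly greater than lastRow is lastRow plus a zero byte.
            nextStart = lastRow + "\0";
        }
        return sizes;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (int i = 0; i < 2500; i++) {
            table.put(String.format("%04d", i), "v");
        }
        // With no rows skipped, every page except the last is exactly pageSize rows.
        System.out.println(pageSizes(table, "0000", "2500", 1000));
    }
}
```

In this model every page except the last holds exactly pageSize rows and the 
page sizes add up to the number of rows in the range, which is the behaviour I 
expected from the real scan.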

Say I have regions with start and end keys like \{'111', '222', '333', '444'}, 
which means I have 3 regions \{111, 222}, \{222, 333}, \{333, 444}, and they are 
on different region servers. When scanning with startRow '111' and stopRow '444', 
most data in region \{222, 333} is skipped and is not returned by the 
ResultScanner. Regions \{111, 222} and \{333, 444} work just fine; because they 
contain the startRow or stopRow key, I call them edge regions, and \{222, 333} 
is a non-edge region.
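
To make the terminology concrete, here is a small sketch that labels the 
regions touched by a scan. It takes an edge region to be one that holds the 
scan's startRow or stopRow, and a non-edge region to be one that lies strictly 
between them, as in the issue summary. The class and method names are 
hypothetical, keys are compared as plain strings, and nothing here calls HBase.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; it only illustrates the edge / non-edge wording.
public class RegionEdges {

    /** Returns one label per region overlapping the scan range [startRow, stopRow). */
    static List<String> classify(String[] splitKeys, String startRow, String stopRow) {
        List<String> labels = new ArrayList<>();
        for (int i = 0; i + 1 < splitKeys.length; i++) {
            String regionStart = splitKeys[i];
            String regionEnd = splitKeys[i + 1];
            // Skip regions that do not overlap the scan range at all.
            if (regionEnd.compareTo(startRow) <= 0 || regionStart.compareTo(stopRow) >= 0) {
                continue;
            }
            // The region holds startRow if regionStart <= startRow < regionEnd.
            boolean holdsStart = regionStart.compareTo(startRow) <= 0
                    && startRow.compareTo(regionEnd) < 0;
            // stopRow is exclusive, so the region holding it satisfies
            // regionStart < stopRow <= regionEnd.
            boolean holdsStop = regionStart.compareTo(stopRow) < 0
                    && stopRow.compareTo(regionEnd) <= 0;
            String kind = (holdsStart || holdsStop) ? "edge" : "non-edge";
            labels.add("{" + regionStart + ", " + regionEnd + "} " + kind);
        }
        return labels;
    }

    public static void main(String[] args) {
        String[] splitKeys = {"111", "222", "333", "444"};
        for (String label : classify(splitKeys, "111", "444")) {
            System.out.println(label);
        }
    }
}
```

For the keys above and a scan from '111' to '444', only \{222, 333} is labelled 
non-edge, and that is the region whose rows go missing.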

Below are some log excerpts with explanations:

 
{code:java}
// Here the scanner works just fine in region {111,222}: it gets exactly {pageSize}
// rows each time, which is 1000
...
2018-10-17 21:25:57.810 INFO 213872 [ main] c.p.s.c.HBaseTest : Test: results 
from [2139718600001069] to [2179067497952422], sum [1000 : 64000], cost: [77ms]
2018-10-17 21:25:57.885 INFO 213872 [ main] c.p.s.c.HBaseTest : Test: results 
from [2179098921079755] to [21c2879280113661], sum [1000 : 65000], cost: [75ms]
2018-10-17 21:25:57.962 INFO 213872 [ main] c.p.s.c.HBaseTest : Test: results 
from [21c2899018774688] to [2203180876471552], sum [1000 : 66000], cost: [77ms]

// Here the scanner goes from region {111,222} to {222,333}. As you can see, the
// scanner gets 2405 rows, ending at row '3373621463365126'. The scanner moves to
// region {333,444} too early and most data in {222,333} is skipped.
2018-10-17 21:25:58.321 INFO 213872 [ main] c.p.s.c.HBaseTest : Test: results 
from [2203223414254308] to [3373621463365126], sum [2405 : 68405], cost: [359ms]

// Now the scanner is in region {333,444}, everything works just fine
2018-10-17 21:25:58.396 INFO 213872 [ main] c.p.s.c.HBaseTest : Test: results 
from [3373764408525604] to [33b3849714659525], sum [1000 : 69405], cost: [74ms]
2018-10-17 21:25:58.467 INFO 213872 [ main] c.p.s.c.HBaseTest : Test: results 
from [33b3882378177107] to [33f5221377695765], sum [1000 : 70405], cost: [71ms]
...{code}
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
