[ https://issues.apache.org/jira/browse/PHOENIX-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Jasani updated PHOENIX-7580:
----------------------------------
    Fix Version/s: 5.1.4
                   5.2.2

> Data in last salt bucket is not being scanned for range scan
> ------------------------------------------------------------
>
>                 Key: PHOENIX-7580
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7580
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Sanjeet Malhotra
>            Assignee: Sanjeet Malhotra
>            Priority: Major
>             Fix For: 5.3.0, 5.1.4, 5.2.2
>
>
> Steps to reproduce:
>  * Run DDL:
>  ** CREATE TABLE IF NOT EXISTS TABLE1 (
>     PK1 CHAR(7) NOT NULL,
>     PK2 CHAR(7) NOT NULL,
>     PK3 DECIMAL NOT NULL,
>     PK4 CHAR(32) NOT NULL,
>     COL1 VARCHAR,
>     COL2 VARCHAR,
>     CONSTRAINT PK PRIMARY KEY (
>         PK1,
>         PK2,
>         PK3,
>         PK4
>     )
> ) VERSIONS=1, MULTI_TENANT=true, REPLICATION_SCOPE=0, SALT_BUCKETS=20, 
> UPDATE_CACHE_FREQUENCY=172800000;
>  * Add data to the table and verify via an HBase scan that some rows went to 
> the last salt bucket.
>  ** Add only enough data that no region split happens. You should have 20 
> regions, as the salt bucket count is 20.
>  ** Add data such that the first 3 PK columns have the values 'PK_VAL1', 
> 'PK_VAL2' and 1743478459000 for all rows, and only the last PK column differs 
> between rows.
>  * Run range query:
>  ** {{select count(\*) from TABLE1 where PK1 = 'PK_VAL1' AND PK3 = 
> 1743478459000 AND PK2 = 'PK_VAL2';}}
>  ** Note the row count returned by the above query.
>  * Run a scan on HBase from the shell:
>  ** Sample scan for salt bucket `\x00`: `scan "TABLE1", \{VERSIONS => 1, 
> COLUMNS => "0:_0", ROWPREFIXFILTER => "\x00PK_VAL1PK_VAL2\xC7\x02K#O.[\x00"}`
>  ** Run the above scan for all the salt buckets from `\x00` to `\x13` and note 
> the row count for each salt bucket. The sum should equal the count returned 
> by the Phoenix query above.
>  * So far, so good: Phoenix is able to scan the rows of the last salt bucket 
> from HBase.
>  * Now add 3 rows to the second last salt bucket, `\x12`, such that the row 
> key prefix (constructed from the first 3 PK columns) of these rows is greater 
> than `\x12PK_VAL1PK_VAL2\xC7\x02K#O.[\x00`.
>  * Of the 3 new rows, use the second one (in lexicographic order) as the 
> split key and split the region corresponding to the second last salt bucket.
>  * Run the same Phoenix range query again. This time the row count will be 
> lower than before, and the difference will equal the number of rows in the 
> last salt bucket (`\x13`).
>  * So the rows are present in HBase, but Phoenix is not scanning them.
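The per-bucket HBase shell scans in the steps above can be generated instead of typed by hand. This is a minimal, hypothetical helper (the class and method names are made up for illustration): it prepends each salt byte `\x00` to `\x13` to the row key bytes copied verbatim from the sample scan, and escapes them the way HBase's {{Bytes.toStringBinary}} prints keys. The escaping rules are reproduced here as an assumption; HBase itself is not called.

```java
import java.nio.charset.StandardCharsets;

public class SaltBucketPrefixes {
    // Salt bucket count from the DDL (SALT_BUCKETS=20).
    static final int SALT_BUCKETS = 20;

    // Row key bytes after the salt byte: "PK_VAL1PK_VAL2" followed by the bytes
    // the sample scan uses for PK3 (copied verbatim, not re-derived here).
    static byte[] suffix() {
        byte[] pk12 = "PK_VAL1PK_VAL2".getBytes(StandardCharsets.US_ASCII);
        byte[] pk3 = {(byte) 0xC7, 0x02, 'K', '#', 'O', '.', '[', 0x00};
        byte[] out = new byte[pk12.length + pk3.length];
        System.arraycopy(pk12, 0, out, 0, pk12.length);
        System.arraycopy(pk3, 0, out, pk12.length, pk3.length);
        return out;
    }

    // Mimics HBase's Bytes.toStringBinary escaping (assumed rules): letters,
    // digits and a fixed set of punctuation are kept, everything else is \xHH.
    static String toStringBinary(byte[] bytes) {
        String printable = " `~!@#$%^&*()-_=+[]{}|;:'\",.<>/?";
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            int ch = b & 0xFF;
            if ((ch >= '0' && ch <= '9') || (ch >= 'A' && ch <= 'Z')
                    || (ch >= 'a' && ch <= 'z') || printable.indexOf(ch) >= 0) {
                sb.append((char) ch);
            } else {
                sb.append(String.format("\\x%02X", ch));
            }
        }
        return sb.toString();
    }

    // Builds the HBase shell scan command for one salt bucket.
    static String scanCommand(int bucket) {
        byte[] s = suffix();
        byte[] key = new byte[s.length + 1];
        key[0] = (byte) bucket;
        System.arraycopy(s, 0, key, 1, s.length);
        return "scan \"TABLE1\", {VERSIONS => 1, COLUMNS => \"0:_0\", ROWPREFIXFILTER => \""
                + toStringBinary(key) + "\"}";
    }

    public static void main(String[] args) {
        // One command per salt bucket, \x00 through \x13.
        for (int bucket = 0; bucket < SALT_BUCKETS; bucket++) {
            System.out.println(scanCommand(bucket));
        }
    }
}
```

For bucket 0 this reproduces the sample scan from the steps; the remaining 19 lines differ only in the leading salt byte.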
>  
> Root cause:
>  * Please read the steps to reproduce above first; they make the root cause 
> easier to follow.
>  * To get the region locations, we go through this code: 
> [#L1048-L1064|https://github.com/apache/phoenix/blob/7a5965887f679bc229c1424e84951202a5ab27b7/phoenix-core-client/src/main/java/org/apache/phoenix/iterate/BaseResultIterators.java#L1048-L1064].
>  This returns the locations of all 20 regions, as expected, so there is no 
> bug here.
>  * In the code at 
> [#L1245|https://github.com/apache/phoenix/blob/7a5965887f679bc229c1424e84951202a5ab27b7/phoenix-core-client/src/main/java/org/apache/phoenix/iterate/BaseResultIterators.java#L1245],
>  we iterate over the region locations one by one and build a scan for each 
> region location.
>  ** The end key of the previous region becomes the start key for building the 
> scan for the next region 
> ([link|https://github.com/apache/phoenix/blob/7a5965887f679bc229c1424e84951202a5ab27b7/phoenix-core-client/src/main/java/org/apache/phoenix/iterate/BaseResultIterators.java#L1268]).
>  The end key for building the scan for the last region is empty, as per 
> [this|https://github.com/apache/phoenix/blob/7a5965887f679bc229c1424e84951202a5ab27b7/phoenix-core-client/src/main/java/org/apache/phoenix/iterate/BaseResultIterators.java#L1156].
>  ** So, for a range scan over a salted table, the start key of the scan over 
> the last region is the end key of the region corresponding to the second last 
> salt bucket.
>  * Next, we call {{intersectScan}} 
> ([link|https://github.com/apache/phoenix/blob/7a5965887f679bc229c1424e84951202a5ab27b7/phoenix-core-client/src/main/java/org/apache/phoenix/iterate/BaseResultIterators.java#L1245])
>  to get the scan for the last region.
>  ** In the {{intersectScan}} definition 
> ([link|https://github.com/apache/phoenix/blob/7a5965887f679bc229c1424e84951202a5ab27b7/phoenix-core-client/src/main/java/org/apache/phoenix/compile/ScanRanges.java#L263]),
>  {{originalStartKey}} for the last region is the end key of the region 
> corresponding to the second last salt bucket, and {{originalStopKey}} is an 
> empty byte array.
>  ** Suppose the following condition is satisfied:
>  *** The region corresponding to the second last salt bucket has at least one 
> region after it that also belongs to the second last bucket.
>  ** This makes {{originalStartKey}} have the same first byte as the second 
> last salt bucket.
>  ** Because of this, we enter this if block 
> ([link|https://github.com/apache/phoenix/blob/7a5965887f679bc229c1424e84951202a5ab27b7/phoenix-core-client/src/main/java/org/apache/phoenix/compile/ScanRanges.java#L286]),
>  and we enter it because {{Bytes.compareTo(originalStopKey, nextBucketStart) 
> <= 0}} is satisfied.
>  ** Suppose the following condition is also satisfied: 
>  *** Take the end key of the region corresponding to the second last salt 
> bucket, i.e. {{scanStartKey}}, and exclude its first byte. Call this byte 
> array {{b1}}.
>  *** Take the row key prefix from the WHERE clause of the range scan and 
> exclude its first byte. Call this byte array {{b2}}.
>  *** Comparing the bytes, {{b1}} > {{b2}}.
>  ** The above condition takes us into this if block 
> ([link|https://github.com/apache/phoenix/blob/7a5965887f679bc229c1424e84951202a5ab27b7/phoenix-core-client/src/main/java/org/apache/phoenix/compile/ScanRanges.java#L385]).
>  Note that, as this is a range scan on a salted table, {{scanKeyOffset}}, and 
> thus both {{scanStartKeyOffset}} and {{scanStopKeyOffset}}, will be 1.
>  ** Because {{b1}} > {{b2}}, no scan is created for the last salt bucket, and 
> we end up not scanning the last bucket at all.
>  * The bug seems to be:
>  ** {{Bytes.compareTo(originalStopKey, nextBucketStart) <= 0}} 
> ([link|https://github.com/apache/phoenix/blob/7a5965887f679bc229c1424e84951202a5ab27b7/phoenix-core-client/src/main/java/org/apache/phoenix/compile/ScanRanges.java#L283]).
>  This check succeeded in the root cause analysis above, but it should not 
> have: {{originalStopKey}} was an empty byte array, and an empty stop key must 
> be handled as a special case meaning the largest possible stop key.
>  ** So we should not enter the above if block through the 
> {{Bytes.compareTo(originalStopKey, nextBucketStart) <= 0}} check when 
> {{originalStopKey}} is an empty byte array. Instead, we should first hit 
> [this|https://github.com/apache/phoenix/blob/7a5965887f679bc229c1424e84951202a5ab27b7/phoenix-core-client/src/main/java/org/apache/phoenix/compile/ScanRanges.java#L294]
>  line.
>  ** Then, in the next iteration of the while loop, we would enter the same if 
> block, but this time because the {{lastBucket}} boolean variable is true, and 
> the first bytes of {{wrkStartKey}} and {{nextByteBucket}} would be the same. 
> So, for a range scan over a salted table, hitting 
> [this|https://github.com/apache/phoenix/blob/7a5965887f679bc229c1424e84951202a5ab27b7/phoenix-core-client/src/main/java/org/apache/phoenix/compile/ScanRanges.java#L286]
>  line while {{wrkStartKey}} and {{originalStopKey}} belong to different salt 
> buckets is wrong.
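The {{b1}} > {{b2}} comparison described above can be modelled in a few lines. This is a simplified, hypothetical model, not the {{ScanRanges}} code. It assumes a made-up split key `\x12PK_VAL1PK_VAL3` in the second last bucket and compares it with the start of the last bucket's range (`\x13` followed by the WHERE-clause prefix from the sample scan). The comparison runs twice: once ignoring the salt byte (offset 1) and once over the full key. {{Arrays.compareUnsigned}} (Java 9+) is used because it orders bytes as unsigned values, like HBase's {{Bytes.compareTo}}.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SaltOffsetCompare {
    // Prepends a one-byte salt to the remaining row key bytes.
    static byte[] withSalt(int bucket, byte[] rest) {
        byte[] key = new byte[rest.length + 1];
        key[0] = (byte) bucket;
        System.arraycopy(rest, 0, key, 1, rest.length);
        return key;
    }

    // Unsigned lexicographic comparison of a[offset..] with b[offset..].
    // offset = 1 skips the salt byte, matching the offset of 1 noted above.
    static int compareFrom(byte[] a, byte[] b, int offset) {
        return Arrays.compareUnsigned(a, offset, a.length, b, offset, b.length);
    }

    // WHERE-clause prefix (PK1, PK2, PK3) without the salt byte, copied from the sample scan.
    static byte[] wherePrefix() {
        byte[] pk12 = "PK_VAL1PK_VAL2".getBytes(StandardCharsets.US_ASCII);
        byte[] pk3 = {(byte) 0xC7, 0x02, 'K', '#', 'O', '.', '[', 0x00};
        byte[] out = Arrays.copyOf(pk12, pk12.length + pk3.length);
        System.arraycopy(pk3, 0, out, pk12.length, pk3.length);
        return out;
    }

    // Hypothetical split key in bucket \x12 that sorts after the WHERE prefix.
    static byte[] splitKey() {
        return withSalt(0x12, "PK_VAL1PK_VAL3".getBytes(StandardCharsets.US_ASCII));
    }

    // Start of the range that should be scanned in the last bucket (\x13).
    static byte[] lastBucketRangeStart() {
        return withSalt(0x13, wherePrefix());
    }

    public static void main(String[] args) {
        byte[] start = splitKey();
        byte[] rangeStart = lastBucketRangeStart();
        // b1 vs b2: with the salt byte ignored, the start key looks past the range.
        System.out.println("b1 > b2: " + (compareFrom(start, rangeStart, 1) > 0));
        // Full keys: \x12... sorts before \x13..., so the last bucket's range is still ahead.
        System.out.println("start key < last bucket range start: "
                + (compareFrom(start, rangeStart, 0) < 0));
    }
}
```

Both lines print {{true}}. Dropping the salt byte makes a start key from bucket `\x12` look as if it lay beyond the last bucket's range, although the whole `\x13` range still sorts after it.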
>  
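The suggested fix treats an empty stop key as the largest possible key before it is compared with {{nextBucketStart}}. That idea can be illustrated in isolation. This is a minimal sketch with hypothetical names ({{StopKeyCompare}}, {{compareStopKey}}), not the actual patch. It assumes {{nextBucketStart}} is the single byte `\x13` and uses {{Arrays.compareUnsigned}} (Java 9+) in place of HBase's {{Bytes.compareTo}}, which orders bytes the same way.

```java
import java.util.Arrays;

public class StopKeyCompare {
    // Plain unsigned lexicographic comparison (the ordering HBase's Bytes.compareTo uses).
    // An empty array sorts before every non-empty array, so an empty stop key looks "smallest".
    static int rawCompare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    // Stop-key-aware comparison: an empty stop key means "no upper bound",
    // so it compares greater than any ordinary key passed as the second argument.
    static int compareStopKey(byte[] stopKey, byte[] key) {
        if (stopKey.length == 0) {
            return 1;
        }
        return Arrays.compareUnsigned(stopKey, key);
    }

    public static void main(String[] args) {
        byte[] originalStopKey = new byte[0];   // stop key of the scan over the last region
        byte[] nextBucketStart = {0x13};        // assumed: start of the bucket after \x12
        // Raw check: prints true, i.e. the if block would be entered.
        System.out.println(rawCompare(originalStopKey, nextBucketStart) <= 0);
        // Stop-key-aware check: prints false, i.e. the if block would be skipped.
        System.out.println(compareStopKey(originalStopKey, nextBucketStart) <= 0);
    }
}
```

With the raw comparison the empty stop key sorts first, so the check succeeds. With the stop-key-aware comparison the check fails, and the if block would be skipped for the last region, as the analysis suggests.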



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
