[jira] [Updated] (PHOENIX-7580) Data in last salt bucket is not being scanned for range scan

Sanjeet Malhotra (Jira) Thu, 10 Apr 2025 09:33:41 -0700


     [ https://issues.apache.org/jira/browse/PHOENIX-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Sanjeet Malhotra updated PHOENIX-7580:
--------------------------------------
    Description: 
Steps to reproduce:
 * Run DDL:
 ** CREATE TABLE IF NOT EXISTS TABLE1 (
    PK1 CHAR(7) NOT NULL,
    PK2 CHAR(7) NOT NULL,
    PK3 DECIMAL NOT NULL,
    PK4 CHAR(32) NOT NULL,
    COL1 VARCHAR,
    COL2 VARCHAR,
    CONSTRAINT PK PRIMARY KEY (
        PK1,
        PK2,
        PK3,
        PK4
    )
) VERSIONS=1, MULTI_TENANT=true, REPLICATION_SCOPE=0, SALT_BUCKETS=20, 
UPDATE_CACHE_FREQUENCY=172800000;
 * Add data to the table and verify with an HBase scan that some rows went to 
the last salt bucket.
 ** Add only enough data that no region split happens. You should have 20 
regions, since the salt bucket count is 20.
 ** Add the data such that the first 3 PK columns have the values 'PK_VAL1', 
'PK_VAL2' and 1743478459000 for all rows, and only the last PK column differs 
from row to row.
 * Run the range query:
 ** {{select count(\*) from TABLE1 where PK1 = 'PK_VAL1' AND PK3 = 
1743478459000 AND PK2 = 'PK_VAL2';}}
 ** Note the row count returned by the query.
 * Run a scan from the HBase shell:
 ** Sample scan for salt bucket `\x00`: `scan "TABLE1", \{VERSIONS => 1, 
COLUMNS => "0:_0", ROWPREFIXFILTER => "\x00PK_VAL1PK_VAL2\xC7\x02K#O.[\x00"}`
 ** Run the above scan for every salt bucket from `\x00` to `\x13` and note the 
row count for each bucket. The sum should equal the count returned by the 
Phoenix query above.
 * So far so good: Phoenix is able to scan the rows of the last salt bucket 
from HBase.
 * Now add 3 rows to the second last salt bucket, `\x12`, such that the row key 
prefix (constructed from the first 3 PK columns) of these rows is greater than 
`\x12PK_VAL1PK_VAL2\xC7\x02K#O.[\x00`.
 * Of the 3 new rows, take the second one (in lexicographic order) as the split 
key, and split the region corresponding to the second last salt bucket at that 
key.
 * Run the same Phoenix range query again. This time the row count will be 
lower than before, and the difference will equal the number of rows in the 
last salt bucket (`\x13`).
 * So the rows are present in HBase, but Phoenix is not scanning them.
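Typing the 20 per-bucket scans by hand is error-prone, so here is a small Java sketch that prints them. It reuses the encoded row key prefix from the sample scan above as-is and does not re-derive the DECIMAL encoding of 1743478459000. {{SaltBucketScans}} and its methods are hypothetical names used only for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SaltBucketScans {
    // Row key bytes that follow the salt byte, copied from the sample scan:
    // PK1 and PK2 (CHAR(7), fixed width), then the encoded DECIMAL PK3 and its
    // 0x00 separator. The DECIMAL bytes are reused verbatim, not recomputed.
    static final byte[] PREFIX_AFTER_SALT = concat(
            "PK_VAL1PK_VAL2".getBytes(StandardCharsets.US_ASCII),
            new byte[] {(byte) 0xC7, 0x02, 'K', '#', 'O', '.', '[', 0x00});

    static byte[] concat(byte[] a, byte[] b) {
        byte[] out = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    // Render bytes for a double-quoted HBase shell string: printable ASCII
    // as-is, everything else (plus the quote and backslash) as a \xNN escape.
    static String toShellString(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            int v = b & 0xFF;
            if (v >= 0x20 && v < 0x7F && v != '"' && v != '\\') {
                sb.append((char) v);
            } else {
                sb.append(String.format("\\x%02X", v));
            }
        }
        return sb.toString();
    }

    // One scan command per salt bucket, from \x00 up to saltBuckets - 1.
    static List<String> scanCommands(String table, int saltBuckets) {
        List<String> commands = new ArrayList<>();
        for (int bucket = 0; bucket < saltBuckets; bucket++) {
            byte[] prefix = concat(new byte[] {(byte) bucket}, PREFIX_AFTER_SALT);
            commands.add(String.format(
                    "scan \"%s\", {VERSIONS => 1, COLUMNS => \"0:_0\", ROWPREFIXFILTER => \"%s\"}",
                    table, toShellString(prefix)));
        }
        return commands;
    }

    public static void main(String[] args) {
        // SALT_BUCKETS=20 in the DDL, so buckets \x00 .. \x13.
        scanCommands("TABLE1", 20).forEach(System.out::println);
    }
}
```

The printed commands can be pasted into the HBase shell. Each scan reports its own row count, and those counts should add up to the Phoenix {{count(\*)}} result.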

 

Root cause:
 * Please go through the steps to reproduce above first; they make the root 
cause easier to follow.
 * Region locations are obtained through this code: 
[#L1048-L1064|https://github.com/apache/phoenix/blob/7a5965887f679bc229c1424e84951202a5ab27b7/phoenix-core-client/src/main/java/org/apache/phoenix/iterate/BaseResultIterators.java#L1048-L1064].
 Here we get the locations of all the regions, as expected, so there is no bug 
here.
 * In 
[#L1245|https://github.com/apache/phoenix/blob/7a5965887f679bc229c1424e84951202a5ab27b7/phoenix-core-client/src/main/java/org/apache/phoenix/iterate/BaseResultIterators.java#L1245]
 we iterate over the region locations one by one and build a scan for each of 
them.
 ** As you can see, the end key of the previous region becomes the start key 
used to build the scan for the next region 
([link|https://github.com/apache/phoenix/blob/7a5965887f679bc229c1424e84951202a5ab27b7/phoenix-core-client/src/main/java/org/apache/phoenix/iterate/BaseResultIterators.java#L1268]).
 The end key used to build the scan for the last region is empty, as per 
[this|https://github.com/apache/phoenix/blob/7a5965887f679bc229c1424e84951202a5ab27b7/phoenix-core-client/src/main/java/org/apache/phoenix/iterate/BaseResultIterators.java#L1156].
 ** So, for a range scan over a salted table, the start key of the scan over 
the last region is the end key of the region corresponding to the second last 
salt bucket.
 * Next, we call {{intersectScan}} 
([link|https://github.com/apache/phoenix/blob/7a5965887f679bc229c1424e84951202a5ab27b7/phoenix-core-client/src/main/java/org/apache/phoenix/iterate/BaseResultIterators.java#L1245])
 to build the scan for the last region.
 ** In the {{intersectScan}} definition 
([link|https://github.com/apache/phoenix/blob/7a5965887f679bc229c1424e84951202a5ab27b7/phoenix-core-client/src/main/java/org/apache/phoenix/compile/ScanRanges.java#L263]),
 {{originalStartKey}} for the last region is the end key of the region 
corresponding to the second last salt bucket, and {{originalStopKey}} is an 
empty byte array.
 ** Suppose the following condition holds:
 *** The region corresponding to the second last salt bucket is followed by at 
least one more region that belongs to the same (second last) bucket.
 ** This makes {{originalStartKey}} have the same first byte as the second last 
salt bucket.
 ** Because of this, we enter this if block 
([link|https://github.com/apache/phoenix/blob/7a5965887f679bc229c1424e84951202a5ab27b7/phoenix-core-client/src/main/java/org/apache/phoenix/compile/ScanRanges.java#L286]),
 and we enter it because 
{{Bytes.compareTo(originalStopKey, nextBucketStart) <= 0}} is satisfied.
 ** Suppose the following condition also holds:
 *** Take the end key of the region corresponding to the second last salt 
bucket, i.e. {{scanStartKey}}, and drop its first byte. Call the result 
{{b1}}.
 *** Take the row key prefix from the WHERE clause of the range scan and drop 
its first byte. Call the result {{b2}}.
 *** Comparing the bytes, {{b1}} > {{b2}}.
 ** This condition takes us into this if block 
([link|https://github.com/apache/phoenix/blob/7a5965887f679bc229c1424e84951202a5ab27b7/phoenix-core-client/src/main/java/org/apache/phoenix/compile/ScanRanges.java#L385]).
 Note that, since this is a range scan on a salted table, {{scanKeyOffset}}, 
and hence both {{scanStartKeyOffset}} and {{scanStopKeyOffset}}, will be 1.
 ** Because {{b1}} > {{b2}}, no scan is ultimately created for the last salt 
bucket, and the last bucket is never scanned.
 * The bug seems to be:
 ** {{Bytes.compareTo(originalStopKey, nextBucketStart) <= 0}} 
([link|https://github.com/apache/phoenix/blob/7a5965887f679bc229c1424e84951202a5ab27b7/phoenix-core-client/src/main/java/org/apache/phoenix/compile/ScanRanges.java#L283]).
 This check succeeded in the analysis above, but it should not have: 
{{originalStopKey}} was an empty byte array, and an empty stop key needs to be 
handled as a special case meaning the largest possible stop key (i.e. no upper 
bound).
 ** So we should not enter the above if block just because 
{{Bytes.compareTo(originalStopKey, nextBucketStart) <= 0}} succeeds when 
{{originalStopKey}} is an empty byte array. Instead, we should first hit 
[this|https://github.com/apache/phoenix/blob/7a5965887f679bc229c1424e84951202a5ab27b7/phoenix-core-client/src/main/java/org/apache/phoenix/compile/ScanRanges.java#L294]
 line.
 ** Then, in the next iteration of the while loop, we would enter the same if 
block, but this time because the {{lastBucket}} boolean is true, and the first 
byte of {{wrkStartKey}} and {{nextByteBucket}} would be the same. So, for a 
range scan over a salted table, hitting 
[this|https://github.com/apache/phoenix/blob/7a5965887f679bc229c1424e84951202a5ab27b7/phoenix-core-client/src/main/java/org/apache/phoenix/compile/ScanRanges.java#L286]
 line while {{wrkStartKey}} and {{originalStopKey}} belong to different salt 
buckets is wrong.
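The mismatch behind the {{b1}} > {{b2}} condition can be seen with plain byte arrays. This Java sketch does not reproduce the logic at ScanRanges.java#L385. It only compares the same two keys twice: once including the salt byte, and once from offset 1 (the {{scanKeyOffset}} noted above), using the JDK's {{Arrays.compareUnsigned}}. The split key {{PK_VAL1PK_VAL3}} in bucket `\x12` is a hypothetical stand-in for the second of the 3 rows added in the steps to reproduce; any prefix greater than the WHERE clause prefix gives the same result. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class SaltOffsetCompare {
    // Salted table: the first row key byte is the salt byte, so the
    // scan key offset is 1.
    static final int SCAN_KEY_OFFSET = 1;

    // Encoded DECIMAL PK3 plus its 0x00 separator, copied from the sample scan.
    static final byte[] ENCODED_PK3 = {(byte) 0xC7, 0x02, 'K', '#', 'O', '.', '[', 0x00};

    // Row key = salt byte + fixed-width CHAR columns + remaining bytes.
    static byte[] rowKey(int saltByte, String fixedWidthPart, byte[] tail) {
        byte[] head = fixedWidthPart.getBytes(StandardCharsets.US_ASCII);
        byte[] key = new byte[1 + head.length + tail.length];
        key[0] = (byte) saltByte;
        System.arraycopy(head, 0, key, 1, head.length);
        System.arraycopy(tail, 0, key, 1 + head.length, tail.length);
        return key;
    }

    // Full row key order, salt byte included (unsigned, lexicographic).
    static int compareFull(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    // Order of the bytes after the salt byte only, i.e. b1 vs b2.
    static int compareAfterSalt(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, SCAN_KEY_OFFSET, a.length, b, SCAN_KEY_OFFSET, b.length);
    }

    public static void main(String[] args) {
        // WHERE clause prefix as it appears in the last bucket, \x13.
        byte[] lastBucketPrefix = rowKey(0x13, "PK_VAL1PK_VAL2", ENCODED_PK3);
        // Hypothetical split key in bucket \x12, greater than the WHERE prefix.
        byte[] splitKey = rowKey(0x12, "PK_VAL1PK_VAL3", new byte[0]);

        // With the salt byte, the split key sorts before every row of bucket \x13...
        System.out.println(compareFull(splitKey, lastBucketPrefix) < 0);      // true
        // ...but with the salt byte dropped, b1 > b2.
        System.out.println(compareAfterSalt(splitKey, lastBucketPrefix) > 0); // true
    }
}
```

So once the salt byte is ignored, a start key taken from bucket `\x12` can look as if it lies past the WHERE clause prefix, even though every row of bucket `\x13` sorts after it.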

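The suspected bug in the {{Bytes.compareTo(originalStopKey, nextBucketStart) <= 0}} check can be reproduced without Phoenix. HBase's {{Bytes.compareTo}} orders byte arrays lexicographically as unsigned bytes, so an empty array sorts before every non-empty one. To keep the sketch self-contained, it uses the JDK's {{Arrays.compareUnsigned}}, which gives the same ordering. The one-byte {{nextBucketStart}} value and the method names are assumptions for illustration. The "fixed" variant is only one possible way to treat an empty stop key as unbounded, not an actual patch.

```java
import java.util.Arrays;

final class StopKeyCheck {
    // Unsigned, lexicographic byte comparison. A proper prefix sorts first,
    // so an empty array compares as smaller than any non-empty array.
    static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    // The check as written: an empty originalStopKey (meaning "no upper
    // bound") compares as the smallest key, so this returns true for it.
    static boolean stopKeyInCurrentBucketAsIs(byte[] originalStopKey, byte[] nextBucketStart) {
        return compare(originalStopKey, nextBucketStart) <= 0;
    }

    // One possible fix: treat an empty stop key as larger than every key,
    // so it never counts as ending before the next bucket starts.
    static boolean stopKeyInCurrentBucketFixed(byte[] originalStopKey, byte[] nextBucketStart) {
        return originalStopKey.length != 0 && compare(originalStopKey, nextBucketStart) <= 0;
    }

    public static void main(String[] args) {
        byte[] emptyStopKey = new byte[0];   // stop key used for the last region
        byte[] nextBucketStart = {0x13};     // assumed: start of the last bucket, \x13
        System.out.println(stopKeyInCurrentBucketAsIs(emptyStopKey, nextBucketStart));  // true
        System.out.println(stopKeyInCurrentBucketFixed(emptyStopKey, nextBucketStart)); // false
    }
}
```

Checking {{originalStopKey.length != 0}} before comparing leaves bounded stop keys on the existing path and only changes the behaviour for the unbounded last region.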
 


> Data in last salt bucket is not being scanned for range scan
> ------------------------------------------------------------
>
>                 Key: PHOENIX-7580
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7580
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Sanjeet Malhotra
>            Assignee: Sanjeet Malhotra
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
