[ https://issues.apache.org/jira/browse/PHOENIX-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Viraj Jasani updated PHOENIX-7580:
----------------------------------
    Fix Version/s: 5.1.4
                   5.2.2

> Data in last salt bucket is not being scanned for range scan
> ------------------------------------------------------------
>
>                 Key: PHOENIX-7580
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7580
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Sanjeet Malhotra
>            Assignee: Sanjeet Malhotra
>            Priority: Major
>             Fix For: 5.3.0, 5.1.4, 5.2.2
>
>
> Steps to reproduce:
> * Run the DDL:
> ** CREATE TABLE IF NOT EXISTS TABLE1 (
>      PK1 CHAR(7) NOT NULL,
>      PK2 CHAR(7) NOT NULL,
>      PK3 DECIMAL NOT NULL,
>      PK4 CHAR(32) NOT NULL,
>      COL1 VARCHAR,
>      COL2 VARCHAR,
>      CONSTRAINT PK PRIMARY KEY (
>        PK1,
>        PK2,
>        PK3,
>        PK4
>      )
>    ) VERSIONS=1, MULTI_TENANT=true, REPLICATION_SCOPE=0, SALT_BUCKETS=20,
>    UPDATE_CACHE_FREQUENCY=172800000;
> * Add data to the table and verify with an HBase scan that some rows went to the last salt bucket.
> ** Add only enough data that no region split happens. You should have 20 regions, as the salt bucket count is 20.
> ** Add data such that the first 3 PK columns have the values 'PK_VAL1', 'PK_VAL2' and 1743478459000 for all rows, and only the last PK column differs between rows.
> * Run the range query:
> ** {{select count(\*) from TABLE1 where PK1 = 'PK_VAL1' AND PK3 = 1743478459000 AND PK2 = 'PK_VAL2';}}
> ** Note the row count returned by the above query.
> * Run scans on HBase from the shell:
> ** Sample scan for salt bucket `\x00`: `scan "TABLE1", \{VERSIONS => 1, COLUMNS => "0:_0", ROWPREFIXFILTER => "\x00PK_VAL1PK_VAL2\xC7\x02K#O.[\x00"}`
> ** Run the above scan for every salt bucket from `\x00` to `\x13` and note the row count for each bucket. The sum should equal the count returned by the Phoenix query.
> * So far everything is correct: Phoenix scans the rows of the last salt bucket from HBase.
> * Now add 3 rows to the second-last salt bucket, `\x12`, such that the row key prefix of these rows (constructed from the first 3 PK columns) is greater than `\x12PK_VAL1PK_VAL2\xC7\x02K#O.[\x00`.
> * Split the region corresponding to the second-last salt bucket, using the second of the 3 new rows (in lexicographic order) as the split key.
> * Run the same Phoenix range query again. This time the row count is lower than before, and the difference equals the number of rows in the last salt bucket (`\x13`).
> * So the rows are present in HBase, but Phoenix does not scan them.
>
> Root cause:
> * Read the steps to reproduce above first; they make the root cause easier to follow.
> * Region locations are obtained in this code: [BaseResultIterators.java#L1048-L1064|https://github.com/apache/phoenix/blob/7a5965887f679bc229c1424e84951202a5ab27b7/phoenix-core-client/src/main/java/org/apache/phoenix/iterate/BaseResultIterators.java#L1048-L1064]. It returns all the region locations, as expected, so there is no bug here.
> * Here: [BaseResultIterators.java#L1245|https://github.com/apache/phoenix/blob/7a5965887f679bc229c1424e84951202a5ab27b7/phoenix-core-client/src/main/java/org/apache/phoenix/iterate/BaseResultIterators.java#L1245], we iterate over the region locations one by one and build a scan for each region.
> ** The end key of the previous region becomes the start key of the scan for the next region ([link|https://github.com/apache/phoenix/blob/7a5965887f679bc229c1424e84951202a5ab27b7/phoenix-core-client/src/main/java/org/apache/phoenix/iterate/BaseResultIterators.java#L1268]).
> For the last region, the end key used to build the scan is empty, as per [this|https://github.com/apache/phoenix/blob/7a5965887f679bc229c1424e84951202a5ab27b7/phoenix-core-client/src/main/java/org/apache/phoenix/iterate/BaseResultIterators.java#L1156].
> ** So, for a range scan over a salted table, the start key of the scan over the last region is the end key of the region corresponding to the second-last salt bucket.
> * Next, we call {{intersectScan}} ([link|https://github.com/apache/phoenix/blob/7a5965887f679bc229c1424e84951202a5ab27b7/phoenix-core-client/src/main/java/org/apache/phoenix/iterate/BaseResultIterators.java#L1245]) to get the scan for the last region.
> ** In the {{intersectScan}} definition ([link|https://github.com/apache/phoenix/blob/7a5965887f679bc229c1424e84951202a5ab27b7/phoenix-core-client/src/main/java/org/apache/phoenix/compile/ScanRanges.java#L263]), {{originalStartKey}} for the last region is the end key of the region corresponding to the second-last salt bucket, and {{originalStopKey}} is an empty byte array.
> ** Suppose the following condition holds:
> *** The region corresponding to the second-last salt bucket is followed by at least one more region belonging to the same (second-last) bucket.
> ** This makes the first byte of {{originalStartKey}} equal to the salt byte of the second-last bucket.
> ** As a result, we enter this if block ([link|https://github.com/apache/phoenix/blob/7a5965887f679bc229c1424e84951202a5ab27b7/phoenix-core-client/src/main/java/org/apache/phoenix/compile/ScanRanges.java#L286]), and we enter it because {{Bytes.compareTo(originalStopKey, nextBucketStart) <= 0}} is satisfied.
> ** Suppose the following condition also holds:
> *** Take the end key of the region corresponding to the second-last salt bucket, i.e. {{scanStartKey}}, and drop its first byte. Call the result {{b1}}.
> *** Take the row key prefix built from the WHERE clause of the range scan and drop its first byte. Call the result {{b2}}.
> *** Compared byte by byte, {{b1}} > {{b2}}.
> ** This condition takes us into this if block ([link|https://github.com/apache/phoenix/blob/7a5965887f679bc229c1424e84951202a5ab27b7/phoenix-core-client/src/main/java/org/apache/phoenix/compile/ScanRanges.java#L385]). Note that, because this is a range scan on a salted table, {{scanKeyOffset}} is 1, and therefore {{scanStartKeyOffset}} and {{scanStopKeyOffset}} are both 1.
> ** Because {{b1}} > {{b2}}, no scan is created for the last salt bucket, and the last bucket is never scanned.
> * The bug appears to be:
> ** {{Bytes.compareTo(originalStopKey, nextBucketStart) <= 0}} ([link|https://github.com/apache/phoenix/blob/7a5965887f679bc229c1424e84951202a5ab27b7/phoenix-core-client/src/main/java/org/apache/phoenix/compile/ScanRanges.java#L283]). This check succeeded in the analysis above, but it should not have: {{originalStopKey}} was an empty byte array, and an empty stop key must be handled as a special case meaning the largest possible stop key.
> ** So we should not enter the above if block just because the {{Bytes.compareTo(originalStopKey, nextBucketStart) <= 0}} check succeeds when {{originalStopKey}} is an empty byte array. Instead, we should first reach [this|https://github.com/apache/phoenix/blob/7a5965887f679bc229c1424e84951202a5ab27b7/phoenix-core-client/src/main/java/org/apache/phoenix/compile/ScanRanges.java#L294] line.
> ** Then, in the next iteration of the while loop, we enter the same if block, but this time because the boolean {{lastBucket}} is true. And this time the first bytes of {{wrkStartKey}} and {{nextByteBucket}} will be the same.
> So, for a range scan over a salted table, reaching [this|https://github.com/apache/phoenix/blob/7a5965887f679bc229c1424e84951202a5ab27b7/phoenix-core-client/src/main/java/org/apache/phoenix/compile/ScanRanges.java#L286] line while {{wrkStartKey}} and {{originalStopKey}} belong to different salt buckets is wrong.
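The per-bucket HBase shell scans in the reproduction steps can be generated instead of typed by hand. This Java sketch is illustrative only: {{SaltBucketScans}} and {{scanCommands}} are hypothetical names, and it assumes, as in the sample scan, that the row-key bytes after the salt byte are `PK_VAL1PK_VAL2\xC7\x02K#O.[\x00` and that SALT_BUCKETS=20, so the salt bytes run from `\x00` to `\x13`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: one HBase shell scan command per salt bucket. */
class SaltBucketScans {
    // Row-key bytes after the salt byte, copied from the sample scan in the issue
    // (written as shell escapes, so the backslashes are literal characters here).
    static final String PREFIX_AFTER_SALT = "PK_VAL1PK_VAL2\\xC7\\x02K#O.[\\x00";

    /** Formats a salt byte as a two-digit hex escape, e.g. 0 -> \x00, 19 -> \x13. */
    static String saltEscape(int bucket) {
        return String.format("\\x%02X", bucket);
    }

    /** Returns one scan command per bucket, in bucket order \x00 .. (saltBuckets - 1). */
    static List<String> scanCommands(String table, int saltBuckets) {
        List<String> commands = new ArrayList<>();
        for (int bucket = 0; bucket < saltBuckets; bucket++) {
            commands.add(String.format(
                    "scan \"%s\", {VERSIONS => 1, COLUMNS => \"0:_0\", ROWPREFIXFILTER => \"%s%s\"}",
                    table, saltEscape(bucket), PREFIX_AFTER_SALT));
        }
        return commands;
    }

    public static void main(String[] args) {
        // Assumes SALT_BUCKETS=20, as in the DDL above.
        for (String command : scanCommands("TABLE1", 20)) {
            System.out.println(command);
        }
    }
}
```

Running the printed commands in the HBase shell and adding up the reported row counts gives the total that the steps compare against the Phoenix row count.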
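Why {{b1}} > {{b2}} suppresses the last bucket's scan can be seen by comparing the same two keys with and without their salt byte. In this hypothetical Java sketch the row-key bodies are simplified ASCII stand-ins ({{PK_VAL1PK_VAL9}} plays the split key that sorts after the query prefix; Phoenix's DECIMAL encoding of PK3 is omitted), and {{java.util.Arrays.compareUnsigned}} (Java 9+) stands in for HBase's {{Bytes.compareTo}}, which also compares bytes lexicographically as unsigned values.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Illustrates comparing salted row keys with and without their first (salt) byte. */
class SaltOffsetComparison {

    /** Builds a row key: one salt byte followed by the ASCII bytes of the body. */
    static byte[] saltedKey(int saltByte, String body) {
        byte[] bodyBytes = body.getBytes(StandardCharsets.US_ASCII);
        byte[] key = new byte[bodyBytes.length + 1];
        key[0] = (byte) saltByte;
        System.arraycopy(bodyBytes, 0, key, 1, bodyBytes.length);
        return key;
    }

    /** Unsigned lexicographic comparison that skips the first {@code offset} bytes of each key. */
    static int compareFrom(byte[] a, byte[] b, int offset) {
        return Arrays.compareUnsigned(a, offset, a.length, b, offset, b.length);
    }

    public static void main(String[] args) {
        // Hypothetical start key of the last region's scan: salt byte \x12 (second-last bucket)
        // plus a body that sorts after the query prefix, like the split key in the repro steps.
        byte[] scanStartKey = saltedKey(0x12, "PK_VAL1PK_VAL9");
        // Simplified query prefix in the last bucket, \x13.
        byte[] lastBucketPrefix = saltedKey(0x13, "PK_VAL1PK_VAL2");

        // Offset 0: the \x12 key sorts before every \x13 key, so the last bucket is still ahead.
        System.out.println(compareFrom(scanStartKey, lastBucketPrefix, 0) < 0); // true
        // Offset 1: the salt byte is ignored, b1 > b2, and the range looks already exhausted.
        System.out.println(compareFrom(scanStartKey, lastBucketPrefix, 1) > 0); // true
    }
}
```

With offset 1 the bucket boundary is invisible and only the key bodies are compared, which is why comparing keys that belong to different salt buckets at that offset gives the wrong answer.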
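The check identified as the bug can be reproduced in isolation. In this Java sketch, {{java.util.Arrays.compareUnsigned}} (Java 9+) stands in for HBase's {{Bytes.compareTo}}: both order byte arrays lexicographically as unsigned bytes and sort an empty array before any non-empty one. The method names are hypothetical, and the corrected variant only illustrates the special case described above, where an empty stop key means the largest possible key; it is not necessarily how the committed fix is written.

```java
import java.util.Arrays;

/** Sketch of the stop-key check from ScanRanges.intersectScan, in isolation. */
class StopKeyCheck {

    /** Mirrors the check as written: true when the stop key sorts at or before nextBucketStart. */
    static boolean naiveStopsBeforeNextBucket(byte[] stopKey, byte[] nextBucketStart) {
        return Arrays.compareUnsigned(stopKey, nextBucketStart) <= 0;
    }

    /**
     * Hypothetical corrected variant: an empty stop key means "scan to the end of the table",
     * so it must never be treated as ending before the next bucket.
     */
    static boolean stopsBeforeNextBucket(byte[] stopKey, byte[] nextBucketStart) {
        if (stopKey.length == 0) {
            return false; // unbounded stop key sorts after every real key
        }
        return Arrays.compareUnsigned(stopKey, nextBucketStart) <= 0;
    }

    public static void main(String[] args) {
        byte[] emptyStopKey = new byte[0];       // originalStopKey for the last region's scan
        byte[] nextBucketStart = {(byte) 0x13};  // assumed start of bucket \x13, after \x12

        System.out.println(naiveStopsBeforeNextBucket(emptyStopKey, nextBucketStart)); // true
        System.out.println(stopsBeforeNextBucket(emptyStopKey, nextBucketStart));      // false
    }
}
```

Non-empty stop keys behave the same in both variants; only the empty (unbounded) case changes.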