Sanjeet Malhotra created PHOENIX-7619:
-----------------------------------------

             Summary: Excess HFiles are being read to look for more than 
required column versions
                 Key: PHOENIX-7619
                 URL: https://issues.apache.org/jira/browse/PHOENIX-7619
             Project: Phoenix
          Issue Type: Bug
            Reporter: Sanjeet Malhotra
            Assignee: Sanjeet Malhotra


Steps to reproduce:
 * Create table with one column family.

{code:java}
CREATE TABLE TEST.HBASE_READS( ID1 VARCHAR NOT NULL, ID2 VARCHAR, VAL1 VARCHAR 
CONSTRAINT PK PRIMARY KEY (ID1)) BLOOMFILTER = NONE;{code}
 * Write some data to the table and flush it, so that there is at least 
one HFile. (During my testing I ensured there were 3 HFiles per region.)
 * Write some more data to the table, but this time don't flush, so this 
data stays only in the memstore.
 * Query a single row that exists only in the memstore and not in any 
HFile, so the result should come purely from the memstore without even 
needing to read an HFile.

Expectation: The queried row should come from the memstore and there should be 
no need to read HFiles.
Actual: The memstore along with all HFiles was scanned to get the Result back 
to the client.
 
Reason:
In HBase, when the [StoreScanner is 
initialized|https://github.com/apache/hbase/blob/efa228ef446c0e63bbe2915a48d3324efab79ccc/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java#L266], 
we go for a lazy seek because the Scan object coming from Phoenix specifies the 
column qualifiers to be queried. If the StoreFile on which we are doing the 
lazy seek has no deleteFamily or deleteFamilyVersion markers, then [this 
line|https://github.com/apache/hbase/blob/b21ba71f73881336345fd5dd7d647910b3058e05/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java#L438] 
is hit. The same is done for all StoreFileScanners, while the head of the 
memstore scanner (SegmentScanner) is at the first column of the given row. 
Next, [this 
line|https://github.com/apache/hbase/blob/b21ba71f73881336345fd5dd7d647910b3058e05/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/ScanQueryMatcher.java#L192] 
is hit until the memstore scanner becomes the top-most scanner in the priority 
queue of all the scanners: 3 StoreFile scanners and 1 memstore scanner. Once 
the memstore scanner is the top-most scanner, the first column being queried is 
read from the memstore and [this line is 
hit|https://github.com/apache/hbase/blob/b21ba71f73881336345fd5dd7d647910b3058e05/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/ExplicitColumnTracker.java#L167] 
after a successful column match. Here, if {{maxVersions}} versions have been 
found, we skip to the next column, which again is read from the memstore. But 
if {{maxVersions}} versions have not been found, we go on to read the next 
version, i.e. the next cell, which leads to scanning all the StoreFiles. For 
"User" scans, {{maxVersions}} should be {{1}} for us, so we should skip to the 
next column once we find the latest version of the current column in the 
memstore. Instead, {{maxVersions}} is {{INT_MAX}} for us, leading to reading 
all the StoreFiles: we should [hit this 
line|https://github.com/apache/hbase/blob/efa228ef446c0e63bbe2915a48d3324efab79ccc/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java#L746] 
but end up [hitting this 
line|https://github.com/apache/hbase/blob/efa228ef446c0e63bbe2915a48d3324efab79ccc/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java#L704]. 
{{maxVersions}} is {{INT_MAX}} for us because we override it 
[here|https://github.com/apache/phoenix/blob/9cb48832a7e9b9a972d682535179ab6a2fd0cb16/phoenix-core-server/src/main/java/org/apache/phoenix/coprocessor/BaseScannerRegionObserver.java#L432-L435]. 
The {{preStoreScannerOpen}} hook is called for "User" scans, so we are 
penalizing all "User" scans.
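The version check described above can be sketched as a small, self-contained model (the class and method names below are illustrative only, not the actual HBase API):

{code:java}
// Simplified model of the ExplicitColumnTracker version check: once
// maxVersions versions of a column have been seen, the matcher skips to
// the next column (staying in the memstore); otherwise it keeps
// including older versions, which descends into the StoreFile scanners.
public class VersionTrackerSketch {

    enum MatchCode { INCLUDE, SEEK_NEXT_COL }

    // Decision made after a cell of the queried column matched.
    // versionsSeen is the count of versions matched before this cell.
    static MatchCode checkVersions(int versionsSeen, int maxVersions) {
        if (versionsSeen + 1 >= maxVersions) {
            return MatchCode.SEEK_NEXT_COL; // done with this column
        }
        return MatchCode.INCLUDE; // keep looking for older versions
    }

    public static void main(String[] args) {
        // What Phoenix "User" scans intend: one version per column.
        System.out.println(checkVersions(0, 1));                 // SEEK_NEXT_COL
        // What the preStoreScannerOpen override causes today.
        System.out.println(checkVersions(0, Integer.MAX_VALUE)); // INCLUDE
    }
}
{code}

With {{maxVersions}} = 1 the very first match ends the column, so a row fully present in the memstore never touches the StoreFiles; with {{INT_MAX}} the matcher keeps asking for more versions and every StoreFile is consulted.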
 
Fix for the preStoreScannerOpen() hook:
 * Don't override MIN_VERSIONS and VERSIONS.
 * Set TTL to {{Long.MAX_VALUE}} instead of {{HConstants.FOREVER}}. This is 
needed because {{HConstants.FOREVER}} is INT_MAX and the TTL overridden as part 
of ScanOptions is interpreted in milliseconds by HBase. INT_MAX milliseconds is 
a little less than 25 days, so HBase would treat even the latest version of a 
column qualifier as expired if it is older than ~25 days. This can cause rows 
to partially expire. Currently rows are not expiring partially only because we 
set MIN_VERSIONS to INT_MAX in this hook. Once we stop overriding MIN_VERSIONS 
we need to set TTL to Long.MAX_VALUE, which works because TTL's data type is 
long. Verified this via IT.
 * Continue overriding {{KeepDeletedCells}} to {{TTL}}. If we stop doing this, 
SCN queries get impacted. Scenario: we keep KeepDeletedCells as {{FALSE}}. Say 
at timestamp T1 I wrote a row and at T2 > T1 I deleted it. If I now set my SCN 
value to a timestamp between T1 and T2, the expectation is that I should see 
the inserted row, but I won't, because to see past delete markers when a custom 
time range is specified in the Scan, KeepDeletedCells must be set to a value 
other than {{FALSE}}. I verified this via IT.
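The "a little less than 25 days" figure can be checked with plain Java arithmetic (no HBase dependency; {{HConstants.FOREVER}} is {{Integer.MAX_VALUE}}):

{code:java}
// Back-of-the-envelope check of why Integer.MAX_VALUE is unsafe as a
// millisecond TTL: it is under 25 days, while Long.MAX_VALUE is
// effectively forever.
public class TtlMathSketch {
    public static void main(String[] args) {
        // HConstants.FOREVER (INT_MAX) interpreted as milliseconds.
        double foreverAsDays = Integer.MAX_VALUE / 1000.0 / 60 / 60 / 24;
        System.out.printf("INT_MAX ms ~= %.2f days%n", foreverAsDays);

        // Long.MAX_VALUE in milliseconds is on the order of 10^8 years.
        double longMaxAsYears = Long.MAX_VALUE / 1000.0 / 60 / 60 / 24 / 365;
        System.out.printf("Long.MAX_VALUE ms ~= %.1e years%n", longMaxAsYears);
    }
}
{code}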



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
