[ https://issues.apache.org/jira/browse/PHOENIX-7619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sanjeet Malhotra resolved PHOENIX-7619.
---------------------------------------
    Fix Version/s: 5.2.2
                   5.3
       Resolution: Fixed

> Excess HFiles are being read to look for more than required column versions
> ---------------------------------------------------------------------------
>
>                 Key: PHOENIX-7619
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7619
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 5.2.0, 5.2.1, 5.2.2, 5.3, 5.2.3
>            Reporter: Sanjeet Malhotra
>            Assignee: Sanjeet Malhotra
>            Priority: Major
>             Fix For: 5.2.2, 5.3
>
>
> Steps to reproduce:
> * Create a table with one column family:
> {code:java}
> CREATE TABLE TEST.HBASE_READS(
>     ID1 VARCHAR NOT NULL,
>     ID2 VARCHAR,
>     VAL1 VARCHAR
>     CONSTRAINT PK PRIMARY KEY (ID1)) BLOOMFILTER = NONE;{code}
> * Write some data to the table and flush it, so that there is at least one HFile. (During my testing I ensured there were 3 HFiles per region.)
> * Write some more data to the table, but this time don't flush it, so this data stays in the memstore.
> * Query a single row whose data is still only in the memstore and not in any HFile. The row should then come purely from the memstore, without even needing to read an HFile.
>
> Expectation: The queried row comes from the memstore and there is no need to read any HFiles.
> Actual: The memstore along with all the HFiles was scanned to return the Result to the client.
>
> Reason:
> In HBase, when the [StoreScanner is initialized|https://github.com/apache/hbase/blob/efa228ef446c0e63bbe2915a48d3324efab79ccc/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java#L266], we go for a lazy seek because the Scan object coming from Phoenix specifies the column qualifiers to be queried.
> If the StoreFile on which we are doing the lazy seek has no deleteFamily or deleteFamilyVersion markers, then [this line|https://github.com/apache/hbase/blob/b21ba71f73881336345fd5dd7d647910b3058e05/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java#L438] will be hit. The same is done for all StoreFileScanners, while the head of the memstore scanner (SegmentScanner) is at the first column of the given row.
> Next, [this line|https://github.com/apache/hbase/blob/b21ba71f73881336345fd5dd7d647910b3058e05/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/ScanQueryMatcher.java#L192] will be hit until the memstore scanner is the top-most scanner in the priority queue of all the scanners: 3 StoreFile scanners and 1 memstore scanner. Once the memstore scanner is the top-most scanner, the first queried column is read from the memstore and, after a successful column match, [this line will be hit|https://github.com/apache/hbase/blob/b21ba71f73881336345fd5dd7d647910b3058e05/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/ExplicitColumnTracker.java#L167].
> Here, if {{maxVersions}} versions have been found, we skip to the next column, which again is read from the memstore. But if {{maxVersions}} versions have not been found, we go on to read the next version, i.e. the next cell, which leads to scanning all the StoreFiles. For "User" scans {{maxVersions}} should have been {{1}} for us, so we should have skipped to the next column once we found the latest version of the current column in the memstore. But instead, for "User" scans {{maxVersions}} is {{INT_MAX}} for us, leading to reading all the StoreFiles.
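The version-counting decision described above can be sketched as a toy model (the class, method, and return strings below are illustrative simplifications, not the actual HBase code; the real logic lives in ExplicitColumnTracker):

```java
// Toy model of the version-counting decision: after a cell of the requested
// column matches, the tracker either has enough versions and seeks straight to
// the next column, or keeps reading the next (older) cell, which can force
// seeks into every StoreFile scanner in the heap.
public class ColumnTrackerModel {

    // versionsFound: versions of the current column matched so far.
    // maxVersions: versions the scan is allowed to return per column.
    public static String checkVersions(int versionsFound, int maxVersions) {
        int count = versionsFound + 1; // count the cell that just matched
        if (count >= maxVersions) {
            // Done with this column: no need to consult the other scanners.
            return "INCLUDE_AND_SEEK_NEXT_COL";
        }
        // Still hunting for more versions: the next cell may only exist in an
        // HFile, so all StoreFile scanners can end up being read.
        return "INCLUDE";
    }

    public static void main(String[] args) {
        // What a "User" scan should use: one version per column is enough.
        System.out.println(checkVersions(0, 1));
        // What the override causes: the hunt for versions never stops early.
        System.out.println(checkVersions(0, Integer.MAX_VALUE));
    }
}
```

With maxVersions = 1 the first match answers the query from the memstore alone; with maxVersions = INT_MAX the matcher keeps asking for more cells, which is exactly the excess-HFile-read behavior reported here.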
> We should have [hit this line|https://github.com/apache/hbase/blob/efa228ef446c0e63bbe2915a48d3324efab79ccc/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java#L746] but we end up [hitting this line|https://github.com/apache/hbase/blob/efa228ef446c0e63bbe2915a48d3324efab79ccc/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java#L704]. {{maxVersions}} is {{INT_MAX}} for us because we override it [here|https://github.com/apache/phoenix/blob/9cb48832a7e9b9a972d682535179ab6a2fd0cb16/phoenix-core-server/src/main/java/org/apache/phoenix/coprocessor/BaseScannerRegionObserver.java#L432-L435]. The {{preStoreScannerOpen}} hook is called for "User" scans, so we are penalizing all "User" scans.
>
> Fix for the preStoreScannerOpen() hook:
> * Don't override MIN_VERSIONS and VERSIONS.
> * Set TTL to {{Long.MAX_VALUE}} instead of {{HConstants.FOREVER}}. This is needed because {{HConstants.FOREVER}} is INT_MAX, and the TTL overridden as part of ScanOptions is interpreted in milliseconds by HBase. INT_MAX milliseconds is equivalent to a little less than 25 days, so HBase would treat even the latest version of a column qualifier as expired if it is older than 25 days. This can cause rows to expire partially. Currently rows do not expire partially because we set MIN_VERSIONS in this hook to INT_MAX; once we stop overriding MIN_VERSIONS we need to set TTL to Long.MAX_VALUE, as the TTL's data type is long. Verified this via IT.
> * Continue overriding {{KeepDeletedCells}} to {{TTL}}. If we stop doing this then SCN queries will be impacted. Scenario: we keep KeepDeletedCells as {{False}}. Say at timestamp T1 I wrote a row, and at T2 > T1 I deleted the row.
> Now suppose I set my SCN value to a timestamp between T1 and T2. The expectation is that I should see the inserted row, but I won't, because to see past delete markers when a custom time range is specified in the scan, KeepDeletedCells must be set to a value other than {{False}}. I verified this via IT.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
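The SCN scenario in the last bullet can be sketched with a toy visibility model (all names below are illustrative, not HBase APIs; it only encodes the documented KEEP_DELETED_CELLS read-path behavior under the stated assumption that the delete marker covers the put):

```java
// Toy model of row visibility for an SCN (time-travel) read at timestamp scn,
// given a put at putTs and a delete marker at deleteTs.
public class ScnVisibilityModel {

    public static boolean rowVisibleAtScn(long putTs, long deleteTs, long scn,
                                          boolean keepDeletedCells) {
        if (putTs > scn) {
            return false; // the put itself is outside the scan's time range
        }
        if (deleteTs < putTs) {
            return true; // the marker does not cover this put
        }
        if (keepDeletedCells) {
            // The marker is subject to the scan's time range too: a delete at
            // T2 > SCN is invisible to the SCN read, so the put survives.
            return deleteTs > scn;
        }
        // KeepDeletedCells = FALSE: the marker masks the put regardless of the
        // scan's time range, so the SCN query loses the row.
        return false;
    }

    public static void main(String[] args) {
        long t1 = 100, t2 = 200, scn = 150; // put at T1, delete at T2, T1 < SCN < T2
        System.out.println(rowVisibleAtScn(t1, t2, scn, false)); // SCN query broken
        System.out.println(rowVisibleAtScn(t1, t2, scn, true));  // row visible as expected
    }
}
```

The third case worth checking is SCN > T2: there the delete marker is inside the scan's time range, so the row is correctly invisible either way.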