[ https://issues.apache.org/jira/browse/PHOENIX-7619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sanjeet Malhotra resolved PHOENIX-7619.
---------------------------------------
    Fix Version/s: 5.2.2
                   5.3
       Resolution: Fixed

> Excess HFiles are being read to look for more than required column versions
> ---------------------------------------------------------------------------
>
>                 Key: PHOENIX-7619
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7619
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 5.2.0, 5.2.1, 5.2.2, 5.3, 5.2.3
>            Reporter: Sanjeet Malhotra
>            Assignee: Sanjeet Malhotra
>            Priority: Major
>             Fix For: 5.2.2, 5.3
>
>
> Steps to reproduce:
> * Create a table with one column family:
> {code:java}
> CREATE TABLE TEST.HBASE_READS(
>     ID1 VARCHAR NOT NULL,
>     ID2 VARCHAR,
>     VAL1 VARCHAR
>     CONSTRAINT PK PRIMARY KEY (ID1)) BLOOMFILTER = NONE;{code}
> * Write some data to the table and flush it, so that there is at least one HFile. (During my testing I ensured there were 3 HFiles per region.)
> * Write some more data to the table, but this time don't flush it, so this data stays in the memstore.
> * Query a single row whose data is still only in the memstore and not in any HFile. The row should then come purely from the memstore, without even needing to read an HFile.
>
> Expectation: The queried row comes from the memstore and there is no need to read any HFiles.
> Actual: The memstore along with all the HFiles was scanned to return the Result to the client.
>
> Reason:
> In HBase, when the [StoreScanner is initialized|https://github.com/apache/hbase/blob/efa228ef446c0e63bbe2915a48d3324efab79ccc/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java#L266], we go for a lazy seek because the Scan object coming from Phoenix specifies the column qualifiers to be queried.
> If the StoreFile on which we are doing the lazy seek has no deleteFamily or deleteFamilyVersion markers, then [this line|https://github.com/apache/hbase/blob/b21ba71f73881336345fd5dd7d647910b3058e05/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java#L438] will be hit. The same is done for all StoreFileScanners, while the head of the memstore scanner (SegmentScanner) is at the first column of the given row.
> Next, [this line|https://github.com/apache/hbase/blob/b21ba71f73881336345fd5dd7d647910b3058e05/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/ScanQueryMatcher.java#L192] will be hit until the memstore scanner is the top-most scanner in the priority queue of all the scanners: 3 StoreFile scanners and 1 memstore scanner. Once the memstore scanner is the top-most scanner, the first queried column is read from the memstore and, after a successful column match, [this line will be hit|https://github.com/apache/hbase/blob/b21ba71f73881336345fd5dd7d647910b3058e05/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/ExplicitColumnTracker.java#L167].
> Here, if {{maxVersions}} versions have been found, we skip to the next column, which again is read from the memstore. But if {{maxVersions}} versions have not been found, we go on to read the next version, i.e. the next cell, which leads to scanning all the StoreFiles. For "User" scans {{maxVersions}} should have been {{1}} for us, so we should have skipped to the next column once we found the latest version of the current column in the memstore. But instead, for "User" scans {{maxVersions}} is {{INT_MAX}} for us, leading to reading all the StoreFiles.
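The version-counting decision described above can be sketched as a toy model (the class, method, and return strings below are illustrative simplifications, not the actual HBase code; the real logic lives in ExplicitColumnTracker):

```java
// Toy model of the version-counting decision: after a cell of the requested
// column matches, the tracker either has enough versions and seeks straight to
// the next column, or keeps reading the next (older) cell, which can force
// seeks into every StoreFile scanner in the heap.
public class ColumnTrackerModel {

    // versionsFound: versions of the current column matched so far.
    // maxVersions: versions the scan is allowed to return per column.
    public static String checkVersions(int versionsFound, int maxVersions) {
        int count = versionsFound + 1; // count the cell that just matched
        if (count >= maxVersions) {
            // Done with this column: no need to consult the other scanners.
            return "INCLUDE_AND_SEEK_NEXT_COL";
        }
        // Still hunting for more versions: the next cell may only exist in an
        // HFile, so all StoreFile scanners can end up being read.
        return "INCLUDE";
    }

    public static void main(String[] args) {
        // What a "User" scan should use: one version per column is enough.
        System.out.println(checkVersions(0, 1));
        // What the override causes: the hunt for versions never stops early.
        System.out.println(checkVersions(0, Integer.MAX_VALUE));
    }
}
```

With maxVersions = 1 the first match answers the query from the memstore alone; with maxVersions = INT_MAX the matcher keeps asking for more cells, which is exactly the excess-HFile-read behavior reported here.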
> We should have [hit this line|https://github.com/apache/hbase/blob/efa228ef446c0e63bbe2915a48d3324efab79ccc/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java#L746] but we end up [hitting this line|https://github.com/apache/hbase/blob/efa228ef446c0e63bbe2915a48d3324efab79ccc/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java#L704]. {{maxVersions}} is {{INT_MAX}} for us because we override it [here|https://github.com/apache/phoenix/blob/9cb48832a7e9b9a972d682535179ab6a2fd0cb16/phoenix-core-server/src/main/java/org/apache/phoenix/coprocessor/BaseScannerRegionObserver.java#L432-L435]. The {{preStoreScannerOpen}} hook is called for "User" scans, so we are penalizing all "User" scans.
>
> Fix for the preStoreScannerOpen() hook:
> * Don't override MIN_VERSIONS and VERSIONS.
> * Set TTL to {{Long.MAX_VALUE}} instead of {{HConstants.FOREVER}}. This is needed because {{HConstants.FOREVER}} is INT_MAX, and the TTL overridden as part of ScanOptions is interpreted in milliseconds by HBase. INT_MAX milliseconds is equivalent to a little less than 25 days, so HBase would treat even the latest version of a column qualifier as expired if it is older than 25 days. This can cause rows to expire partially. Currently rows do not expire partially because we set MIN_VERSIONS in this hook to INT_MAX; once we stop overriding MIN_VERSIONS we need to set TTL to Long.MAX_VALUE, as the TTL's data type is long. Verified this via IT.
> * Continue overriding {{KeepDeletedCells}} to {{TTL}}. If we stop doing this then SCN queries will be impacted. Scenario: we keep KeepDeletedCells as {{False}}. Say at timestamp T1 I wrote a row, and at T2 > T1 I deleted the row.
> Now suppose I set my SCN value to a timestamp between T1 and T2. The expectation is that I should see the inserted row, but I won't, because to see past delete markers when a custom time range is specified in the scan, KeepDeletedCells must be set to a value other than {{False}}. I verified this via IT.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
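The SCN scenario in the last bullet can be sketched with a toy visibility model (all names below are illustrative, not HBase APIs; it only encodes the documented KEEP_DELETED_CELLS read-path behavior under the stated assumption that the delete marker covers the put):

```java
// Toy model of row visibility for an SCN (time-travel) read at timestamp scn,
// given a put at putTs and a delete marker at deleteTs.
public class ScnVisibilityModel {

    public static boolean rowVisibleAtScn(long putTs, long deleteTs, long scn,
                                          boolean keepDeletedCells) {
        if (putTs > scn) {
            return false; // the put itself is outside the scan's time range
        }
        if (deleteTs < putTs) {
            return true; // the marker does not cover this put
        }
        if (keepDeletedCells) {
            // The marker is subject to the scan's time range too: a delete at
            // T2 > SCN is invisible to the SCN read, so the put survives.
            return deleteTs > scn;
        }
        // KeepDeletedCells = FALSE: the marker masks the put regardless of the
        // scan's time range, so the SCN query loses the row.
        return false;
    }

    public static void main(String[] args) {
        long t1 = 100, t2 = 200, scn = 150; // put at T1, delete at T2, T1 < SCN < T2
        System.out.println(rowVisibleAtScn(t1, t2, scn, false)); // SCN query broken
        System.out.println(rowVisibleAtScn(t1, t2, scn, true));  // row visible as expected
    }
}
```

The third case worth checking is SCN > T2: there the delete marker is inside the scan's time range, so the row is correctly invisible either way.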