[jira] [Created] (PHOENIX-7347) Use most up-to-date value Table level max lookback from SYSCAT during flush

Sanjeet Malhotra (Jira) Tue, 25 Jun 2024 08:14:04 -0700

Sanjeet Malhotra created PHOENIX-7347:
-----------------------------------------


             Summary: Use most up-to-date value Table level max lookback from 
SYSCAT during flush
                 Key: PHOENIX-7347
                 URL: https://issues.apache.org/jira/browse/PHOENIX-7347
             Project: Phoenix
          Issue Type: Improvement
    Affects Versions: 5.3.0
            Reporter: Sanjeet Malhotra
            Assignee: Sanjeet Malhotra


Currently, to avoid fetching the table level max lookback from SYSCAT during a 
flush (in the preFlush hook), we use a cached value. The table level max 
lookback is cached whenever a compaction (minor or major) is triggered, and 
that cached value is then used in the preFlush hook. This means:
 # We won't start preserving rows as soon as max lookback is altered. The new 
value only reaches flush after a later compaction refreshes the cache, which 
makes it very slowly eventually consistent.
 # Table level max lookback for a table and its index can stay out of sync for 
a considerable amount of time.
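
The staleness described above can be illustrated with a minimal sketch. It is 
a toy model, not Phoenix code: the class, its maps, and its method names are 
all hypothetical stand-ins for SYSCAT, the cached value, and the 
preCompact/preFlush hooks.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Toy model of the current behavior. None of these names are Phoenix APIs. */
class MaxLookbackCacheSketch {
    // Hypothetical stand-in for the table level max lookback stored in SYSCAT.
    private final Map<String, Long> syscat = new ConcurrentHashMap<>();
    // Hypothetical stand-in for the value cached on the region server.
    private final Map<String, Long> cache = new ConcurrentHashMap<>();

    /** Models altering the table level max lookback. */
    void alterMaxLookback(String table, long maxLookbackMs) {
        syscat.put(table, maxLookbackMs);
    }

    /** Models preCompact: reads the latest value and refreshes the cache. */
    long onCompaction(String table, long defaultMs) {
        long latest = syscat.getOrDefault(table, defaultMs);
        cache.put(table, latest);
        return latest;
    }

    /** Models preFlush today: reads only the cache, so the value may be stale. */
    long onFlush(String table, long defaultMs) {
        return cache.getOrDefault(table, defaultMs);
    }
}
```

In this model, a flush that runs after alterMaxLookback() but before the next 
onCompaction() still sees the old value, which is the lag this issue aims to 
remove.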

The reason we use the cached max lookback value in preFlush but the latest 
value in preCompact is that preFlush, and in turn flush, can be called while 
the cluster is tearing down. This gives the impression that ITs (e.g. 
SystemTablesCreationOnConnectionIT.testDoNotUpgradePropSet()) are stuck, while 
the following is happening under the hood:
 # We try to create a Phoenix connection.
 # As part of creating the Phoenix connection, a get call is made to hbase:meta 
to check the table state of SYSCAT.
 # The get call is rejected because the mini cluster is tearing down. The get 
call checks whether the RS is stopping (via checkOpen()).
 # As the Phoenix connection attempt fails and the HBase client uses 
RpcRetryingCaller, it keeps retrying until all attempts are exhausted.
 # Once all attempts are exhausted, the exception is thrown back to the caller 
code in Phoenix.
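
Why the IT appears stuck can be seen from a simplified bounded retry loop. 
This is only a sketch of the general pattern, not HBase's actual retrying 
caller: the class name, method name, and the maxAttempts and pauseMs 
parameters are assumptions made for illustration.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Simplified bounded retry, loosely modeled on the steps above. Not HBase code. */
class BoundedRetrySketch {
    /**
     * Runs the call up to maxAttempts times, pausing between failed attempts.
     * The caller sees an exception only after every attempt has failed.
     */
    static <T> T callWithRetries(Callable<T> call, int maxAttempts, long pauseMs)
            throws IOException {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                // e.g. the server rejecting the get because it is stopping
                last = e;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(pauseMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IOException("interrupted while retrying", ie);
                }
            }
        }
        throw new IOException("retries exhausted after " + maxAttempts + " attempts", last);
    }
}
```

With many attempts and a growing pause between them, the total wait before the 
exception surfaces can be long enough that a test looks hung during teardown.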

Once the exception is thrown back to the Phoenix caller, we can simply flush 
the complete data. In production, an attempt to fetch max lookback from SYSCAT 
will fail only if SYSCAT is offline or every RS that can host SYSCAT goes 
down. In such an extreme scenario, flushing some extra data is not an issue, 
since minor/major compactions will take care of removing excess data outside 
the max lookback window.
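
The proposed fallback could look roughly like the sketch below. It is not the 
actual patch: the MaxLookbackSource interface, the maxLookbackForFlush() 
method, and the use of Long.MAX_VALUE as a "retain everything" marker are all 
assumptions made for illustration.

```java
import java.io.IOException;

/** Sketch of the proposed flush-time behavior. Hypothetical names, not Phoenix APIs. */
class FlushLookbackSketch {
    /** Hypothetical lookup of the latest table level max lookback from SYSCAT. */
    interface MaxLookbackSource {
        long fetchLatest(String table) throws IOException;
    }

    // Assumed marker meaning "drop nothing during this flush".
    static final long KEEP_EVERYTHING = Long.MAX_VALUE;

    /**
     * Returns the latest max lookback if SYSCAT can be read. Otherwise falls
     * back to retaining all data and leaves cleanup to later compactions.
     */
    static long maxLookbackForFlush(String table, MaxLookbackSource syscat) {
        try {
            return syscat.fetchLatest(table);
        } catch (IOException e) {
            // SYSCAT unreachable (e.g. offline or cluster tearing down):
            // flush the complete data instead of failing or using a stale value.
            return KEEP_EVERYTHING;
        }
    }
}
```

The design choice here is to prefer correctness over compactness: a failed 
lookup never causes rows inside the max lookback window to be dropped.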



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
