[
https://issues.apache.org/jira/browse/HIVE-19416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16496799#comment-16496799
]
Steve Yeom commented on HIVE-19416:
-----------------------------------
The current single-version stats design consists of:
1. Definitions and Categories
- Valid transactional stats:
I.e., a conjunction of the three:
~ a committed transaction created the stats
~ COLUMN_STATS_ACCURATE (CSA) state is true
~ Isolation-level (snapshot) compliant
- Two kinds of stats: table and column
- COLUMN_STATS_ACCURATE (CSA) states for a table/partition: true or false,
one for the table and one for each column
- Categories of clients:
~ Stats reader:
^ StatsOptimizer for aggregation query: transactional stats reader
^ The rest that uses stats for cost computation inputs: non-transactional
stats reader
~ Stats updater: transactional stats updater
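A minimal sketch of the "valid transactional stats" definition above, in Python. The class and function names are hypothetical and are not Hive APIs. The rule that non-transactional readers use stats without the validity check is an assumption drawn from the reader categories.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatsState:
    """Hypothetical holder for the three conditions listed above."""
    created_by_committed_txn: bool  # a committed transaction created the stats
    csa: bool                       # COLUMN_STATS_ACCURATE state is true
    snapshot_compliant: bool        # isolation-level (snapshot) compliant


def is_valid_transactional_stats(state: StatsState) -> bool:
    # Valid transactional stats are the conjunction of all three conditions.
    return (state.created_by_committed_txn
            and state.csa
            and state.snapshot_compliant)


def reader_may_use_stats(transactional_reader: bool, state: StatsState) -> bool:
    # Assumption: a transactional reader (StatsOptimizer) needs valid stats,
    # while cost-computation readers take the stats as approximate inputs.
    if transactional_reader:
        return is_valid_transactional_stats(state)
    return True
```

Under this sketch, stats that fail any one condition are rejected for an aggregation query but remain usable as cost inputs.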
2. Transactional Stats Operations
2.1 Stats Update
Update the single version stats, both table and column and save a table
snapshot to UPD_TXNS.
- A client requests an update with stats and a table snapshot [1].
- Creates a TBLS/PARTITIONS row and adds a row to UPD_TXNS with the table
write snapshot.
~ Updates "table stats" by updating TABLE_PARAMS/PARTITION_PARAMS
- Updates "column stats" by updating TAB_COL_STATS/PART_COL_STATS
- commit/abort
~ abortTxn() deletes the UPD_TXNS row for the transaction.
Note: a stats reader now determines the state of the stats updater's
transaction by checking TXNS for the open state, and by checking for the
existence of a row in UPD_TXNS to tell committed from aborted.
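The update flow and the state check in the note can be sketched with an in-memory model, in Python. This is an illustration under stated assumptions, not the metastore implementation: `MetastoreSketch` and its methods are hypothetical, and plain dicts and sets stand in for the TXNS, UPD_TXNS, TABLE_PARAMS, and TAB_COL_STATS tables.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class TxnState(Enum):
    OPEN = "open"
    COMMITTED = "committed"
    ABORTED = "aborted"


@dataclass
class MetastoreSketch:
    open_txns: set = field(default_factory=set)        # open rows in TXNS
    upd_txns: dict = field(default_factory=dict)       # UPD_TXNS: txn id -> write snapshot
    table_params: dict = field(default_factory=dict)   # table stats
    col_stats: dict = field(default_factory=dict)      # column stats
    stats_txn_id: Optional[int] = None                 # TXN_ID kept on the table row

    def open_txn(self, txn_id: int) -> None:
        self.open_txns.add(txn_id)

    def update_stats(self, txn_id, write_snapshot, table_stats, column_stats):
        # Single version: new stats overwrite whatever was stored before.
        self.stats_txn_id = txn_id
        self.upd_txns[txn_id] = write_snapshot
        self.table_params.update(table_stats)
        self.col_stats.update(column_stats)

    def commit_txn(self, txn_id: int) -> None:
        # The UPD_TXNS row is kept, which marks the update as committed.
        self.open_txns.discard(txn_id)

    def abort_txn(self, txn_id: int) -> None:
        # Abort deletes the UPD_TXNS row for the transaction.
        self.open_txns.discard(txn_id)
        self.upd_txns.pop(txn_id, None)

    def updater_state(self, txn_id: int) -> TxnState:
        if txn_id in self.open_txns:
            return TxnState.OPEN
        if txn_id in self.upd_txns:
            return TxnState.COMMITTED
        return TxnState.ABORTED
```

In this model an aborted update leaves its stats values in place; a reader finds them invalid because the matching UPD_TXNS row is gone.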
2.2 Stats Read
StatsOptimizer determines validity of the MetaStore transactional stats
to use stats for an aggregation query.
2.2.1 Table stats
The reader gets a TBLS/PARTITIONS row that includes table stats.
It then checks the validity of the table stats.
- A client comes in with its request that includes the client's table
snapshot.
- Reads a row from TBLS/PARTITIONS.
- Check if the CSA for table stats is true. If not, return after setting
CSA.
- Check if the stats' update transaction is committed: check whether a row
exists in UPD_TXNS
for the TXN_ID from TBLS/PARTITIONS. If not, the stats are invalid.
- Compare the current stats' table snapshot with the client's table
snapshot.
- If the table snapshots are equal in commits,
the table stats are valid.
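The read-side checks above can be sketched as follows, in Python. All names are hypothetical. Two assumptions are made: a table snapshot is modeled as a high-water mark plus a set of invalid write IDs, and "equal in commits" is read as "the same set of committed write IDs". The open-transaction check comes from the note in 2.1, and the "setting CSA" part of the CSA step is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableSnapshot:
    """Hypothetical stand-in for a valid writeId list of one table."""
    high_watermark: int
    invalid_write_ids: frozenset = frozenset()  # open or aborted write IDs

    def committed_write_ids(self) -> frozenset:
        all_ids = frozenset(range(1, self.high_watermark + 1))
        return all_ids - self.invalid_write_ids


@dataclass
class TableRow:
    """Hypothetical TBLS/PARTITIONS row carrying table stats."""
    csa: bool      # CSA state for the table stats
    txn_id: int    # transaction that last updated the stats
    stats: dict


def table_stats_valid(row, upd_txns, open_txns, client_snapshot) -> bool:
    # Step 1: the CSA state for table stats must be true.
    if not row.csa:
        return False
    # Step 2: the updating transaction must be committed, meaning it is not
    # open and its UPD_TXNS row still exists.
    if row.txn_id in open_txns:
        return False
    stats_snapshot = upd_txns.get(row.txn_id)
    if stats_snapshot is None:
        return False
    # Step 3: the stats' snapshot and the client's snapshot must agree on
    # which writes are committed.
    return (stats_snapshot.committed_write_ids()
            == client_snapshot.committed_write_ids())
```

A client whose snapshot sees a newer committed write than the stats' snapshot gets False and has to compute the aggregate from the data instead.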
2.2.2 Column stats
The reader gets a row from TAB_COL_STATS/PART_COL_STATS.
The same steps as table stats.
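For column stats the CSA step looks at the per-column state rather than the table-level one. A small Python sketch, assuming the CSA parameter is stored as a JSON string with a "BASIC_STATS" entry and a "COLUMN_STATS" map keyed by column name; that layout and the helper names are assumptions for illustration.

```python
import json


def _parse_csa(csa_json: str) -> dict:
    # A missing or empty parameter is treated as "nothing is accurate".
    return json.loads(csa_json) if csa_json else {}


def table_csa(csa_json: str) -> bool:
    # Table-level state: one flag for the table or partition.
    return _parse_csa(csa_json).get("BASIC_STATS") == "true"


def column_csa(csa_json: str, column: str) -> bool:
    # Column-level state: one flag per column.
    return _parse_csa(csa_json).get("COLUMN_STATS", {}).get(column) == "true"
```

The remaining steps (committed-transaction check and snapshot comparison) would then run unchanged for the column's row.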
3. Current/Possible invariants
3.1 Current
- Metastore TBLS/PARTITIONS keeps CSA updated for committed stats for both
table and columns.
3.2 Possible
- Metastore keeps one committed version of the stats for both table and columns.
Notes:
[1]: transaction id and a valid writeId list for the table.
> Create single version transactional table metastore statistics for
> aggregation queries
> --------------------------------------------------------------------------------------
>
> Key: HIVE-19416
> URL: https://issues.apache.org/jira/browse/HIVE-19416
> Project: Hive
> Issue Type: Bug
> Components: Transactions
> Reporter: Steve Yeom
> Assignee: Steve Yeom
> Priority: Major
>
> The system should use only statistics for aggregation queries like count on
> transactional tables.