[ 
https://issues.apache.org/jira/browse/HIVE-19416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16553544#comment-16553544
 ] 

Sergey Shelukhin commented on HIVE-19416:
-----------------------------------------

Updated the description

> Create single version transactional table metastore statistics for 
> aggregation queries
> --------------------------------------------------------------------------------------
>
>                 Key: HIVE-19416
>                 URL: https://issues.apache.org/jira/browse/HIVE-19416
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>            Reporter: Steve Yeom
>            Assignee: Steve Yeom
>            Priority: Major
>
> This adds accurate stats support to Hive transactional (insert-only and full 
> ACID) tables, so that some queries on these tables can be answered from 
> stats alone and the stats can also be used for further optimization. This 
> support is controlled by a config flag and is on by default.
> This is achieved via the following changes, which start us on the path of 
> treating ACID stats the same way we treat ACID data:
> In addition to the existing JSON blob, we store the write ID of the latest 
> stats writer with each table and partition. Any writer updating the stats or 
> altering the stats state for a txn table has to record its write ID; if the 
> write ID and some other context are not provided by the caller of alter, the 
> alter table/partition operation fails. It is the responsibility of the writer 
> not to commit its transaction if the operation fails.
> In the future, we'd like to move the stats state into the actual stats 
> tables, but for now it's (logically) colocated with the existing JSON 
> parameter.
> In addition to its write ID, most callers (with the exception of the ones 
> that cannot have races, e.g. create table) provide their own txn state (write 
> ID list) for the table. The existing stats' write ID is verified against this 
> state. If that write ID is not visible to the updater, we still update the 
> stats, but set the stats state to invalid; this situation means that two 
> parallel operations that cannot see each other's data output are updating the 
> stats.
> This way, txn stats stay valid as the result of a sequence of non-conflicting 
> updates that can all see each other and account for each other's data in the 
> stats. Any parallel updates invalidate the stats.
> This is necessary because, unlike data, stats are a single-version summary of 
> the table. To support parallel operations with valid stats, each stats update 
> would need to write a separate record, and that record would need to be 
> mergeable, reflecting only the update's own changes instead of the final view 
> of the table stats (two such final views are hard or impossible to merge).
> This approach resulted in a few changes to the alter and related APIs in the 
> metastore; it also requires that many alter operations, as well as analyze 
> table, allocate a write ID (because they affect 
> stats-that-are-treated-like-data, and so are essentially write operations).
> The reader, in turn, verifies that the stats are valid and were written by a 
> write ID that is itself valid given the reader's transactional state (i.e. 
> neither aborted nor in progress).
> We've considered (and actually implemented) an alternative approach: 
> recording the full txn state of the stats writer and comparing it with the 
> state of the stats reader to see if they are compatible, which avoids the 
> extra write IDs and strict write-time checks. However, it results in problems 
> with partitioned tables, where not all writes affect all partitions, and so 
> the stats state of all the untouched partitions becomes invalid once a subset 
> of partitions is updated (because write IDs are allocated at the table level, 
> and we cannot tell whether a given write didn't touch the partition, or did 
> touch it but didn't record the stats). Additionally, storing the full txn 
> state for every partition and table can be expensive, especially in extreme 
> cases where the watermark doesn't advance for a while for some reason.
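
The write-time and read-time checks described above can be sketched roughly as follows. This is only an illustration in Python, not the metastore implementation: the names WriteIdSnapshot, StatsRecord, update_stats and can_use_stats are hypothetical, the write ID list is reduced to a high watermark plus sets of open and aborted IDs, and carrying an already-invalid state forward on later updates is an assumption the description does not spell out.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class WriteIdSnapshot:
    """Hypothetical, simplified write ID list for one table as seen by one txn."""
    high_watermark: int
    open_ids: FrozenSet[int] = frozenset()
    aborted_ids: FrozenSet[int] = frozenset()

    def is_visible(self, write_id: int) -> bool:
        # Visible means committed from this snapshot's point of view:
        # at or below the watermark, and neither in progress nor aborted.
        return (write_id <= self.high_watermark
                and write_id not in self.open_ids
                and write_id not in self.aborted_ids)


@dataclass
class StatsRecord:
    """Single-version stats plus the write ID of the latest stats writer."""
    stats: dict
    write_id: int
    valid: bool


def update_stats(existing: Optional[StatsRecord], new_stats: dict,
                 writer_write_id: Optional[int],
                 writer_snapshot: Optional[WriteIdSnapshot]) -> StatsRecord:
    """Writer side: always store the new stats, but decide whether they are valid."""
    if writer_write_id is None:
        # Mirrors "the alter table/partition operation fails" when no write ID is given.
        raise ValueError("write ID is required to alter stats of a txn table")
    valid = True
    if existing is not None and writer_snapshot is not None:
        # Assumption: stats that were already invalid stay invalid on later updates.
        valid = existing.valid and writer_snapshot.is_visible(existing.write_id)
    return StatsRecord(new_stats, writer_write_id, valid)


def can_use_stats(record: Optional[StatsRecord],
                  reader_snapshot: WriteIdSnapshot) -> bool:
    """Reader side: stats must be valid and written by a write ID the reader can see."""
    return (record is not None and record.valid
            and reader_snapshot.is_visible(record.write_id))


# Create-table style caller: no races possible, so no snapshot is passed.
base = update_stats(None, {"numRows": 10}, 4, None)

# Writers 5 and 6 run in parallel; neither snapshot includes the other's write.
after_a = update_stats(base, {"numRows": 15}, 5, WriteIdSnapshot(4))
after_b = update_stats(after_a, {"numRows": 12}, 6, WriteIdSnapshot(4))

# Writer 6 instead starts after writer 5 committed, so it can see write ID 5.
after_b_seq = update_stats(after_a, {"numRows": 17}, 6, WriteIdSnapshot(5))
```

In this sketch after_b ends up invalid because writer 6 could not see write ID 5, while after_b_seq stays valid; a reader whose snapshot does not yet include write ID 6 would still refuse to use after_b_seq.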



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
