[
https://issues.apache.org/jira/browse/HIVE-19416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16553544#comment-16553544
]
Sergey Shelukhin commented on HIVE-19416:
-----------------------------------------
Updated the description
> Create single version transactional table metastore statistics for
> aggregation queries
> --------------------------------------------------------------------------------------
>
> Key: HIVE-19416
> URL: https://issues.apache.org/jira/browse/HIVE-19416
> Project: Hive
> Issue Type: Bug
> Components: Transactions
> Reporter: Steve Yeom
> Assignee: Steve Yeom
> Priority: Major
>
> This adds accurate stats support to Hive transactional (insert-only and full
> ACID) tables, so that some queries from these tables can be answered from
> stats and also so that the stats could be used for more optimization. This
> support can be enabled via a config flag, and is on by default.
> This is achieved via the following changes, that basically start us on the
> path of treating ACID stats the same way we treat ACID data:
> In addition to existing JSON blob, we store a write ID of the latest stats
> writer with each table and partition. Any writer updating the stats or
> altering the stats state for a txn table has to record his write ID - if the
> write ID and some other context is not provided by the caller of alter, the
> alter table/partition operation fails. It's the responsibility of the writer
> to not commit its transaction if the operation fails.
> In future, we'd like to move the stats state into actual stats tables, but
> for now it's (logically) colocated with the existing json parameter.
> In addition to its write ID, most callers (with the exception of the ones
> that cannot have races, e.g. create table) provide their own txn state (write
> ID list) for the table. The existing stats' write ID is verified against this
> state. If the write ID is not visible to the updater, we still update the
> stats, but set the stats state to invalid; that basically means that two
> parallel operations that cannot see each others' data output are updating the
> stats.
> This way, txn stats stay valid as the result of a sequence of non-conflicting
> updates that can all see each other and account for each others' data for the
> stats. Any parallel updates invalidate the stats.
> This is necessary because unlike data, stats are a single version summary of
> the table. To be able to support parallel operations with valid stats, each
> stats update would need to write a separate record, and would also need to
> write mergeable records that only reflect its own changes, instead of the
> final view of the table stats (two of those are hard or impossible to merge).
> This approach resulted in a few changes to alter/etc APIs in metastore; it
> also requires that many alter operations, as well as analyze table, allocate
> a write ID (because they affect stats-that-are-treated-like-data, and so are
> essentially a write operation).
> The reader, in turn, verifies that the stats are valid and written by a write
> ID that is itself valid given the reader's transactional state (i.e. not
> aborted, nor in progress).
> We've considered (and actually implemented) an alternative approach of
> recording the full txn state of the stats writer to be compared with the
> state of the stats reader (to see if they are compatible and avoid the extra
> write IDs and strict write-time checks), however it results in problems with
> partitioned tables, where not all writes affect all partitions, and so the
> stats state of all the untouched partitions becomes invalid once a subset of
> partitions is updated (because we cannot tell whether the write ID, a table
> level operation, didn't touch the partition, or did touch it but didn't
> record the stats). Additionally, storing full txn state for every partition
> and table can be expensive, especially in extreme cases where the watermark
> doesn't advance for a while for some reason.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)