[ 
https://issues.apache.org/jira/browse/HIVE-19416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-19416:
------------------------------------
    Description: 
This adds accurate stats support to Hive transactional (insert-only and full 
ACID) tables, so that some queries from these tables can be answered from stats 
and also so that the stats could be used for more optimization. This support 
can be enabled via a config flag, and is on by default.

This is achieved via the following changes, that basically start us on the path 
of treating ACID stats the same way we treat ACID data:
In addition to existing JSON blob, we store a write ID of the latest stats 
writer with each table and partition. Any writer updating the stats or altering 
the stats state for a txn table has to record his write ID - if the write ID 
and some other context is not provided by the caller of alter, the alter 
table/partition operation fails. It's the responsibility of the writer to not 
commit its transaction if the operation fails.
In future, we'd like to move the stats state into actual stats tables, but for 
now it's (logically) colocated with the existing json parameter.

In addition to its write ID, most callers (with the exception of the ones that 
cannot have races, e.g. create table) provide their own txn state (write ID 
list) for the table. The existing stats' write ID is verified against this 
state. If the write ID is not visible to the updater, we still update the 
stats, but set the stats state to invalid; that basically means that two 
parallel operations that cannot see each others' data output are updating the 
stats.

This way, txn stats stay valid as the result of a sequence of non-conflicting 
updates that can all see each other and account for each others' data for the 
stats. Any parallel updates invalidate the stats.
This is necessary because unlike data, stats are a single version summary of 
the table. To be able to support parallel operations with valid stats, each 
stats update would need to write a separate record, and would also need to 
write mergeable records that only reflect its own changes, instead of the final 
view of the table stats (two of those are hard or impossible to merge).

This approach resulted in a few changes to alter/etc APIs in metastore; it also 
requires that many alter operations, as well as analyze table, allocate a write 
ID (because they affect stats-that-are-treated-like-data, and so are 
essentially a write operation).

The reader, in turn, verifies that the stats are valid and written by a write 
ID that is itself valid given the reader's transactional state (i.e. not 
aborted, nor in progress). This is done on metastore side; if the stats are 
invalid for the reader, we transparently update the stats state returned to the 
caller to mark the stats as inaccurate.

We've considered (and actually implemented) an alternative approach of 
recording the full txn state of the stats writer to be compared with the state 
of the stats reader (to see if they are compatible and avoid the extra write 
IDs and strict write-time checks), however it results in problems with 
partitioned tables, where not all writes affect all partitions, and so the 
stats state of all the untouched partitions becomes invalid once a subset of 
partitions is updated (because we cannot tell whether the write ID, a table 
level operation, didn't touch the partition, or did touch it but didn't record 
the stats). Additionally, storing full txn state for every partition and table 
can be expensive, especially in extreme cases where the watermark doesn't 
advance for a while for some reason.




  was:
This adds accurate stats support to Hive transactional (insert-only and full 
ACID) tables, so that some queries from these tables can be answered from stats 
and also so that the stats could be used for more optimization. This support 
can be enabled via a config flag, and is on by default.

This is achieved via the following changes, that basically start us on the path 
of treating ACID stats the same way we treat ACID data:
In addition to existing JSON blob, we store a write ID of the latest stats 
writer with each table and partition. Any writer updating the stats or altering 
the stats state for a txn table has to record his write ID - if the write ID 
and some other context is not provided by the caller of alter, the alter 
table/partition operation fails. It's the responsibility of the writer to not 
commit its transaction if the operation fails.
In future, we'd like to move the stats state into actual stats tables, but for 
now it's (logically) colocated with the existing json parameter.

In addition to its write ID, most callers (with the exception of the ones that 
cannot have races, e.g. create table) provide their own txn state (write ID 
list) for the table. The existing stats' write ID is verified against this 
state. If the write ID is not visible to the updater, we still update the 
stats, but set the stats state to invalid; that basically means that two 
parallel operations that cannot see each others' data output are updating the 
stats.

This way, txn stats stay valid as the result of a sequence of non-conflicting 
updates that can all see each other and account for each others' data for the 
stats. Any parallel updates invalidate the stats.
This is necessary because unlike data, stats are a single version summary of 
the table. To be able to support parallel operations with valid stats, each 
stats update would need to write a separate record, and would also need to 
write mergeable records that only reflect its own changes, instead of the final 
view of the table stats (two of those are hard or impossible to merge).

This approach resulted in a few changes to alter/etc APIs in metastore; it also 
requires that many alter operations, as well as analyze table, allocate a write 
ID (because they affect stats-that-are-treated-like-data, and so are 
essentially a write operation).

The reader, in turn, verifies that the stats are valid and written by a write 
ID that is itself valid given the reader's transactional state (i.e. not 
aborted, nor in progress).

We've considered (and actually implemented) an alternative approach of 
recording the full txn state of the stats writer to be compared with the state 
of the stats reader (to see if they are compatible and avoid the extra write 
IDs and strict write-time checks), however it results in problems with 
partitioned tables, where not all writes affect all partitions, and so the 
stats state of all the untouched partitions becomes invalid once a subset of 
partitions is updated (because we cannot tell whether the write ID, a table 
level operation, didn't touch the partition, or did touch it but didn't record 
the stats). Additionally, storing full txn state for every partition and table 
can be expensive, especially in extreme cases where the watermark doesn't 
advance for a while for some reason.





> Create single version transactional table metastore statistics for 
> aggregation queries
> --------------------------------------------------------------------------------------
>
>                 Key: HIVE-19416
>                 URL: https://issues.apache.org/jira/browse/HIVE-19416
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>            Reporter: Steve Yeom
>            Assignee: Steve Yeom
>            Priority: Major
>
> This adds accurate stats support to Hive transactional (insert-only and full 
> ACID) tables, so that some queries from these tables can be answered from 
> stats and also so that the stats could be used for more optimization. This 
> support can be enabled via a config flag, and is on by default.
> This is achieved via the following changes, that basically start us on the 
> path of treating ACID stats the same way we treat ACID data:
> In addition to existing JSON blob, we store a write ID of the latest stats 
> writer with each table and partition. Any writer updating the stats or 
> altering the stats state for a txn table has to record his write ID - if the 
> write ID and some other context is not provided by the caller of alter, the 
> alter table/partition operation fails. It's the responsibility of the writer 
> to not commit its transaction if the operation fails.
> In future, we'd like to move the stats state into actual stats tables, but 
> for now it's (logically) colocated with the existing json parameter.
> In addition to its write ID, most callers (with the exception of the ones 
> that cannot have races, e.g. create table) provide their own txn state (write 
> ID list) for the table. The existing stats' write ID is verified against this 
> state. If the write ID is not visible to the updater, we still update the 
> stats, but set the stats state to invalid; that basically means that two 
> parallel operations that cannot see each others' data output are updating the 
> stats.
> This way, txn stats stay valid as the result of a sequence of non-conflicting 
> updates that can all see each other and account for each others' data for the 
> stats. Any parallel updates invalidate the stats.
> This is necessary because unlike data, stats are a single version summary of 
> the table. To be able to support parallel operations with valid stats, each 
> stats update would need to write a separate record, and would also need to 
> write mergeable records that only reflect its own changes, instead of the 
> final view of the table stats (two of those are hard or impossible to merge).
> This approach resulted in a few changes to alter/etc APIs in metastore; it 
> also requires that many alter operations, as well as analyze table, allocate 
> a write ID (because they affect stats-that-are-treated-like-data, and so are 
> essentially a write operation).
> The reader, in turn, verifies that the stats are valid and written by a write 
> ID that is itself valid given the reader's transactional state (i.e. not 
> aborted, nor in progress). This is done on metastore side; if the stats are 
> invalid for the reader, we transparently update the stats state returned to 
> the caller to mark the stats as inaccurate.
> We've considered (and actually implemented) an alternative approach of 
> recording the full txn state of the stats writer to be compared with the 
> state of the stats reader (to see if they are compatible and avoid the extra 
> write IDs and strict write-time checks), however it results in problems with 
> partitioned tables, where not all writes affect all partitions, and so the 
> stats state of all the untouched partitions becomes invalid once a subset of 
> partitions is updated (because we cannot tell whether the write ID, a table 
> level operation, didn't touch the partition, or did touch it but didn't 
> record the stats). Additionally, storing full txn state for every partition 
> and table can be expensive, especially in extreme cases where the watermark 
> doesn't advance for a while for some reason.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to