[ https://issues.apache.org/jira/browse/IMPALA-8865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909480#comment-16909480 ]
ASF subversion and git services commented on IMPALA-8865:
---------------------------------------------------------
Commit 8c5ea90aa53dd925ec038ef9d8ea71e7919e3127 in impala's branch
refs/heads/master from Csaba Ringhofer
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=8c5ea90 ]
IMPALA-8836: Support COMPUTE STATS on insert only ACID tables
For ACID tables COMPUTE STATS needs to use a new HMS API, as the
old one is rejected by metastore. This API currently has some
counterintuitive parts:
- setPartitionColumnStatistics is used to set table stats, as there
is no similar function exposed by HMS client for tables at the
moment.
- A new writeId is allocated for the stat change, and this needs
a transaction, so a transaction is opened/committed/aborted even
though this doesn't seem necessary. The Hive code seems to use
internal API for this.
- Even though the HMS thrift Table object has a colStats field,
it is only applied during alter_table if there are other changes
like new columns in the tables, so alter_table couldn't be used
to change column stats.
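
The open/allocate/set/commit-or-abort sequence described in the list above can be sketched as follows. This is a minimal illustration only: FakeMetastore and all of its method names are hypothetical in-memory stand-ins, not the HMS client API, and the sketch does not model rollback of the stats themselves.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class FakeMetastore:
    """In-memory stand-in for a metastore client (all names hypothetical)."""
    next_txn_id: int = 1
    next_write_id: int = 1
    open_txns: Set[int] = field(default_factory=set)
    stats: Dict[str, Tuple[int, dict]] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def open_txn(self) -> int:
        txn_id = self.next_txn_id
        self.next_txn_id += 1
        self.open_txns.add(txn_id)
        self.log.append(f"open {txn_id}")
        return txn_id

    def allocate_write_id(self, txn_id: int, table: str) -> int:
        # A writeId is only handed out inside an open transaction.
        if txn_id not in self.open_txns:
            raise RuntimeError("writeId allocation requires an open transaction")
        write_id = self.next_write_id
        self.next_write_id += 1
        return write_id

    def set_stats(self, table: str, write_id: int, col_stats: dict) -> None:
        self.stats[table] = (write_id, dict(col_stats))

    def commit_txn(self, txn_id: int) -> None:
        self.open_txns.remove(txn_id)
        self.log.append(f"commit {txn_id}")

    def abort_txn(self, txn_id: int) -> None:
        self.open_txns.discard(txn_id)
        self.log.append(f"abort {txn_id}")


def update_stats_in_txn(hms: FakeMetastore, table: str, col_stats: dict) -> int:
    """Set stats under a freshly allocated writeId; abort the txn on failure."""
    txn_id = hms.open_txn()
    try:
        write_id = hms.allocate_write_id(txn_id, table)
        hms.set_stats(table, write_id, col_stats)
    except Exception:
        hms.abort_txn(txn_id)
        raise
    hms.commit_txn(txn_id)
    return write_id
```

The transaction exists here only to obtain a writeId, which mirrors the point made above that it is opened and committed even though no data is written.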
Additional changes:
- DROP STATS is no longer allowed for transactional tables, as it
turned out that there is no transactional version of the old API.
- Remove COLUMN_STATS_ACCURATE table property during COMPUTE STATS
to ensure that Hive does not use stats computed by Impala to
answer queries like SELECT count(*).
- Changed CatalogOpExecutor.updateCatalog() to get the writeIds
earlier. This can mean unnecessary HMS RPC calls if no property
change is needed in the end, but I felt it hard to reason about
what happens if these RPC calls fail at their original location.
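
The first two items in the list above can be illustrated on a plain dict of table properties. This is a rough sketch, not the Impala code: the function names and the error message are hypothetical, and it assumes the usual "transactional" table property marks an ACID table.

```python
def is_transactional(params: dict) -> bool:
    """Assumes ACID tables carry the table property transactional=true."""
    return params.get("transactional", "").lower() == "true"


def check_drop_stats_allowed(params: dict) -> None:
    """Reject DROP STATS for transactional tables (hypothetical error text)."""
    if is_transactional(params):
        raise ValueError("DROP STATS is not supported for transactional tables")


def params_after_compute_stats(params: dict) -> dict:
    """Return a copy of the table properties without COLUMN_STATS_ACCURATE."""
    updated = dict(params)
    updated.pop("COLUMN_STATS_ACCURATE", None)
    return updated
```

Returning a copy rather than mutating the input keeps the original properties available for comparison, for example to decide whether any property change is needed at all.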
TODOs (My plan is to do these in IMPALA-8865):
- Tried to make the MetastoreShim API easier to use by adding a class
to encapsulate things like txnId and writeId, but it feels rather
half-baked and under-documented.
A similar class is added in https://gerrit.cloudera.org/#/c/14071/,
it would be good to merge them.
- The validWriteIdList of the original SELECT(s) behind COMPUTE
STATS could be used in the HMS API calls, but this would need
more plumbing.
Change-Id: I5c06b4678c1ff75c5aa1586a78afea563e64057f
Reviewed-on: http://gerrit.cloudera.org:8080/14066
Reviewed-by: Tim Armstrong <[email protected]>
Tested-by: Tim Armstrong <[email protected]>
> Do COMPUTE STATS on ACID tables in a "proper" transactional way
> ---------------------------------------------------------------
>
> Key: IMPALA-8865
> URL: https://issues.apache.org/jira/browse/IMPALA-8865
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend, Frontend
> Affects Versions: Impala 3.3.0
> Reporter: Csaba Ringhofer
> Assignee: Csaba Ringhofer
> Priority: Critical
> Labels: impala-acid
>
> IMPALA-8836's goal is just to get the stats in somehow in a way that Impala
> can use them and Hive does not treat them as accurate. It would be best
> if the SELECT(s) that are behind the COMPUTE STATS would use the same
> validWriteId list, and the stats would be set with the same writeId list to
> express that the stats are based on that state of the table. Theoretically
> Hive uses this mechanism to decide whether the stats are up to date by
> comparing a SELECT's validWriteIdList with the one saved for stats and
> considers it stale if the SELECT sees new writeIds.
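
The staleness check described above can be sketched with a simplified model of a write ID list. This is an illustration under stated assumptions, not Hive's implementation: WriteIdSnapshot and stats_are_stale are hypothetical names, and a snapshot is reduced to a high watermark plus the set of write IDs at or below it that are still open or aborted.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class WriteIdSnapshot:
    """Simplified stand-in for a valid write ID list (hypothetical model)."""
    high_watermark: int
    invalid: FrozenSet[int] = frozenset()  # open or aborted write IDs

    def is_visible(self, write_id: int) -> bool:
        return write_id <= self.high_watermark and write_id not in self.invalid


def stats_are_stale(stats: WriteIdSnapshot, reader: WriteIdSnapshot) -> bool:
    """Stats are stale if the reader sees a write ID the stats did not."""
    return any(
        reader.is_visible(w) and not stats.is_visible(w)
        for w in range(1, reader.high_watermark + 1)
    )
```

For example, stats computed with a high watermark of 5 are stale for a reader whose high watermark is 6 if write ID 6 is committed, but not if write ID 6 is still open or was aborted.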