[ https://issues.apache.org/jira/browse/IMPALA-8883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949401#comment-16949401 ]
Attila Jeges commented on IMPALA-8883:
--------------------------------------
INSERT:
Updating the table/partition numRows stats after an INSERT with the number of
newly added rows is currently not possible. To implement this feature properly
we would need to retrieve the table/partition numRows stats that correspond to
the current valid write id list. I couldn't find anything in the HMS API to
support this.
TRUNCATE:
Updating the table/partition numRows stats after a TRUNCATE is probably
possible:
- Currently TRUNCATE acquires an exclusive lock on the table.
- Table/partition numRows stats only have to be reset to 0. There is no need
to retrieve the "previous" stats.
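
A rough sketch of the TRUNCATE case, only to illustrate why no "previous"
stats are needed. It models table/partition parameters as plain Python dicts;
the helper names (reset_num_rows, truncate_stats) and the sample values are
hypothetical, and this is not Impala or HMS code:

```python
from typing import Dict, List, Tuple

NUM_ROWS = "numRows"  # stats parameter key discussed above


def reset_num_rows(params: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the parameters with numRows set to "0".

    The old value is never read, which is what makes TRUNCATE simple.
    """
    updated = dict(params)
    updated[NUM_ROWS] = "0"
    return updated


def truncate_stats(
    table_params: Dict[str, str],
    partition_params: List[Dict[str, str]],
) -> Tuple[Dict[str, str], List[Dict[str, str]]]:
    """Reset numRows on the table and on every partition (hypothetical helper)."""
    new_table = reset_num_rows(table_params)
    new_parts = [reset_num_rows(p) for p in partition_params]
    return new_table, new_parts


# Made-up sample values, for illustration only.
table = {"numRows": "1200", "transactional": "true"}
parts = [{"numRows": "700"}, {"numRows": "500"}]
new_table, new_parts = truncate_stats(table, parts)
```

An INSERT cannot be handled this way, because the new value would depend on
a base numRows that matches the current valid write id list.
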
> Update statistics of ACID tables during writes
> ----------------------------------------------
>
> Key: IMPALA-8883
> URL: https://issues.apache.org/jira/browse/IMPALA-8883
> Project: IMPALA
> Issue Type: Improvement
> Reporter: Zoltán Borók-Nagy
> Assignee: Attila Jeges
> Priority: Major
> Labels: impala-acid
>
> When Impala INSERTs into or TRUNCATEs an ACID table, it simply removes the
> COLUMN_STATS_ACCURATE property to invalidate the statistics and prevent
> Hive from using them.
> Instead, Impala should properly update the statistics. This should be
> relatively simple for TRUNCATE since it erases all the data, but is a bit
> more complicated for INSERT, e.g.:
> * Properly update _number of distinct values_
> * INSERT OVERWRITE partition should properly update table level _number of
> rows_.