[
https://issues.apache.org/jira/browse/HIVE-15033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15648277#comment-15648277
]
Eugene Koifman commented on HIVE-15033:
---------------------------------------
Perhaps this needs a different approach altogether.
Merge only works on full ACID tables which use MVCC.
Stats data, on the other hand, is not versioned.
Even if we fix the multiple-StatsTask issue in Merge, it's perfectly legal in
Hive to have two concurrent inserts into an Acid table, so you could still end
up with stats data for the partition which is not accurate (assuming parallel
stats computations simply overwrite each other, rather than corrupt some data
structures).
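To make the overwrite scenario concrete, here is a toy model in Python. It is
only a sketch: `Partition`, `insert_and_update_stats`, and the field names are
hypothetical and are not Hive APIs. It assumes each writer derives the row
count from the snapshot it started with plus its own rows, and that the last
writer simply overwrites the stored value.

```python
# Hypothetical model of unversioned stats; none of these names exist in Hive.
class Partition:
    def __init__(self):
        self.rows = []           # stands in for the versioned (MVCC) data
        self.stats_num_rows = 0  # unversioned stats: one overwritable value


def insert_and_update_stats(partition, snapshot_rows, new_rows):
    """Append new_rows, then overwrite stats using only what this writer saw."""
    partition.rows.extend(new_rows)
    partition.stats_num_rows = len(snapshot_rows) + len(new_rows)


p = Partition()

# Two concurrent inserts take their snapshots before either one commits.
snapshot_a = list(p.rows)
snapshot_b = list(p.rows)

insert_and_update_stats(p, snapshot_a, ["a1", "a2", "a3"])
insert_and_update_stats(p, snapshot_b, ["b1", "b2"])  # last writer wins

actual_rows = len(p.rows)          # rows really in the partition
recorded_rows = p.stats_num_rows   # what the stats claim
print(actual_rows, recorded_rows)
```

Both inserts succeed and the data is intact, but the recorded row count only
reflects the second writer's view.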
The latter is a general issue with Acid and stats (and maybe for MicroManaged
tables as well).
Perhaps the right answer is to only compute stats for Acid tables at
compactions (or even add another process to trigger stats computation based on
number of writes to the partition). As long as stats are used to guide the
CBO, not provide exact answers to queries, they would still be approximately
accurate and thus useful.
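A rough sketch of what that could look like, again in Python and purely
illustrative: `PartitionStatsTracker`, `write_threshold`, and the callback are
made-up names, not anything that exists in Hive. Stats are recomputed only at
compaction or after a configurable number of writes, so they may lag the data
between refreshes.

```python
# Hypothetical sketch; this is not how Hive's compactor is implemented.
class PartitionStatsTracker:
    def __init__(self, write_threshold, compute_stats):
        self.write_threshold = write_threshold  # assumed tunable
        self.compute_stats = compute_stats      # callback that scans the partition
        self.writes_since_stats = 0
        self.stats = None

    def record_write(self):
        """Called once per committed write; refreshes stats at the threshold."""
        self.writes_since_stats += 1
        if self.writes_since_stats >= self.write_threshold:
            self.refresh()

    def on_compaction(self):
        """Compaction always refreshes stats, regardless of the write count."""
        self.refresh()

    def refresh(self):
        self.stats = self.compute_stats()
        self.writes_since_stats = 0


rows = []
tracker = PartitionStatsTracker(3, lambda: {"num_rows": len(rows)})

for i in range(4):
    rows.append(i)
    tracker.record_write()

stats_after_writes = dict(tracker.stats)  # refreshed on the 3rd write, stale by one row
tracker.on_compaction()
stats_after_compaction = dict(tracker.stats)
print(stats_after_writes, stats_after_compaction)
```

Between refreshes the stats are off by at most the writes since the last
recomputation, which should be acceptable if they only guide the CBO.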
cc [~alangates], [~pxiong]
> Ensure there is only 1 StatsTask in the query plan
> --------------------------------------------------
>
> Key: HIVE-15033
> URL: https://issues.apache.org/jira/browse/HIVE-15033
> Project: Hive
> Issue Type: Sub-task
> Components: Transactions
> Reporter: Eugene Koifman
> Assignee: Eugene Koifman
>
> currently there is 1 per WHEN clause
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)