[ https://issues.apache.org/jira/browse/IMPALA-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joe McDonnell resolved IMPALA-6056.
-----------------------------------
Fix Version/s: Not Applicable
Resolution: Won't Fix
We rarely use snapshots now, so closing this.
> Dataload from snapshot should only compute statistics for new tables
> --------------------------------------------------------------------
>
> Key: IMPALA-6056
> URL: https://issues.apache.org/jira/browse/IMPALA-6056
> Project: IMPALA
> Issue Type: Improvement
> Components: Infrastructure
> Affects Versions: Impala 2.10.0
> Reporter: Joe McDonnell
> Priority: Major
> Fix For: Not Applicable
>
>
> When loading data from a snapshot, create-load-data.sh runs
> compute-table-stats.sh, which computes statistics for all of the tables.
> However, the Hive metastore snapshot already contains statistics for most of
> those tables. Only the Kudu tables are created from scratch in the load from
> snapshot.
> Computing the statistics for everything takes 11 minutes, whereas computing
> statistics only for the Kudu tables takes roughly 3 minutes. This is a
> meaningful saving, and hand tests show that computing statistics only for the
> Kudu tables does not impact subsequent tests.
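
The idea in the description above can be sketched roughly as follows. This is only an illustration, not how create-load-data.sh or compute-table-stats.sh actually work: the helper names, the table names, and the `has_stats` mapping are all hypothetical. The only real Impala construct used is the `COMPUTE STATS` statement.

```python
def tables_needing_stats(tables, has_stats):
    """Return the tables that have no statistics yet, preserving order.

    `has_stats` is a hypothetical mapping from table name to a bool saying
    whether the metastore snapshot already carries statistics for it.
    Tables missing from the mapping are treated as having no statistics.
    """
    return [t for t in tables if not has_stats.get(t, False)]


def compute_stats_statements(tables):
    """Build one Impala COMPUTE STATS statement per table."""
    return [f"COMPUTE STATS {t};" for t in tables]


# Hypothetical example: the HDFS-backed tables came from the snapshot with
# statistics, while the Kudu table was created from scratch.
all_tables = [
    "functional.alltypes",
    "functional_kudu.alltypes",
    "tpch.lineitem",
]
snapshot_has_stats = {
    "functional.alltypes": True,
    "tpch.lineitem": True,
}

statements = compute_stats_statements(
    tables_needing_stats(all_tables, snapshot_has_stats)
)
for stmt in statements:
    print(stmt)
```

Under these assumptions only the newly created table gets a statement, which is where the time saving described above would come from.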
--
This message was sent by Atlassian Jira
(v8.3.4#803005)