[jira] [Resolved] (IMPALA-6056) Dataload from snapshot should only compute statistics for new tables

Joe McDonnell (Jira) Wed, 23 Dec 2020 16:30:04 -0800


     [ https://issues.apache.org/jira/browse/IMPALA-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Joe McDonnell resolved IMPALA-6056.
-----------------------------------
    Fix Version/s: Not Applicable
       Resolution: Won't Fix

We rarely use snapshots now, so closing this.

> Dataload from snapshot should only compute statistics for new tables
> --------------------------------------------------------------------
>
>                 Key: IMPALA-6056
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6056
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Infrastructure
>    Affects Versions: Impala 2.10.0
>            Reporter: Joe McDonnell
>            Priority: Major
>             Fix For: Not Applicable
>
>
> When loading data from a snapshot, create-load-data.sh runs 
> compute-table-stats.sh, which computes statistics for all of the tables. 
> However, the Hive metastore snapshot already contains statistics for most of 
> those tables. Only the Kudu tables are created from scratch in the load from 
> snapshot.
> Computing statistics for every table takes 11 minutes, whereas computing 
> statistics only for the Kudu tables takes roughly 3 minutes. This is a 
> meaningful saving, and manual tests show that computing statistics only for 
> the Kudu tables does not affect subsequent tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
