Alexander Behm created IMPALA-6853:
--------------------------------------

             Summary: COMPUTE STATS does an unnecessary REFRESH after writing 
to the Metastore
                 Key: IMPALA-6853
                 URL: https://issues.apache.org/jira/browse/IMPALA-6853
             Project: IMPALA
          Issue Type: Bug
          Components: Catalog
    Affects Versions: Impala 2.11.0, Impala 2.10.0, Impala 2.9.0, Impala 2.12.0
            Reporter: Alexander Behm
            Assignee: Dimitris Tsirogiannis


COMPUTE STATS, and possibly other DDL operations, unnecessarily performs the 
equivalent of a REFRESH after writing to the Hive Metastore. Since this 
operation can be very expensive, it should be avoided.

The behavior can be confirmed from the catalogd logs:
{code}
compute stats functional_parquet.alltypes;
+-------------------------------------------+
| summary                                   |
+-------------------------------------------+
| Updated 24 partition(s) and 11 column(s). |
+-------------------------------------------+
{code}

Relevant catalogd.INFO snippet:
{code}
I0413 14:40:24.210749 27295 HdfsTable.java:1263] Incrementally loading table 
metadata for: functional_parquet.alltypes
I0413 14:40:24.242122 27295 HdfsTable.java:555] Refreshed file metadata for 
functional_parquet.alltypes Path: 
hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=1: 
Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
I0413 14:40:24.244634 27295 HdfsTable.java:555] Refreshed file metadata for 
functional_parquet.alltypes Path: 
hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=10: 
Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
I0413 14:40:24.247174 27295 HdfsTable.java:555] Refreshed file metadata for 
functional_parquet.alltypes Path: 
hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=11: 
Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
I0413 14:40:24.249713 27295 HdfsTable.java:555] Refreshed file metadata for 
functional_parquet.alltypes Path: 
hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=12: 
Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
I0413 14:40:24.252288 27295 HdfsTable.java:555] Refreshed file metadata for 
functional_parquet.alltypes Path: 
hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=2: 
Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
I0413 14:40:24.254629 27295 HdfsTable.java:555] Refreshed file metadata for 
functional_parquet.alltypes Path: 
hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=3: 
Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
I0413 14:40:24.256991 27295 HdfsTable.java:555] Refreshed file metadata for 
functional_parquet.alltypes Path: 
hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=4: 
Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
I0413 14:40:24.259464 27295 HdfsTable.java:555] Refreshed file metadata for 
functional_parquet.alltypes Path: 
hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=5: 
Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
I0413 14:40:24.262197 27295 HdfsTable.java:555] Refreshed file metadata for 
functional_parquet.alltypes Path: 
hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=6: 
Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
I0413 14:40:24.264463 27295 HdfsTable.java:555] Refreshed file metadata for 
functional_parquet.alltypes Path: 
hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=7: 
Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
I0413 14:40:24.266736 27295 HdfsTable.java:555] Refreshed file metadata for 
functional_parquet.alltypes Path: 
hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=8: 
Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
I0413 14:40:24.269210 27295 HdfsTable.java:555] Refreshed file metadata for 
functional_parquet.alltypes Path: 
hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=9: 
Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
I0413 14:40:24.271800 27295 HdfsTable.java:555] Refreshed file metadata for 
functional_parquet.alltypes Path: 
hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2010/month=1: 
Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
I0413 14:40:24.274348 27295 HdfsTable.java:555] Refreshed file metadata for 
functional_parquet.alltypes Path: 
hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2010/month=10: 
Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
I0413 14:40:24.277053 27295 HdfsTable.java:555] Refreshed file metadata for 
functional_parquet.alltypes Path: 
hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2010/month=11: 
Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
I0413 14:40:24.282152 27295 HdfsTable.java:555] Refreshed file metadata for 
functional_parquet.alltypes Path: 
hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2010/month=12: 
Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
I0413 14:40:24.285684 27295 HdfsTable.java:555] Refreshed file metadata for 
functional_parquet.alltypes Path: 
hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2010/month=2: 
Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
I0413 14:40:24.288921 27295 HdfsTable.java:555] Refreshed file metadata for 
functional_parquet.alltypes Path: 
hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2010/month=3: 
Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
I0413 14:40:24.292757 27295 HdfsTable.java:555] Refreshed file metadata for 
functional_parquet.alltypes Path: 
hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2010/month=4: 
Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
I0413 14:40:24.303673 27295 HdfsTable.java:555] Refreshed file metadata for 
functional_parquet.alltypes Path: 
hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2010/month=5: 
Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
I0413 14:40:24.308387 27295 HdfsTable.java:555] Refreshed file metadata for 
functional_parquet.alltypes Path: 
hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2010/month=6: 
Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
I0413 14:40:24.311506 27295 HdfsTable.java:555] Refreshed file metadata for 
functional_parquet.alltypes Path: 
hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2010/month=7: 
Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
I0413 14:40:24.314600 27295 HdfsTable.java:555] Refreshed file metadata for 
functional_parquet.alltypes Path: 
hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2010/month=8: 
Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
I0413 14:40:24.317709 27295 HdfsTable.java:555] Refreshed file metadata for 
functional_parquet.alltypes Path: 
hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2010/month=9: 
Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
I0413 14:40:24.317873 27295 HdfsTable.java:1273] Incrementally loaded table 
metadata for: functional_parquet.alltypes
{code}

The relevant code path is in CatalogOpExecutor:
{code}
boolean reloadMetadata = true;
catalog_.getLock().writeLock().unlock();

if (tbl instanceof KuduTable && altersKuduTable(params.getAlter_type())) {
  alterKuduTable(params, response, (KuduTable) tbl, newCatalogVersion);
  return;
}
switch (params.getAlter_type()) {
...
        case UPDATE_STATS:
          Preconditions.checkState(params.isSetUpdate_stats_params());
          Reference<Long> numUpdatedColumns = new Reference<>(0L);
          alterTableUpdateStats(tbl, params.getUpdate_stats_params(),
              numUpdatedPartitions, numUpdatedColumns);
          reloadTableSchema = true;
          addSummary(response, "Updated " + numUpdatedPartitions.getRef() +
              " partition(s) and " + numUpdatedColumns.getRef() + " column(s).");
          break;
....
}

if (reloadMetadata) { // <-- the unnecessary REFRESH happens here
  loadTableMetadata(tbl, newCatalogVersion, reloadFileMetadata,
      reloadTableSchema, null);
  addTableToCatalogUpdate(tbl, response.result);
}
{code}
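The UPDATE_STATS case only changes stats stored in the Metastore; the table's files on HDFS are untouched, so the per-partition file-metadata refresh seen in the log is wasted work. One possible direction is to decide per alter type which metadata actually needs reloading. Below is a minimal, self-contained sketch of that idea; the names (ReloadPolicy, AlterType, Reload) are illustrative only and are not Impala's actual classes or API:

```java
import java.util.EnumSet;

// Illustrative sketch: map each ALTER TABLE operation to the set of
// metadata reloads it genuinely requires, so UPDATE_STATS can skip the
// expensive per-partition file-metadata refresh.
public class ReloadPolicy {
  public enum AlterType { ADD_PARTITION, DROP_PARTITION, UPDATE_STATS, SET_LOCATION }
  public enum Reload { FILE_METADATA, TABLE_SCHEMA }

  public static EnumSet<Reload> reloadsFor(AlterType type) {
    switch (type) {
      case UPDATE_STATS:
        // Stats are written straight to the Metastore; the file listing
        // on HDFS is unchanged, so no file-metadata refresh is needed.
        return EnumSet.of(Reload.TABLE_SCHEMA);
      case ADD_PARTITION:
      case DROP_PARTITION:
      case SET_LOCATION:
        // These operations change which files belong to the table.
        return EnumSet.of(Reload.FILE_METADATA, Reload.TABLE_SCHEMA);
      default:
        // Unknown operations fall back to reloading everything.
        return EnumSet.allOf(Reload.class);
    }
  }

  public static void main(String[] args) {
    System.out.println(reloadsFor(AlterType.UPDATE_STATS)); // prints "[TABLE_SCHEMA]"
  }
}
```

With a policy like this, the `if (reloadMetadata)` block could pass narrower flags to loadTableMetadata() instead of triggering a full incremental reload for every alter type.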



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)