IMPALA-3562: support column restriction for compute stats

The 'compute stats' statement currently computes column-level
statistics for all columns of a table.
This adds potentially unneeded work for columns whose stats
are not needed by queries. It can be especially costly for
very wide tables and unneeded large string fields.

This change modifies the 'compute stats' statement (non-incremental
only) to support a user-specified list of columns for which stats
should be computed. An example with the extension is as follows:

compute stats my_db.my_table(column_a, column_b);

While the phrase "for columns ..." is commonly used, since
'compute stats' seems fairly unique (vs. 'analyze table ...'),
this change favors brevity with the parenthesized column list.
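
As a rough illustration of the resulting syntax, the Python sketch below
assembles the statement text with an optional parenthesized column list and
an optional sampling clause. The helper name compute_stats_sql and its
parameters are hypothetical and exist only for this example; the layout
follows the toSql() change to ComputeStatsStmt.java in this commit.

```python
def compute_stats_sql(table, columns=None, tablesample=None):
    """Builds the text of a non-incremental COMPUTE STATS statement.

    Hypothetical helper for illustration only; it is not part of Impala.
      columns=None -> no column list (all analyzable columns are used)
      columns=[]   -> empty list "()" (no columns are analyzed)
    """
    column_list = ""
    if columns is not None:
        # The column list directly follows the table name.
        column_list = "(" + ", ".join(columns) + ")"
    sample = ""
    if tablesample is not None:
        # The sampling clause, if any, comes after the column list.
        sample = " " + tablesample
    return "COMPUTE STATS " + table + column_list + sample


if __name__ == "__main__":
    print(compute_stats_sql("my_db.my_table", ["column_a", "column_b"]))
    print(compute_stats_sql("my_db.my_table", [], "TABLESAMPLE SYSTEM(10)"))
```

Distinguishing None from an empty list keeps the "no restriction" and
"no columns" cases apart, as the commit does with a null versus empty
column list.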

Currently, 'compute stats' is applied only to the columns that
can be analyzed, and other columns are skipped. With this change,
explicitly specifying a column that cannot be analyzed results in
an error (e.g., column does not exist, column is of an unsupported
type, column is a partitioning column). Moreover, an empty column
list can be supplied, which means that no columns will be analyzed.
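
The validation rules above can be sketched in Python as follows. This is a
simplified model under stated assumptions, not the Java code in this commit:
the schema is a plain dict of lower-case column names to type strings, complex
types are detected by a type-name prefix, and the partitioning-column check is
applied to every table rather than only to HDFS tables. The names
AnalysisError, EXAMPLE_SCHEMA and validate_column_whitelist are hypothetical.

```python
class AnalysisError(Exception):
    """Stand-in for the error reported when a listed column is invalid."""


# Assumption for this sketch: complex types are recognized by name prefix.
COMPLEX_TYPE_PREFIXES = ("ARRAY", "MAP", "STRUCT")

# Hypothetical table schema used only for illustration.
EXAMPLE_SCHEMA = {
    "id": "INT",
    "int_col": "INT",
    "double_col": "DOUBLE",
    "map_col": "MAP<STRING,INT>",
    "year": "INT",
}
EXAMPLE_PARTITION_COLS = {"year"}


def validate_column_whitelist(schema, partition_cols, requested):
    """Returns the set of validated column names, or None if unrestricted."""
    if requested is None:
        # No column list was given: nothing is restricted.
        return None
    validated = set()
    for name in requested:
        key = name.lower()  # Column lookup is case-insensitive.
        if key not in schema:
            raise AnalysisError(name + " not found in table")
        if key in partition_cols:
            raise AnalysisError(
                "COMPUTE STATS not supported for partitioning column " + key)
        if schema[key].upper().startswith(COMPLEX_TYPE_PREFIXES):
            raise AnalysisError(
                "COMPUTE STATS not supported for column " + key +
                " of complex type: " + schema[key])
        validated.add(key)  # Duplicates collapse because this is a set.
    return validated
```

An empty list yields an empty set (no columns analyzed), while None means no
restriction at all, which matches the distinction drawn in the description.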

Testing:
  - analyzing a subset of columns is already supported (e.g., not all
    columns can be analyzed), so testing focuses on checking
    that the user-specified columns are handled as expected.
  - tests include: parser tests, ddl analysis, end-to-end tests.
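
For orientation, the statements covered by the new parser test loop can be
enumerated with a short Python sketch. The two lists are copied from the
ParserTest.java change in this commit; the function name
column_list_statements is hypothetical.

```python
from itertools import product

# Table-name suffixes and column-list suffixes, as in the ParserTest change.
OK_SUFFIXES = ["stats bar", "stats `bar`", "stats foo.bar", "stats `foo`.`bar`"]
OK_COMPUTE_SUFFIXES = ["(ab)", "(ab, bc)", "()"]


def column_list_statements():
    """Returns every 'compute' statement with a column list that should parse."""
    return ["compute " + suffix + " " + cols
            for suffix, cols in product(OK_SUFFIXES, OK_COMPUTE_SUFFIXES)]


if __name__ == "__main__":
    for stmt in column_list_statements():
        print(stmt)
```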

Change-Id: If8b25dd248e578dc7ddd35468125cca12d1b9f27
Reviewed-on: http://gerrit.cloudera.org:8080/9133
Reviewed-by: Alex Behm <alex.b...@cloudera.com>
Tested-by: Impala Public Jenkins


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/42c67fb5
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/42c67fb5
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/42c67fb5

Branch: refs/heads/2.x
Commit: 42c67fb592dbc81cce77c9e28bc82526c152b381
Parents: 0f910a8
Author: Vuk Ercegovac <vercego...@cloudera.com>
Authored: Wed Jan 24 12:23:32 2018 -0800
Committer: Impala Public Jenkins <impala-public-jenk...@gerrit.cloudera.org>
Committed: Fri Feb 2 01:10:16 2018 +0000

----------------------------------------------------------------------
 fe/src/main/cup/sql-parser.cup                  |   5 +-
 .../impala/analysis/ComputeStatsStmt.java       |  65 +++++++--
 .../java/org/apache/impala/catalog/Table.java   |   2 +-
 .../apache/impala/analysis/AnalyzeDDLTest.java  |  55 +++++++-
 .../org/apache/impala/analysis/ParserTest.java  |  37 +++--
 .../queries/QueryTest/compute-stats.test        | 134 +++++++++++++++++++
 .../custom_cluster/test_stats_extrapolation.py  |  44 ++++--
 7 files changed, 303 insertions(+), 39 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/42c67fb5/fe/src/main/cup/sql-parser.cup
----------------------------------------------------------------------
diff --git a/fe/src/main/cup/sql-parser.cup b/fe/src/main/cup/sql-parser.cup
index 668bb88..dc0199b 100644
--- a/fe/src/main/cup/sql-parser.cup
+++ b/fe/src/main/cup/sql-parser.cup
@@ -1798,7 +1798,10 @@ cascade_val ::=
 
 compute_stats_stmt ::=
   KW_COMPUTE KW_STATS table_name:table opt_tablesample:tblsmpl
-  {: RESULT = ComputeStatsStmt.createStatsStmt(table, tblsmpl); :}
+  {: RESULT = ComputeStatsStmt.createStatsStmt(table, tblsmpl, null); :}
+  | KW_COMPUTE KW_STATS table_name:table LPAREN opt_ident_list:cols RPAREN
+    opt_tablesample:tblsmpl
+  {: RESULT = ComputeStatsStmt.createStatsStmt(table, tblsmpl, cols); :}
   | KW_COMPUTE KW_INCREMENTAL KW_STATS table_name:table
   {: RESULT = ComputeStatsStmt.createIncrementalStatsStmt(table, null); :}
   | KW_COMPUTE KW_INCREMENTAL KW_STATS table_name:table partition_set:parts

http://git-wip-us.apache.org/repos/asf/impala/blob/42c67fb5/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
----------------------------------------------------------------------
diff --git a/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java b/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
index 4e61d86..6ca8dc9 100644
--- a/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
+++ b/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
@@ -22,6 +22,7 @@ import java.util.HashSet;
 import java.util.Iterator;
 import java.util.List;
 import java.util.Map;
+import java.util.Set;
 
 import org.apache.hadoop.hive.metastore.api.FieldSchema;
 import org.apache.impala.authorization.Privilege;
@@ -53,7 +54,7 @@ import com.google.common.collect.Sets;
  * clauses used (sampling, partition spec), as well as whether stats extrapolation
  * is enabled or not (--enable_stats_extrapolation).
  *
- * 1. COMPUTE STATS <table> [TABLESAMPLE SYSTEM(<perc>) [REPEATABLE(<seed>)]]
+ * 1. COMPUTE STATS <table> [(col_list)] [TABLESAMPLE SYSTEM(<perc>) [REPEATABLE(<seed>)]]
  * - Stats extrapolation enabled:
  *   Computes and replaces the table-level row count and total file size, as well as all
  *   table-level column statistics. Existing partition-objects and their row count are
@@ -71,6 +72,9 @@ import com.google.common.collect.Sets;
  *   partitions to set the extrapolated numRows statistic. Altering many partitions is
  *   expensive and so should be avoided in favor of enabling extrapolation.
  *
+ *   By default, statistics are computed for all columns. To control which columns are
+ *   analyzed, a whitelist of columns names can be optionally specified.
+ *
  * 2. COMPUTE INCREMENTAL STATS <table> [PARTITION <part_spec>]
  * - Stats extrapolation enabled:
  *   Not supported for now to keep the logic/code simple.
@@ -84,7 +88,7 @@ import com.google.common.collect.Sets;
  *   If a set of partitions is specified, then the incremental statistics for those
  *   partitions are recomputed (then merged into the table-level statistics).
  *
- * TODO: Allow more coarse/fine grained (db, column)
+ * TODO: Allow more coarse (db)
  * TODO: Compute stats on complex types.
  */
 public class ComputeStatsStmt extends StatementBase {
@@ -143,6 +147,15 @@ public class ComputeStatsStmt extends StatementBase {
   // null if this is a non-incremental computation.
   private PartitionSet partitionSet_ = null;
 
+  // If non-null, represents the user-specified list of columns for computing statistics.
+  // Not supported for incremental statistics.
+  private List<String> columnWhitelist_ = null;
+
+  // The set of columns to be analyzed. Each column is valid: it must exist in the table
+  // schema, it must be of a type that can be analyzed, and cannot refer to a partitioning
+  // column for HDFS tables. If the set is null, no columns are restricted.
+  private Set<Column> validatedColumnWhitelist_ = null;
+
   // The maximum number of partitions that may be explicitly selected by filter
   // predicates. Any query that selects more than this automatically drops back to a full
   // incremental stats recomputation.
@@ -154,15 +167,17 @@ public class ComputeStatsStmt extends StatementBase {
    * Should only be constructed via static creation functions.
    */
   private ComputeStatsStmt(TableName tableName, TableSampleClause sampleParams,
-      boolean isIncremental, PartitionSet partitionSet) {
+      boolean isIncremental, PartitionSet partitionSet, List<String> columns) {
     Preconditions.checkState(tableName != null && !tableName.isEmpty());
     Preconditions.checkState(isIncremental || partitionSet == null);
     Preconditions.checkState(!isIncremental || sampleParams == null);
+    Preconditions.checkState(!isIncremental || columns == null);
     tableName_ = tableName;
     sampleParams_ = sampleParams;
     table_ = null;
     isIncremental_ = isIncremental;
     partitionSet_ = partitionSet;
+    columnWhitelist_ = columns;
     if (partitionSet_ != null) {
       partitionSet_.setTableName(tableName);
       partitionSet_.setPrivilegeRequirement(Privilege.ALTER);
@@ -174,17 +189,17 @@ public class ComputeStatsStmt extends StatementBase {
    * stats should be computed with table sampling.
    */
   public static ComputeStatsStmt createStatsStmt(TableName tableName,
-      TableSampleClause sampleParams) {
-    return new ComputeStatsStmt(tableName, sampleParams, false, null);
+      TableSampleClause sampleParams, List<String> columns) {
+    return new ComputeStatsStmt(tableName, sampleParams, false, null, columns);
   }
 
   /**
-   * Returns a stmt for COMPUTE INCREMENTAL STATS. The optional 'partitioSet' specifies a
+   * Returns a stmt for COMPUTE INCREMENTAL STATS. The optional 'partitionSet' specifies a
    * set of partitions whose stats should be computed.
    */
   public static ComputeStatsStmt createIncrementalStatsStmt(TableName tableName,
       PartitionSet partitionSet) {
-    return new ComputeStatsStmt(tableName, null, true, partitionSet);
+    return new ComputeStatsStmt(tableName, null, true, partitionSet, null);
   }
 
   private List<String> getBaseColumnStatsQuerySelectList(Analyzer analyzer) {
@@ -196,6 +211,9 @@ public class ComputeStatsStmt extends StatementBase {
 
     for (int i = startColIdx; i < table_.getColumns().size(); ++i) {
       Column c = table_.getColumns().get(i);
+      if (validatedColumnWhitelist_ != null && !validatedColumnWhitelist_.contains(c)) {
+        continue;
+      }
       if (ignoreColumn(c)) continue;
 
       // NDV approximation function. Add explicit alias for later identification when
@@ -324,6 +342,26 @@ public class ComputeStatsStmt extends StatementBase {
       isIncremental_ = false;
     }
 
+    if (columnWhitelist_ != null) {
+      validatedColumnWhitelist_ = Sets.newHashSet();
+      for (String colName : columnWhitelist_) {
+        Column col = table_.getColumn(colName);
+        if (col == null) {
+          throw new AnalysisException(colName + " not found in table: " +
+              table_.getName());
+        }
+        if (table_ instanceof HdfsTable && table_.isClusteringColumn(col)) {
+          throw new AnalysisException("COMPUTE STATS not supported for partitioning " +
+              "column " + col.getName() + " of HDFS table.");
+        }
+        if (ignoreColumn(col)) {
+          throw new AnalysisException("COMPUTE STATS not supported for column " +
+              col.getName() + " of complex type:" + col.getType().toSql());
+        }
+        validatedColumnWhitelist_.add(col);
+      }
+    }
+
     HdfsTable hdfsTable = null;
     if (table_ instanceof HdfsTable) {
       hdfsTable = (HdfsTable)table_;
@@ -683,8 +721,13 @@ public class ComputeStatsStmt extends StatementBase {
   }
 
   public double getEffectiveSamplingPerc() { return effectiveSamplePerc_; }
+
+  /**
+   * For testing.
+   */
   public String getTblStatsQuery() { return tableStatsQueryStr_; }
   public String getColStatsQuery() { return columnStatsQueryStr_; }
+  public Set<Column> getValidatedColumnWhitelist() { return validatedColumnWhitelist_; }
 
   /**
    * Returns true if this statement computes stats on Parquet partitions only,
@@ -707,9 +750,15 @@ public class ComputeStatsStmt extends StatementBase {
   @Override
   public String toSql() {
     if (!isIncremental_) {
+      StringBuilder columnList = new StringBuilder();
+      if (columnWhitelist_ != null) {
+        columnList.append("(");
+        columnList.append(Joiner.on(", ").join(columnWhitelist_));
+        columnList.append(")");
+      }
       String tblsmpl = "";
       if (sampleParams_ != null) tblsmpl = " " + sampleParams_.toSql();
-      return "COMPUTE STATS " + tableName_.toSql() + tblsmpl;
+      return "COMPUTE STATS " + tableName_.toSql() + columnList.toString() + tblsmpl;
     } else {
       return "COMPUTE INCREMENTAL STATS " + tableName_.toSql() +
           partitionSet_ == null ? "" : partitionSet_.toSql();

http://git-wip-us.apache.org/repos/asf/impala/blob/42c67fb5/fe/src/main/java/org/apache/impala/catalog/Table.java
----------------------------------------------------------------------
diff --git a/fe/src/main/java/org/apache/impala/catalog/Table.java b/fe/src/main/java/org/apache/impala/catalog/Table.java
index a6536ba..aca9409 100644
--- a/fe/src/main/java/org/apache/impala/catalog/Table.java
+++ b/fe/src/main/java/org/apache/impala/catalog/Table.java
@@ -508,7 +508,7 @@ public abstract class Table extends CatalogObjectImpl {
   }
 
   /**
-   * Case-insensitive lookup.
+   * Case-insensitive lookup. Returns null if the column with 'name' is not found.
    */
   public Column getColumn(String name) { return colsByName_.get(name.toLowerCase()); }
 

http://git-wip-us.apache.org/repos/asf/impala/blob/42c67fb5/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
----------------------------------------------------------------------
diff --git a/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java b/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
index 122b49d..4124493 100644
--- a/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
+++ b/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
@@ -17,12 +17,14 @@
 
 package org.apache.impala.analysis;
 
+import static org.junit.Assert.assertEquals;
 import static org.junit.Assert.assertTrue;
 
 import java.io.IOException;
 import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.List;
+import java.util.Set;
 import java.util.UUID;
 
 import org.apache.commons.lang3.StringUtils;
@@ -34,6 +36,7 @@ import org.apache.hadoop.fs.permission.FsPermission;
 import org.apache.impala.catalog.ArrayType;
 import org.apache.impala.catalog.Catalog;
 import org.apache.impala.catalog.CatalogException;
+import org.apache.impala.catalog.Column;
 import org.apache.impala.catalog.ColumnStats;
 import org.apache.impala.catalog.DataSource;
 import org.apache.impala.catalog.DataSourceTable;
@@ -60,6 +63,7 @@ import com.google.common.base.Joiner;
 import com.google.common.base.Preconditions;
 import com.google.common.collect.ImmutableList;
 import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
 
 public class AnalyzeDDLTest extends FrontendTestBase {
 
@@ -1180,16 +1184,37 @@ public class AnalyzeDDLTest extends FrontendTestBase {
     return checkComputeStatsStmt(stmt, analyzer, null);
   }
 
+  /**
+   * Analyzes 'stmt' and checks that the table-level and column-level SQL that is used
+   * to compute the stats is valid. Returns the analyzed statement.
+   */
   ComputeStatsStmt checkComputeStatsStmt(String stmt, Analyzer analyzer,
       String expectedWarning) throws AnalysisException {
     ParseNode parseNode = AnalyzesOk(stmt, analyzer, expectedWarning);
     assertTrue(parseNode instanceof ComputeStatsStmt);
     ComputeStatsStmt parsedStmt = (ComputeStatsStmt)parseNode;
     AnalyzesOk(parsedStmt.getTblStatsQuery());
-    AnalyzesOk(parsedStmt.getColStatsQuery());
+    String colsQuery = parsedStmt.getColStatsQuery();
+    if (colsQuery != null) AnalyzesOk(colsQuery);
     return parsedStmt;
   }
 
+  /**
+   * In addition to the validation for checkComputeStatsStmt(String), checks that the
+   * whitelisted columns match 'expColNames'.
+   */
+  void checkComputeStatsStmt(String stmt, List<String> expColNames)
+      throws AnalysisException {
+    ComputeStatsStmt parsedStmt = checkComputeStatsStmt(stmt);
+    Set<Column> actCols = parsedStmt.getValidatedColumnWhitelist();
+    if (expColNames == null) assertTrue("Expected no whitelist.", actCols == null);
+    assertTrue("Expected whitelist.", actCols != null);
+    Set<String> actColSet = Sets.newHashSet();
+    for (Column col: actCols) actColSet.add(col.getName());
+    Set<String> expColSet = Sets.newHashSet(expColNames);
+    assertEquals(actColSet, expColSet);
+  }
+
   @Test
   public void TestComputeStats() throws AnalysisException {
     // Analyze the stmt itself as well as the generated child queries.
@@ -1197,6 +1222,28 @@ public class AnalyzeDDLTest extends FrontendTestBase {
     checkComputeStatsStmt("compute stats functional_hbase.alltypes");
     // Test that complex-typed columns are ignored.
     checkComputeStatsStmt("compute stats functional.allcomplextypes");
+    // Test legal column restriction.
+    checkComputeStatsStmt("compute stats functional.alltypes (int_col, double_col)",
+        Lists.newArrayList("int_col", "double_col"));
+    // Test legal column restriction with duplicate columns specified.
+    checkComputeStatsStmt(
+        "compute stats functional.alltypes (int_col, double_col, int_col)",
+        Lists.newArrayList("int_col", "double_col"));
+    // Test empty column restriction.
+    checkComputeStatsStmt("compute stats functional.alltypes ()",
+        new ArrayList<String>());
+    // Test column restriction of a column that does not exist.
+    AnalysisError("compute stats functional.alltypes(int_col, bogus_col, double_col)",
+        "bogus_col not found in table:");
+    // Test column restriction of a column with an unsupported type.
+    AnalysisError("compute stats functional.allcomplextypes(id, map_map_col)",
+        "COMPUTE STATS not supported for column");
+    // Test column restriction of an Hdfs table partitioning column.
+    AnalysisError("compute stats functional.stringpartitionkey(string_col)",
+        "COMPUTE STATS not supported for partitioning");
+    // Test column restriction of an HBase key column.
+    checkComputeStatsStmt("compute stats functional_hbase.testtbl(id)",
+        Lists.newArrayList("id"));
 
     // Cannot compute stats on a database.
     AnalysisError("compute stats tbl_does_not_exist",
@@ -1283,7 +1330,11 @@ public class AnalyzeDDLTest extends FrontendTestBase {
       // changes. Expect a sample between 4 and 6 of the 24 total files.
       Assert.assertTrue(adjustedStmt.getEffectiveSamplingPerc() >= 4.0 / 24);
       Assert.assertTrue(adjustedStmt.getEffectiveSamplingPerc() <= 6.0 / 24);
-
+      // Checks that whitelisted columns works with tablesample.
+      checkComputeStatsStmt(
+          "compute stats functional.alltypes (int_col, double_col) tablesample " +
+          "system (55) repeatable(1)",
+          Lists.newArrayList("int_col", "double_col"));
       AnalysisError("compute stats functional.alltypes tablesample system (101)",
           "Invalid percent of bytes value '101'. " +
           "The percent of bytes to sample must be between 0 and 100.");

http://git-wip-us.apache.org/repos/asf/impala/blob/42c67fb5/fe/src/test/java/org/apache/impala/analysis/ParserTest.java
----------------------------------------------------------------------
diff --git a/fe/src/test/java/org/apache/impala/analysis/ParserTest.java b/fe/src/test/java/org/apache/impala/analysis/ParserTest.java
index 40585bf..8dd4898 100644
--- a/fe/src/test/java/org/apache/impala/analysis/ParserTest.java
+++ b/fe/src/test/java/org/apache/impala/analysis/ParserTest.java
@@ -3165,21 +3165,32 @@ public class ParserTest extends FrontendTestBase {
   @Test
   public void TestComputeDropStats() {
     String[] prefixes = {"compute", "drop"};
+    String[] okSuffixes = {"stats bar", "stats `bar`", "stats foo.bar",
+        "stats `foo`.`bar`"};
+    String[] okComputeSuffixes = {"(ab)", "(ab, bc)", "()"};
+    String[] errorSuffixes = {
+     // Missing table name.
+     "stats",
+     // Missing 'stats' keyword.
+     "`bar`",
+     // Cannot use string literal as table name.
+     "stats 'foo'",
+     // Cannot analyze multiple tables in one stmt.
+     "stats foo bar"
+    };
 
     for (String prefix: prefixes) {
-      ParsesOk(prefix + " stats bar");
-      ParsesOk(prefix + " stats `bar`");
-      ParsesOk(prefix + " stats foo.bar");
-      ParsesOk(prefix + " stats `foo`.`bar`");
-
-      // Missing table name.
-      ParserError(prefix + " stats");
-      // Missing 'stats' keyword.
-      ParserError(prefix + " foo");
-      // Cannot use string literal as table name.
-      ParserError(prefix + " stats 'foo'");
-      // Cannot analyze multiple tables in one stmt.
-      ParserError(prefix + " stats foo bar");
+      for (String suffix: okSuffixes) {
+        ParsesOk(prefix + " " + suffix);
+      }
+      for (String suffix: errorSuffixes) {
+        ParserError(prefix + " " + suffix);
+      }
+    }
+    for (String suffix: okSuffixes) {
+      for (String computeSuffix: okComputeSuffixes) {
+        ParsesOk("compute" + " " + suffix + " " + computeSuffix);
+      }
     }
   }
 

http://git-wip-us.apache.org/repos/asf/impala/blob/42c67fb5/testdata/workloads/functional-query/queries/QueryTest/compute-stats.test
----------------------------------------------------------------------
diff --git a/testdata/workloads/functional-query/queries/QueryTest/compute-stats.test b/testdata/workloads/functional-query/queries/QueryTest/compute-stats.test
index 92aa0db..b7494f0 100644
--- a/testdata/workloads/functional-query/queries/QueryTest/compute-stats.test
+++ b/testdata/workloads/functional-query/queries/QueryTest/compute-stats.test
@@ -208,6 +208,140 @@ COLUMN, TYPE, #DISTINCT VALUES, #NULLS, MAX SIZE, AVG SIZE
 STRING, STRING, BIGINT, BIGINT, BIGINT, DOUBLE
 ====
 ---- QUERY
+# Restricts stats to a subset of columns.
+create table alltypes_for_coltest like functional.alltypes;
+insert into alltypes_for_coltest partition(year, month)
+select * from functional.alltypes;
+====
+---- QUERY
+compute stats alltypes_for_coltest(tinyint_col, float_col)
+---- RESULTS
+'Updated 24 partition(s) and 2 column(s).'
+---- TYPES
+STRING
+====
+---- QUERY
+show table stats alltypes_for_coltest
+---- LABELS
+YEAR, MONTH, #ROWS, #FILES, SIZE, BYTES CACHED, CACHE REPLICATION, FORMAT, INCREMENTAL STATS, LOCATION
+---- RESULTS
+'2009','1',310,1,'24.56KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
+'2009','2',280,1,'22.27KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
+'2009','3',310,1,'24.67KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
+'2009','4',300,1,'24.06KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
+'2009','5',310,1,'24.97KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
+'2009','6',300,1,'24.16KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
+'2009','7',310,1,'24.97KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
+'2009','8',310,1,'24.97KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
+'2009','9',300,1,'24.16KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
+'2009','10',310,1,'24.97KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
+'2009','11',300,1,'24.16KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
+'2009','12',310,1,'24.97KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
+'2010','1',310,1,'24.97KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
+'2010','2',280,1,'22.54KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
+'2010','3',310,1,'24.97KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
+'2010','4',300,1,'24.16KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
+'2010','5',310,1,'24.97KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
+'2010','6',300,1,'24.16KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
+'2010','7',310,1,'24.97KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
+'2010','8',310,1,'24.97KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
+'2010','9',300,1,'24.16KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
+'2010','10',310,1,'24.97KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
+'2010','11',300,1,'24.16KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
+'2010','12',310,1,'24.97KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
+'Total','',7300,24,'586.84KB','0B','','','',''
+---- TYPES
+STRING, STRING, BIGINT, BIGINT, STRING, STRING, STRING, STRING, STRING, STRING
+====
+---- QUERY
+show column stats alltypes_for_coltest
+---- LABELS
+COLUMN, TYPE, #DISTINCT VALUES, #NULLS, MAX SIZE, AVG SIZE
+---- RESULTS
+'id','INT',-1,-1,4,4
+'bool_col','BOOLEAN',-1,-1,1,1
+'tinyint_col','TINYINT',10,-1,1,1
+'smallint_col','SMALLINT',-1,-1,2,2
+'int_col','INT',-1,-1,4,4
+'bigint_col','BIGINT',-1,-1,8,8
+'float_col','FLOAT',10,-1,4,4
+'double_col','DOUBLE',-1,-1,8,8
+'date_string_col','STRING',-1,-1,-1,-1
+'string_col','STRING',-1,-1,-1,-1
+'timestamp_col','TIMESTAMP',-1,-1,16,16
+'year','INT',2,0,4,4
+'month','INT',12,0,4,4
+---- TYPES
+STRING, STRING, BIGINT, BIGINT, BIGINT, DOUBLE
+====
+---- QUERY
+# Computes only table statistics; no column statistics.
+create table alltypes_no_col_stats like functional.alltypes;
+insert into alltypes_no_col_stats partition(year, month)
+select * from functional.alltypes;
+====
+---- QUERY
+compute stats alltypes_no_col_stats()
+---- RESULTS
+'Updated 24 partition(s) and 0 column(s).'
+---- TYPES
+STRING
+====
+---- QUERY
+show table stats alltypes_no_col_stats
+---- LABELS
+YEAR, MONTH, #ROWS, #FILES, SIZE, BYTES CACHED, CACHE REPLICATION, FORMAT, INCREMENTAL STATS, LOCATION
+---- RESULTS
+'2009','1',310,1,'24.56KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
+'2009','2',280,1,'22.27KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
+'2009','3',310,1,'24.67KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
+'2009','4',300,1,'24.06KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
+'2009','5',310,1,'24.97KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
+'2009','6',300,1,'24.16KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
+'2009','7',310,1,'24.97KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
+'2009','8',310,1,'24.97KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
+'2009','9',300,1,'24.16KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
+'2009','10',310,1,'24.97KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
+'2009','11',300,1,'24.16KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
+'2009','12',310,1,'24.97KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
+'2010','1',310,1,'24.97KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
+'2010','2',280,1,'22.54KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
+'2010','3',310,1,'24.97KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
+'2010','4',300,1,'24.16KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
+'2010','5',310,1,'24.97KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
+'2010','6',300,1,'24.16KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
+'2010','7',310,1,'24.97KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
+'2010','8',310,1,'24.97KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
+'2010','9',300,1,'24.16KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
+'2010','10',310,1,'24.97KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
+'2010','11',300,1,'24.16KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
+'2010','12',310,1,'24.97KB','NOT CACHED','NOT CACHED','TEXT','false',regex:.*
+'Total','',7300,24,'586.84KB','0B','','','',''
+---- TYPES
+STRING, STRING, BIGINT, BIGINT, STRING, STRING, STRING, STRING, STRING, STRING
+====
+---- QUERY
+show column stats alltypes_no_col_stats
+---- LABELS
+COLUMN, TYPE, #DISTINCT VALUES, #NULLS, MAX SIZE, AVG SIZE
+---- RESULTS
+'id','INT',-1,-1,4,4
+'bool_col','BOOLEAN',-1,-1,1,1
+'tinyint_col','TINYINT',-1,-1,1,1
+'smallint_col','SMALLINT',-1,-1,2,2
+'int_col','INT',-1,-1,4,4
+'bigint_col','BIGINT',-1,-1,8,8
+'float_col','FLOAT',-1,-1,4,4
+'double_col','DOUBLE',-1,-1,8,8
+'date_string_col','STRING',-1,-1,-1,-1
+'string_col','STRING',-1,-1,-1,-1
+'timestamp_col','TIMESTAMP',-1,-1,16,16
+'year','INT',2,0,4,4
+'month','INT',12,0,4,4
+---- TYPES
+STRING, STRING, BIGINT, BIGINT, BIGINT, DOUBLE
+====
+---- QUERY
 # Add partitions with NULL values and check for stats.
 alter table alltypes add partition (year=NULL, month=NULL)
 ---- RESULTS

http://git-wip-us.apache.org/repos/asf/impala/blob/42c67fb5/tests/custom_cluster/test_stats_extrapolation.py
----------------------------------------------------------------------
diff --git a/tests/custom_cluster/test_stats_extrapolation.py b/tests/custom_cluster/test_stats_extrapolation.py
index ef0b675..65aa9f1 100644
--- a/tests/custom_cluster/test_stats_extrapolation.py
+++ b/tests/custom_cluster/test_stats_extrapolation.py
@@ -57,10 +57,10 @@ class TestStatsExtrapolation(CustomClusterTestSuite):
     # Test partitioned table.
     part_test_tbl = unique_database + ".alltypes"
     self.clone_table("functional.alltypes", part_test_tbl, True, vector)
-    self.__run_sampling_test(part_test_tbl, "functional.alltypes", 1, 3)
-    self.__run_sampling_test(part_test_tbl, "functional.alltypes", 10, 7)
-    self.__run_sampling_test(part_test_tbl, "functional.alltypes", 20, 13)
-    self.__run_sampling_test(part_test_tbl, "functional.alltypes", 100, 99)
+    self.__run_sampling_test(part_test_tbl, "", "functional.alltypes", 1, 3)
+    self.__run_sampling_test(part_test_tbl, "", "functional.alltypes", 10, 7)
+    self.__run_sampling_test(part_test_tbl, "", "functional.alltypes", 20, 13)
+    self.__run_sampling_test(part_test_tbl, "", "functional.alltypes", 100, 99)
 
     # Test unpartitioned table.
     nopart_test_tbl = unique_database + ".alltypesnopart"
@@ -70,15 +70,15 @@ class TestStatsExtrapolation(CustomClusterTestSuite):
     nopart_test_tbl_exp = unique_database + ".alltypesnopart_exp"
     self.clone_table(nopart_test_tbl, nopart_test_tbl_exp, False, vector)
     self.client.execute("compute stats {0}".format(nopart_test_tbl_exp))
-    self.__run_sampling_test(nopart_test_tbl, nopart_test_tbl_exp, 1, 3)
-    self.__run_sampling_test(nopart_test_tbl, nopart_test_tbl_exp, 10, 7)
-    self.__run_sampling_test(nopart_test_tbl, nopart_test_tbl_exp, 20, 13)
-    self.__run_sampling_test(nopart_test_tbl, nopart_test_tbl_exp, 100, 99)
+    self.__run_sampling_test(nopart_test_tbl, "", nopart_test_tbl_exp, 1, 3)
+    self.__run_sampling_test(nopart_test_tbl, "", nopart_test_tbl_exp, 10, 7)
+    self.__run_sampling_test(nopart_test_tbl, "", nopart_test_tbl_exp, 20, 13)
+    self.__run_sampling_test(nopart_test_tbl, "", nopart_test_tbl_exp, 100, 99)
 
     # Test empty table.
     empty_test_tbl = unique_database + ".empty"
     self.clone_table("functional.alltypes", empty_test_tbl, False, vector)
-    self.__run_sampling_test(empty_test_tbl, empty_test_tbl, 10, 7)
+    self.__run_sampling_test(empty_test_tbl, "", empty_test_tbl, 10, 7)
 
     # Test wide table. Should not crash or error. This takes a few minutes so restrict
     # to exhaustive.
@@ -88,13 +88,29 @@ class TestStatsExtrapolation(CustomClusterTestSuite):
       self.client.execute(
         "compute stats {0} tablesample system(10)".format(wide_test_tbl))
 
-  def __run_sampling_test(self, tbl, expected_tbl, perc, seed):
+    # Test column subset.
+    column_subset_tbl = unique_database + ".column_subset"
+    columns = "(int_col, string_col)"
+    self.clone_table("functional.alltypes", column_subset_tbl, True, vector)
+    self.__run_sampling_test(column_subset_tbl, columns, "functional.alltypes", 1, 3)
+    self.__run_sampling_test(column_subset_tbl, columns, "functional.alltypes", 10, 7)
+    self.__run_sampling_test(column_subset_tbl, columns, "functional.alltypes", 20, 13)
+    self.__run_sampling_test(column_subset_tbl, columns, "functional.alltypes", 100, 99)
+
+    # Test no columns.
+    no_column_tbl = unique_database + ".no_columns"
+    columns = "()"
+    self.clone_table("functional.alltypes", no_column_tbl, True, vector)
+    self.__run_sampling_test(no_column_tbl, columns, "functional.alltypes", 10, 7)
+
+  def __run_sampling_test(self, tbl, cols, expected_tbl, perc, seed):
     """Drops stats on 'tbl' and then runs COMPUTE STATS TABLESAMPLE on 'tbl' with the
-    given sampling percent and random seed. Checks that the resulting table and column
-    stats are reasoanbly close to those of 'expected_tbl'."""
+    given column restriction clause, sampling percent and random seed. Checks that
+    the resulting table and column stats are reasoanbly close to those of
+    'expected_tbl'."""
     self.client.execute("drop stats {0}".format(tbl))
-    self.client.execute("compute stats {0} tablesample system ({1}) repeatable ({2})"\
-      .format(tbl, perc, seed))
+    self.client.execute("compute stats {0}{1} tablesample system ({2}) repeatable ({3})"\
+      .format(tbl, cols, perc, seed))
     self.__check_table_stats(tbl, expected_tbl)
     self.__check_column_stats(tbl, expected_tbl)
 
