[2/4] impala git commit: IMPALA-6464: [DOCS] COMPUTE STATS supports a list of columns

joemcdonnell Tue, 17 Apr 2018 13:26:13 -0700

IMPALA-6464: [DOCS] COMPUTE STATS supports a list of columns

Change-Id: I609c38eac29e36eca008bfb66f5e78f5491e719a
Reviewed-on: http://gerrit.cloudera.org:8080/10070
Reviewed-by: Vuk Ercegovac
Tested-by: Impala Public Jenkins



Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/0e98b9ab
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/0e98b9ab
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/0e98b9ab

Branch: refs/heads/master
Commit: 0e98b9abd05ccfb3f01657434f913ad7d061f087
Parents: a6767de
Author: Alex Rodoni
Authored: Fri Apr 13 18:14:57 2018 -0700
Committer: Impala Public Jenkins
Committed: Mon Apr 16 20:28:34 2018 +0000

----------------------------------------------------------------------
 docs/topics/impala_compute_stats.xml | 116 ++++++++++++++++++++----------
 1 file changed, 77 insertions(+), 39 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/0e98b9ab/docs/topics/impala_compute_stats.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_compute_stats.xml b/docs/topics/impala_compute_stats.xml
index 98694f8..b62972c 100644
--- a/docs/topics/impala_compute_stats.xml
+++ b/docs/topics/impala_compute_stats.xml
@@ -49,7 +49,11 @@ under the License.
 
     <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
 
-<codeblock rev="2.1.0">COMPUTE STATS [<varname>db_name</varname>.]<varname>table_name</varname>
+<codeblock rev="impala-3562">COMPUTE STATS
+  [<varname>db_name</varname>.]<varname>table_name</varname> [ ( <varname>column_list</varname> ) ]
+
+<varname>column_list</varname> ::= <varname>column_name</varname> [ , <varname>column_name</varname>, ... ]
+
 COMPUTE INCREMENTAL STATS [<varname>db_name</varname>.]<varname>table_name</varname> [PARTITION (<varname>partition_spec</varname>)]
 
 <varname>partition_spec</varname> ::= <varname>simple_partition_spec</varname> | <ph rev="IMPALA-1654"><varname>complex_partition_spec</varname></ph>
@@ -64,12 +68,40 @@ COMPUTE INCREMENTAL STATS [<varname>db_name</varname>.]<varname>table_name</varn
     <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
 
     <p>
-      Originally, Impala relied on users to run the Hive <codeph>ANALYZE TABLE</codeph> statement, but that method
-      of gathering statistics proved unreliable and difficult to use. The Impala <codeph>COMPUTE STATS</codeph>
-      statement is built from the ground up to improve the reliability and user-friendliness of this operation.
-      <codeph>COMPUTE STATS</codeph> does not require any setup steps or special configuration. You only run a
-      single Impala <codeph>COMPUTE STATS</codeph> statement to gather both table and column statistics, rather
-      than separate Hive <codeph>ANALYZE TABLE</codeph> statements for each kind of statistics.
+      Originally, Impala relied on users to run the Hive <codeph>ANALYZE
+        TABLE</codeph> statement, but that method of gathering statistics proved
+      unreliable and difficult to use. The Impala <codeph>COMPUTE STATS</codeph>
+      statement was built to improve the reliability and user-friendliness of
+      this operation. <codeph>COMPUTE STATS</codeph> does not require any setup
+      steps or special configuration. You only run a single Impala
+        <codeph>COMPUTE STATS</codeph> statement to gather both table and column
+      statistics, rather than separate Hive <codeph>ANALYZE TABLE</codeph>
+      statements for each kind of statistics.
+    </p>
+
+    <p rev="impala-3562">
+      For a non-incremental <codeph>COMPUTE STATS</codeph>
+      statement, the columns for which statistics are computed can be specified
+      with an optional comma-separated list of columns.
+    </p>
+
+    <p rev="impala-3562">
+      If no column list is given, the <codeph>COMPUTE STATS</codeph> statement
+      computes column-level statistics for all columns of the table. This adds
+      potentially unneeded work for columns whose stats are not needed by
+      queries. It can be especially costly for very wide tables and unneeded
+      large string fields.
+    </p>
+    <p rev="impala-3562">
+      <codeph>COMPUTE STATS</codeph> returns an error when a specified column
+      cannot be analyzed, such as when the column does not exist, the column is
+      of a type that <codeph>COMPUTE STATS</codeph> does not support (for
+      example, a complex type), or the column is a partitioning column.
+
+    </p>
+    <p rev="impala-3562">
+      If an empty column list is given, <codeph>COMPUTE STATS</codeph> analyzes
+        no columns.
     </p>
 
     <p rev="2.1.0">
@@ -92,39 +124,45 @@ COMPUTE INCREMENTAL STATS [<varname>db_name</varname>.]<varname>table_name</varn
       <codeph>COMPUTE STATS</codeph> statement. Such tables display <codeph>false</codeph> under the
       <codeph>Incremental stats</codeph> column of the <codeph>SHOW TABLE STATS</codeph> output.
     </p>
-
     <note>
-      Because many of the most performance-critical and resource-intensive operations rely on table and column
-      statistics to construct accurate and efficient plans, <codeph>COMPUTE STATS</codeph> is an important step at
-      the end of your ETL process. Run <codeph>COMPUTE STATS</codeph> on all tables as your first step during
-      performance tuning for slow queries, or troubleshooting for out-of-memory conditions:
-      <ul>
-        <li>
-          Accurate statistics help Impala construct an efficient query plan for join queries, improving performance
-          and reducing memory usage.
-        </li>
-
-        <li>
-          Accurate statistics help Impala distribute the work effectively for insert operations into Parquet
-          tables, improving performance and reducing memory usage.
-        </li>
-
-        <li rev="1.3.0">
-          Accurate statistics help Impala estimate the memory required for each query, which is important when you
-          use resource management features, such as admission control and the YARN resource management framework.
-          The statistics help Impala to achieve high concurrency, full utilization of available memory, and avoid
-          contention with workloads from other Hadoop components.
-        </li>
-        <li rev="IMPALA-4572">
-          In <keyword keyref="impala28_full"/> and higher, when you run the
-          <codeph>COMPUTE STATS</codeph> or <codeph>COMPUTE INCREMENTAL STATS</codeph>
-          statement against a Parquet table, Impala automatically applies the query
-          option setting <codeph>MT_DOP=4</codeph> to increase the amount of intra-node
-          parallelism during this CPU-intensive operation. See <xref keyref="mt_dop"/>
-          for details about what this query option does and how to use it with
-          CPU-intensive <codeph>SELECT</codeph> statements.
-        </li>
-      </ul>
+      <p>
+        Because many of the most performance-critical and resource-intensive
+        operations rely on table and column statistics to construct accurate and
+        efficient plans, <codeph>COMPUTE STATS</codeph> is an important step at
+        the end of your ETL process. Run <codeph>COMPUTE STATS</codeph> on all
+        tables as your first step during performance tuning for slow queries, or
+        troubleshooting for out-of-memory conditions:
+        <ul>
+          <li>
+            Accurate statistics help Impala construct an efficient query plan
+            for join queries, improving performance and reducing memory usage.
+          </li>
+          <li>
+            Accurate statistics help Impala distribute the work effectively
+            for insert operations into Parquet tables, improving performance and
+            reducing memory usage.
+          </li>
+          <li rev="1.3.0">
+            Accurate statistics help Impala estimate the memory
+            required for each query, which is important when you use resource
+            management features, such as admission control and the YARN resource
+            management framework. The statistics help Impala to achieve high
+            concurrency, full utilization of available memory, and avoid
+            contention with workloads from other Hadoop components.
+          </li>
+          <li rev="IMPALA-4572">
+            In <keyword keyref="impala28_full"/> and
+            higher, when you run the <codeph>COMPUTE STATS</codeph> or
+              <codeph>COMPUTE INCREMENTAL STATS</codeph> statement against a
+            Parquet table, Impala automatically applies the query option setting
+              <codeph>MT_DOP=4</codeph> to increase the amount of intra-node
+            parallelism during this CPU-intensive operation. See <xref
+              keyref="mt_dop"/> for details about what this query option does
+            and how to use it with CPU-intensive <codeph>SELECT</codeph>
+            statements.
+          </li>
+        </ul>
+      </p>
     </note>
 
     <p rev="IMPALA-1654">
