Repository: incubator-impala
Updated Branches:
  refs/heads/master 36cd610d6 -> e278ed228


[DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS

Explain how doing COMPUTE INCREMENTAL STATS for the first time
starts over and discards any previous stats from COMPUTE STATS.

As a consequence, moved some wording and examples into
impala_common.xml so that content could be used in
multiple places. Also made a new subtopic on the "Partitioning"
page because I saw COMPUTE INCREMENTAL STATS wasn't mentioned
there.

Change-Id: Ia53a6518ce5541e5c9a2cd896856ce042a599b03
Reviewed-on: http://gerrit.cloudera.org:8080/7999
Reviewed-by: Alex Behm <[email protected]>
Tested-by: Impala Public Jenkins


Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/e278ed22
Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/e278ed22
Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/e278ed22

Branch: refs/heads/master
Commit: e278ed228b9e15bcf2ba89dab6b002eb8d71f892
Parents: 36cd610
Author: John Russell <[email protected]>
Authored: Fri Sep 1 15:15:30 2017 -0700
Committer: Impala Public Jenkins <[email protected]>
Committed: Fri Oct 6 23:33:15 2017 +0000

----------------------------------------------------------------------
 docs/shared/impala_common.xml        | 127 ++++++++++++++++++++++++++++++
 docs/topics/impala_compute_stats.xml | 105 ++----------------------
 docs/topics/impala_partitioning.xml  |  29 +++++++
 docs/topics/impala_perf_stats.xml    |  21 +++--
 4 files changed, 172 insertions(+), 110 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/e278ed22/docs/shared/impala_common.xml
----------------------------------------------------------------------
diff --git a/docs/shared/impala_common.xml b/docs/shared/impala_common.xml
index f31bdf0..18a93de 100644
--- a/docs/shared/impala_common.xml
+++ b/docs/shared/impala_common.xml
@@ -1337,6 +1337,31 @@ drop database temp;
         other administrative contexts. See <xref keyref="sg_redaction"/> for details.
       </p>
 
+      <p id="cs_or_cis">
+        For a particular table, use either <codeph>COMPUTE STATS</codeph> or
+        <codeph>COMPUTE INCREMENTAL STATS</codeph>, but never combine the two or alternate
+        between them. If you switch from <codeph>COMPUTE STATS</codeph> to
+        <codeph>COMPUTE INCREMENTAL STATS</codeph> during the lifetime of a table, or vice
+        versa, drop all statistics (by running both <codeph>DROP STATS</codeph> and
+        <codeph>DROP INCREMENTAL STATS</codeph>) before making the switch.
+      </p>
+
+      <p id="incremental_stats_after_full">
+        When you run <codeph>COMPUTE INCREMENTAL STATS</codeph> on a table for the first time,
+        the statistics are computed again from scratch regardless of whether the table already
+        has statistics. Therefore, expect a one-time resource-intensive operation
+        for scanning the entire table when running <codeph>COMPUTE INCREMENTAL STATS</codeph>
+        for the first time on a given table.
+      </p>
+
+      <p id="incremental_stats_caveats">
+        For a table with a huge number of partitions and many columns, the approximately 400 bytes
+        of metadata per column per partition can add up to significant memory overhead, as it must
+        be cached on the <cmdname>catalogd</cmdname> host and on every <cmdname>impalad</cmdname> host
+        that is eligible to be a coordinator. If this metadata for all tables combined exceeds 2 GB,
+        you might experience service downtime.
+      </p>
+
       <p id="incremental_partition_spec">
         The <codeph>PARTITION</codeph> clause is only allowed in combination with the <codeph>INCREMENTAL</codeph>
         clause. It is optional for <codeph>COMPUTE INCREMENTAL STATS</codeph>, and required for <codeph>DROP
@@ -1346,6 +1371,108 @@ drop database temp;
         specification, and specify constant values for all the partition key columns.
       </p>
 
+<codeblock id="compute_stats_walkthrough">-- Initially the table has no incremental stats, as indicated
+-- by 'false' under Incremental stats.
+show table stats item_partitioned;
++-------------+-------+--------+----------+--------------+---------+------------------
+| i_category  | #Rows | #Files | Size     | Bytes Cached | Format  | Incremental stats
++-------------+-------+--------+----------+--------------+---------+------------------
+| Books       | -1    | 1      | 223.74KB | NOT CACHED   | PARQUET | false
+| Children    | -1    | 1      | 230.05KB | NOT CACHED   | PARQUET | false
+| Electronics | -1    | 1      | 232.67KB | NOT CACHED   | PARQUET | false
+| Home        | -1    | 1      | 232.56KB | NOT CACHED   | PARQUET | false
+| Jewelry     | -1    | 1      | 223.72KB | NOT CACHED   | PARQUET | false
+| Men         | -1    | 1      | 231.25KB | NOT CACHED   | PARQUET | false
+| Music       | -1    | 1      | 237.90KB | NOT CACHED   | PARQUET | false
+| Shoes       | -1    | 1      | 234.90KB | NOT CACHED   | PARQUET | false
+| Sports      | -1    | 1      | 227.97KB | NOT CACHED   | PARQUET | false
+| Women       | -1    | 1      | 226.27KB | NOT CACHED   | PARQUET | false
+| Total       | -1    | 10     | 2.25MB   | 0B           |         |
++-------------+-------+--------+----------+--------------+---------+------------------
+
+-- After the first COMPUTE INCREMENTAL STATS,
+-- all partitions have stats. The first
+-- COMPUTE INCREMENTAL STATS scans the whole
+-- table, discarding any previous stats from
+-- a traditional COMPUTE STATS statement.
+compute incremental stats item_partitioned;
++-------------------------------------------+
+| summary                                   |
++-------------------------------------------+
+| Updated 10 partition(s) and 21 column(s). |
++-------------------------------------------+
+show table stats item_partitioned;
++-------------+-------+--------+----------+--------------+---------+------------------
+| i_category  | #Rows | #Files | Size     | Bytes Cached | Format  | Incremental stats
++-------------+-------+--------+----------+--------------+---------+------------------
+| Books       | 1733  | 1      | 223.74KB | NOT CACHED   | PARQUET | true
+| Children    | 1786  | 1      | 230.05KB | NOT CACHED   | PARQUET | true
+| Electronics | 1812  | 1      | 232.67KB | NOT CACHED   | PARQUET | true
+| Home        | 1807  | 1      | 232.56KB | NOT CACHED   | PARQUET | true
+| Jewelry     | 1740  | 1      | 223.72KB | NOT CACHED   | PARQUET | true
+| Men         | 1811  | 1      | 231.25KB | NOT CACHED   | PARQUET | true
+| Music       | 1860  | 1      | 237.90KB | NOT CACHED   | PARQUET | true
+| Shoes       | 1835  | 1      | 234.90KB | NOT CACHED   | PARQUET | true
+| Sports      | 1783  | 1      | 227.97KB | NOT CACHED   | PARQUET | true
+| Women       | 1790  | 1      | 226.27KB | NOT CACHED   | PARQUET | true
+| Total       | 17957 | 10     | 2.25MB   | 0B           |         |
++-------------+-------+--------+----------+--------------+---------+------------------
+
+-- Add a new partition...
+alter table item_partitioned add partition (i_category='Camping');
+-- Add or replace files in HDFS outside of Impala,
+-- rendering the stats for a partition obsolete.
+!import_data_into_sports_partition.sh
+refresh item_partitioned;
+drop incremental stats item_partitioned partition (i_category='Sports');
+-- Now some partitions have incremental stats
+-- and some do not.
+show table stats item_partitioned;
++-------------+-------+--------+----------+--------------+---------+------------------
+| i_category  | #Rows | #Files | Size     | Bytes Cached | Format  | Incremental stats
++-------------+-------+--------+----------+--------------+---------+------------------
+| Books       | 1733  | 1      | 223.74KB | NOT CACHED   | PARQUET | true
+| Camping     | -1    | 1      | 408.02KB | NOT CACHED   | PARQUET | false
+| Children    | 1786  | 1      | 230.05KB | NOT CACHED   | PARQUET | true
+| Electronics | 1812  | 1      | 232.67KB | NOT CACHED   | PARQUET | true
+| Home        | 1807  | 1      | 232.56KB | NOT CACHED   | PARQUET | true
+| Jewelry     | 1740  | 1      | 223.72KB | NOT CACHED   | PARQUET | true
+| Men         | 1811  | 1      | 231.25KB | NOT CACHED   | PARQUET | true
+| Music       | 1860  | 1      | 237.90KB | NOT CACHED   | PARQUET | true
+| Shoes       | 1835  | 1      | 234.90KB | NOT CACHED   | PARQUET | true
+| Sports      | -1    | 1      | 227.97KB | NOT CACHED   | PARQUET | false
+| Women       | 1790  | 1      | 226.27KB | NOT CACHED   | PARQUET | true
+| Total       | 17957 | 11     | 2.65MB   | 0B           |         |
++-------------+-------+--------+----------+--------------+---------+------------------
+
+-- After another COMPUTE INCREMENTAL STATS,
+-- all partitions have incremental stats, and only the 2
+-- partitions without incremental stats were scanned.
+compute incremental stats item_partitioned;
++------------------------------------------+
+| summary                                  |
++------------------------------------------+
+| Updated 2 partition(s) and 21 column(s). |
++------------------------------------------+
+show table stats item_partitioned;
++-------------+-------+--------+----------+--------------+---------+------------------
+| i_category  | #Rows | #Files | Size     | Bytes Cached | Format  | Incremental stats
++-------------+-------+--------+----------+--------------+---------+------------------
+| Books       | 1733  | 1      | 223.74KB | NOT CACHED   | PARQUET | true
+| Camping     | 5328  | 1      | 408.02KB | NOT CACHED   | PARQUET | true
+| Children    | 1786  | 1      | 230.05KB | NOT CACHED   | PARQUET | true
+| Electronics | 1812  | 1      | 232.67KB | NOT CACHED   | PARQUET | true
+| Home        | 1807  | 1      | 232.56KB | NOT CACHED   | PARQUET | true
+| Jewelry     | 1740  | 1      | 223.72KB | NOT CACHED   | PARQUET | true
+| Men         | 1811  | 1      | 231.25KB | NOT CACHED   | PARQUET | true
+| Music       | 1860  | 1      | 237.90KB | NOT CACHED   | PARQUET | true
+| Shoes       | 1835  | 1      | 234.90KB | NOT CACHED   | PARQUET | true
+| Sports      | 1783  | 1      | 227.97KB | NOT CACHED   | PARQUET | true
+| Women       | 1790  | 1      | 226.27KB | NOT CACHED   | PARQUET | true
+| Total       | 17957 | 11     | 2.65MB   | 0B           |         |
++-------------+-------+--------+----------+--------------+---------+------------------
+</codeblock>
+
       <p id="udf_persistence_restriction" rev="2.5.0 IMPALA-1748">
         In <keyword keyref="impala25_full"/> and higher, Impala UDFs and UDAs written in C++ are persisted in the metastore database.
         Java UDFs are also persisted, if they were created with the new <codeph>CREATE FUNCTION</codeph> syntax for Java UDFs,

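Note on the incremental_stats_caveats paragraph added above: the "approximately
400 bytes of metadata per column per partition" figure can be turned into a
rough size estimate. The following is only a back-of-the-envelope sketch, not
anything Impala itself provides. The function name, the reading of 2 GB as
2 * 1024**3 bytes, and the 100,000-partition table are assumptions made for
illustration; the 11-partition, 21-column case borrows its counts from the
item_partitioned walkthrough.

```python
# Rough estimate of incremental-stats metadata size, based on the
# "approximately 400 bytes per column per partition" figure in the doc text.
# All names here are hypothetical and exist only for this illustration.

BYTES_PER_COLUMN_PER_PARTITION = 400  # approximate figure from the doc text
LIMIT_BYTES = 2 * 1024**3             # assumption: "2 GB" read as 2 GiB


def incremental_stats_bytes(num_partitions: int, num_columns: int) -> int:
    """Return the approximate metadata size in bytes for one table."""
    return num_partitions * num_columns * BYTES_PER_COLUMN_PER_PARTITION


# Counts taken from the walkthrough: 11 partitions, 21 columns.
small = incremental_stats_bytes(11, 21)

# Hypothetical large table: 100,000 partitions, 100 columns.
large = incremental_stats_bytes(100_000, 100)

print(f"walkthrough table: {small} bytes")
print(f"large table: {large} bytes, over limit: {large > LIMIT_BYTES}")
```

Under these assumptions the walkthrough table costs well under 100 KB, while
the hypothetical large table alone would exceed the 2 GB figure. The caveat
applies to the combined metadata for all tables, so the per-table estimates
need to be summed.
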
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/e278ed22/docs/topics/impala_compute_stats.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_compute_stats.xml b/docs/topics/impala_compute_stats.xml
index b7489c5..98694f8 100644
--- a/docs/topics/impala_compute_stats.xml
+++ b/docs/topics/impala_compute_stats.xml
@@ -80,6 +80,12 @@ COMPUTE INCREMENTAL STATS [<varname>db_name</varname>.]<varname>table_name</varn
       for full usage details.
     </p>
 
+    <note type="important">
+      <p conref="../shared/impala_common.xml#common/cs_or_cis"/>
+      <p conref="../shared/impala_common.xml#common/incremental_stats_after_full"/>
+      <p conref="../shared/impala_common.xml#common/incremental_stats_caveats"/>
+    </note>
+
     <p>
       <codeph>COMPUTE INCREMENTAL STATS</codeph> only applies to partitioned tables. If you use the
       <codeph>INCREMENTAL</codeph> clause for an unpartitioned table, Impala automatically uses the original
@@ -340,104 +346,7 @@ Returned 2 row(s) in 0.01s</codeblock>
       changed partitions, without rescanning the entire table.
     </p>
 
-<codeblock>-- Initially the table has no incremental stats, as indicated
--- by -1 under #Rows and false under Incremental stats.
-show table stats item_partitioned;
-+-------------+-------+--------+----------+--------------+---------+------------------
-| i_category  | #Rows | #Files | Size     | Bytes Cached | Format  | Incremental stats
-+-------------+-------+--------+----------+--------------+---------+------------------
-| Books       | -1    | 1      | 223.74KB | NOT CACHED   | PARQUET | false
-| Children    | -1    | 1      | 230.05KB | NOT CACHED   | PARQUET | false
-| Electronics | -1    | 1      | 232.67KB | NOT CACHED   | PARQUET | false
-| Home        | -1    | 1      | 232.56KB | NOT CACHED   | PARQUET | false
-| Jewelry     | -1    | 1      | 223.72KB | NOT CACHED   | PARQUET | false
-| Men         | -1    | 1      | 231.25KB | NOT CACHED   | PARQUET | false
-| Music       | -1    | 1      | 237.90KB | NOT CACHED   | PARQUET | false
-| Shoes       | -1    | 1      | 234.90KB | NOT CACHED   | PARQUET | false
-| Sports      | -1    | 1      | 227.97KB | NOT CACHED   | PARQUET | false
-| Women       | -1    | 1      | 226.27KB | NOT CACHED   | PARQUET | false
-| Total       | -1    | 10     | 2.25MB   | 0B           |         |
-+-------------+-------+--------+----------+--------------+---------+------------------
-
--- After the first COMPUTE INCREMENTAL STATS,
--- all partitions have stats.
-compute incremental stats item_partitioned;
-+-------------------------------------------+
-| summary                                   |
-+-------------------------------------------+
-| Updated 10 partition(s) and 21 column(s). |
-+-------------------------------------------+
-show table stats item_partitioned;
-+-------------+-------+--------+----------+--------------+---------+------------------
-| i_category  | #Rows | #Files | Size     | Bytes Cached | Format  | Incremental stats
-+-------------+-------+--------+----------+--------------+---------+------------------
-| Books       | 1733  | 1      | 223.74KB | NOT CACHED   | PARQUET | true
-| Children    | 1786  | 1      | 230.05KB | NOT CACHED   | PARQUET | true
-| Electronics | 1812  | 1      | 232.67KB | NOT CACHED   | PARQUET | true
-| Home        | 1807  | 1      | 232.56KB | NOT CACHED   | PARQUET | true
-| Jewelry     | 1740  | 1      | 223.72KB | NOT CACHED   | PARQUET | true
-| Men         | 1811  | 1      | 231.25KB | NOT CACHED   | PARQUET | true
-| Music       | 1860  | 1      | 237.90KB | NOT CACHED   | PARQUET | true
-| Shoes       | 1835  | 1      | 234.90KB | NOT CACHED   | PARQUET | true
-| Sports      | 1783  | 1      | 227.97KB | NOT CACHED   | PARQUET | true
-| Women       | 1790  | 1      | 226.27KB | NOT CACHED   | PARQUET | true
-| Total       | 17957 | 10     | 2.25MB   | 0B           |         |
-+-------------+-------+--------+----------+--------------+---------+------------------
-
--- Add a new partition...
-alter table item_partitioned add partition (i_category='Camping');
--- Add or replace files in HDFS outside of Impala,
--- rendering the stats for a partition obsolete.
-!import_data_into_sports_partition.sh
-refresh item_partitioned;
-drop incremental stats item_partitioned partition (i_category='Sports');
--- Now some partitions have incremental stats
--- and some do not.
-show table stats item_partitioned;
-+-------------+-------+--------+----------+--------------+---------+------------------
-| i_category  | #Rows | #Files | Size     | Bytes Cached | Format  | Incremental stats
-+-------------+-------+--------+----------+--------------+---------+------------------
-| Books       | 1733  | 1      | 223.74KB | NOT CACHED   | PARQUET | true
-| Camping     | -1    | 1      | 408.02KB | NOT CACHED   | PARQUET | false
-| Children    | 1786  | 1      | 230.05KB | NOT CACHED   | PARQUET | true
-| Electronics | 1812  | 1      | 232.67KB | NOT CACHED   | PARQUET | true
-| Home        | 1807  | 1      | 232.56KB | NOT CACHED   | PARQUET | true
-| Jewelry     | 1740  | 1      | 223.72KB | NOT CACHED   | PARQUET | true
-| Men         | 1811  | 1      | 231.25KB | NOT CACHED   | PARQUET | true
-| Music       | 1860  | 1      | 237.90KB | NOT CACHED   | PARQUET | true
-| Shoes       | 1835  | 1      | 234.90KB | NOT CACHED   | PARQUET | true
-| Sports      | -1    | 1      | 227.97KB | NOT CACHED   | PARQUET | false
-| Women       | 1790  | 1      | 226.27KB | NOT CACHED   | PARQUET | true
-| Total       | 17957 | 11     | 2.65MB   | 0B           |         |
-+-------------+-------+--------+----------+--------------+---------+------------------
-
--- After another COMPUTE INCREMENTAL STATS,
--- all partitions have incremental stats, and only the 2
--- partitions without incremental stats were scanned.
-compute incremental stats item_partitioned;
-+------------------------------------------+
-| summary                                  |
-+------------------------------------------+
-| Updated 2 partition(s) and 21 column(s). |
-+------------------------------------------+
-show table stats item_partitioned;
-+-------------+-------+--------+----------+--------------+---------+------------------
-| i_category  | #Rows | #Files | Size     | Bytes Cached | Format  | Incremental stats
-+-------------+-------+--------+----------+--------------+---------+------------------
-| Books       | 1733  | 1      | 223.74KB | NOT CACHED   | PARQUET | true
-| Camping     | 5328  | 1      | 408.02KB | NOT CACHED   | PARQUET | true
-| Children    | 1786  | 1      | 230.05KB | NOT CACHED   | PARQUET | true
-| Electronics | 1812  | 1      | 232.67KB | NOT CACHED   | PARQUET | true
-| Home        | 1807  | 1      | 232.56KB | NOT CACHED   | PARQUET | true
-| Jewelry     | 1740  | 1      | 223.72KB | NOT CACHED   | PARQUET | true
-| Men         | 1811  | 1      | 231.25KB | NOT CACHED   | PARQUET | true
-| Music       | 1860  | 1      | 237.90KB | NOT CACHED   | PARQUET | true
-| Shoes       | 1835  | 1      | 234.90KB | NOT CACHED   | PARQUET | true
-| Sports      | 1783  | 1      | 227.97KB | NOT CACHED   | PARQUET | true
-| Women       | 1790  | 1      | 226.27KB | NOT CACHED   | PARQUET | true
-| Total       | 17957 | 11     | 2.65MB   | 0B           |         |
-+-------------+-------+--------+----------+--------------+---------+------------------
-</codeblock>
+<codeblock conref="../shared/impala_common.xml#common/compute_stats_walkthrough"/>
 
     <p conref="../shared/impala_common.xml#common/file_format_blurb"/>
 

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/e278ed22/docs/topics/impala_partitioning.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_partitioning.xml b/docs/topics/impala_partitioning.xml
index 1729530..c2e36ed 100644
--- a/docs/topics/impala_partitioning.xml
+++ b/docs/topics/impala_partitioning.xml
@@ -603,4 +603,33 @@ SELECT COUNT(*) FROM sales_table WHERE year IN (2005, 2010, 2015);
 
   </concept>
 
+  <concept id="partition_stats">
+    <title>Keeping Statistics Up to Date for Partitioned Tables</title>
+    <conbody>
+
+      <p>
+        Because the <codeph>COMPUTE STATS</codeph> statement can be resource-intensive to run on a partitioned table
+        as new partitions are added, Impala includes a variation of this statement that allows computing statistics
+        on a per-partition basis such that stats can be incrementally updated when new partitions are added.
+      </p>
+
+      <note type="important">
+        <p conref="../shared/impala_common.xml#common/cs_or_cis"/>
+        <p conref="../shared/impala_common.xml#common/incremental_stats_after_full"/>
+        <p conref="../shared/impala_common.xml#common/incremental_stats_caveats"/>
+      </note>
+
+      <p rev="2.1.0">
+        The <codeph>COMPUTE INCREMENTAL STATS</codeph> variation computes statistics only for partitions that were
+        added or changed since the last <codeph>COMPUTE INCREMENTAL STATS</codeph> statement, rather than the entire
+        table. It is typically used for tables where a full <codeph>COMPUTE STATS</codeph>
+        operation takes too long to be practical each time a partition is added or dropped. See
+        <xref href="impala_perf_stats.xml#perf_stats_incremental"/> for full usage details.
+      </p>
+
+<codeblock conref="../shared/impala_common.xml#common/compute_stats_walkthrough"/>
+
+    </conbody>
+  </concept>
+
 </concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/e278ed22/docs/topics/impala_perf_stats.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_perf_stats.xml b/docs/topics/impala_perf_stats.xml
index 86800f7..ac771be 100644
--- a/docs/topics/impala_perf_stats.xml
+++ b/docs/topics/impala_perf_stats.xml
@@ -354,15 +354,6 @@ show column stats year_month_day;
 +-----------+---------+------------------+--------+----------+-------------------+
 </codeblock>
 
-      <note>
-        Partitioned tables can grow so large that scanning the entire table, as the <codeph>COMPUTE STATS</codeph>
-        statement does, is impractical just to update the statistics for a new partition. The standard
-        <codeph>COMPUTE STATS</codeph> statement might take hours, or even days. That situation is where you switch
-        to using incremental statistics, a feature available in <keyword keyref="impala21_full"/> and higher.
-        See <xref href="impala_perf_stats.xml#perf_stats_incremental"/> for details about this feature
-        and the <codeph>COMPUTE INCREMENTAL STATS</codeph> syntax.
-      </note>
-
       <p conref="../shared/impala_common.xml#common/hive_column_stats_caveat"/>
     </conbody>
   </concept>
@@ -387,6 +378,12 @@ show column stats year_month_day;
         entire table each time.
       </p>
 
+      <note type="important">
+        <p conref="../shared/impala_common.xml#common/cs_or_cis"/>
+        <p conref="../shared/impala_common.xml#common/incremental_stats_after_full"/>
+        <p conref="../shared/impala_common.xml#common/incremental_stats_caveats"/>
+      </note>
+
       <p>
         You can also compute or drop statistics for a single partition by including a <codeph>PARTITION</codeph>
         clause in the <codeph>COMPUTE INCREMENTAL STATS</codeph> or <codeph>DROP INCREMENTAL STATS</codeph>
@@ -400,9 +397,9 @@ show column stats year_month_day;
       <ul>
         <li>
           <p>
-            If you have an existing partitioned table for which you have already computed statistics, issuing
-            <codeph>COMPUTE INCREMENTAL STATS</codeph> without a partition clause causes Impala to rescan the
-            entire table. Once the incremental statistics are computed, any future <codeph>COMPUTE INCREMENTAL
+            If you have a partitioned table for which you have already run a regular <codeph>COMPUTE STATS</codeph>
+            statement, issuing <codeph>COMPUTE INCREMENTAL STATS</codeph> without a partition clause causes Impala
+            to rescan the entire table. Once the incremental statistics are computed, any future <codeph>COMPUTE INCREMENTAL
             STATS</codeph> statements only scan any new partitions and any partitions where you performed
             <codeph>DROP INCREMENTAL STATS</codeph>.
           </p>
