IMPALA-5310: [DOCS] Document TABLESAMPLE clause for COMPUTE STATS

Change-Id: I214b63db391bd35562f5ea9091508005f83b2fcc
Reviewed-on: http://gerrit.cloudera.org:8080/8975
Reviewed-by: Alex Rodoni <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/0ec3cd71
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/0ec3cd71
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/0ec3cd71

Branch: refs/heads/2.x
Commit: 0ec3cd71071ce623dcf9eb919dfca639f91a5bc7
Parents: 5f4d89f
Author: John Russell <[email protected]>
Authored: Mon Jan 8 14:41:16 2018 -0800
Committer: Impala Public Jenkins <[email protected]>
Committed: Thu Apr 19 22:10:21 2018 +0000

----------------------------------------------------------------------
 docs/topics/impala_compute_stats.xml | 30 ++++++++++++++++++++++--------
 docs/topics/impala_tablesample.xml   |  6 ++++++
 2 files changed, 28 insertions(+), 8 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/impala/blob/0ec3cd71/docs/topics/impala_compute_stats.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_compute_stats.xml b/docs/topics/impala_compute_stats.xml
index b62972c..95343f4 100644
--- a/docs/topics/impala_compute_stats.xml
+++ b/docs/topics/impala_compute_stats.xml
@@ -39,18 +39,20 @@ under the License.
 <conbody>
 <p>
- <indexterm audience="hidden">COMPUTE STATS statement</indexterm>
- Gathers information about volume and distribution of data in a table and all associated columns and
- partitions. The information is stored in the metastore database, and used by Impala to help optimize queries.
- For example, if Impala can determine that a table is large or small, or has many or few distinct values it
- can organize parallelize the work appropriately for a join query or insert operation. For details about the
- kinds of information gathered by this statement, see <xref href="impala_perf_stats.xml#perf_stats"/>.
+ <indexterm audience="hidden">COMPUTE STATS statement</indexterm> The
+ COMPUTE STATS statement gathers information about volume and distribution
+ of data in a table and all associated columns and partitions. The
+ information is stored in the metastore database, and used by Impala to
+ help optimize queries. For example, if Impala can determine that a table
+ is large or small, or has many or few distinct values it can organize and
+ parallelize the work appropriately for a join query or insert operation.
+ For details about the kinds of information gathered by this statement, see
+ <xref href="impala_perf_stats.xml#perf_stats"/>.
 </p>
 <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
-<codeblock rev="impala-3562">COMPUTE STATS
- [<varname>db_name</varname>.]<varname>table_name</varname> [ ( <varname>column_list</varname> ) ]
+<codeblock rev="2.1.0"><ph rev="2.12.0 IMPALA-5310">COMPUTE STATS [<varname>db_name</varname>.]<varname>table_name</varname> [ ( <varname>column_list</varname> ) ] [TABLESAMPLE SYSTEM(<varname>percentage</varname>) [REPEATABLE(<varname>seed</varname>)]]</ph>
 <varname>column_list</varname> ::= <varname>column_name</varname> [ , <varname>column_name</varname>, ... ]
@@ -104,6 +106,18 @@ COMPUTE INCREMENTAL STATS [<varname>db_name</varname>.]<varname>table_name</varn
 STATS</codeph>.
 </p>
+ <p rev="2.12.0 IMPALA-5310">
+ In <keyword keyref="impala212_full"/> and
+ higher, an optional <codeph>TABLESAMPLE</codeph> clause immediately after
+ a table reference specifies that the <codeph>COMPUTE STATS</codeph>
+ operation only processes a specified percentage of the table data. For
+ tables that are so large that a full <codeph>COMPUTE STATS</codeph>
+ operation is impractical, you can use <codeph>COMPUTE STATS</codeph> with
+ a <codeph>TABLESAMPLE</codeph> clause to extrapolate statistics from a
+ sample of the table data. See <keyword keyref="perf_stats"/>about the
+ experimental stats extrapolation and sampling features.
+ </p>
+
 <p rev="2.1.0">
 The <codeph>COMPUTE INCREMENTAL STATS</codeph> variation is a shortcut for partitioned tables that works on a subset of partitions rather than the entire table. The incremental nature makes it suitable for large tables

http://git-wip-us.apache.org/repos/asf/impala/blob/0ec3cd71/docs/topics/impala_tablesample.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_tablesample.xml b/docs/topics/impala_tablesample.xml
index f60c5be..e5123cb 100644
--- a/docs/topics/impala_tablesample.xml
+++ b/docs/topics/impala_tablesample.xml
@@ -81,6 +81,12 @@ under the License.
 <p conref="../shared/impala_common.xml#common/added_in_290"/>
+ <p rev="2.12.0 IMPALA-5310">
+ See <keyword keyref="compute_stats"/> for the
+ <codeph>TABLESAMPLE</codeph> clause used in the <codeph>COMPUTE
+ STATS</codeph> statement.
+ </p>
+
 <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
 <p>
