[42/51] [partial] impala git commit: [DOCS] Impala doc site update for 3.0

mikeb Wed, 09 May 2018 14:17:08 -0700

http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_components.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_components.html 
b/docs/build3x/html/topics/impala_components.html
new file mode 100644
index 0000000..eb6e0f6
--- /dev/null
+++ b/docs/build3x/html/topics/impala_components.html
@@ -0,0 +1,227 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; 
charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) 
Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta 
name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" 
content="../topics/impala_concepts.html"><meta name="prodname" 
content="Impala"><meta name="version" content="Impala 3.0.x"><meta 
name="DC.Format" content="XHTML"><meta 
name="DC.Identifier" content="intro_components"><link rel="stylesheet" 
type="text/css" href="../commonltr.css"><title>Components of the Impala 
Server</title></head><body id="intro_components"><main 
 role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Components of the Impala 
Server</h1>
+
+
+
+  <div class="body conbody">
+
+    <p class="p">
+      The Impala server is a distributed, massively parallel processing (MPP) database engine. It consists of
+      different daemon processes that run on specific hosts within your cluster.
+    </p>
+
+    <p class="p toc inpage"></p>
+  </div>
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div 
class="parentlink"><strong>Parent topic:</strong> <a class="link" 
href="../topics/impala_concepts.html">Impala Concepts and 
Architecture</a></div></div></nav><article class="topic concept nested1" 
aria-labelledby="ariaid-title2" id="intro_components__intro_impalad">
+
+    <h2 class="title topictitle2" id="ariaid-title2">The Impala Daemon</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        The core Impala component is a daemon process that runs on each 
DataNode of the cluster, physically represented
+        by the <code class="ph codeph">impalad</code> process. It reads and 
writes to data files; accepts queries transmitted
+        from the <code class="ph codeph">impala-shell</code> command, Hue, 
JDBC, or ODBC; parallelizes the queries and
+        distributes work across the cluster; and transmits intermediate query 
results back to the
+        central coordinator node.
+      </p>
+
+      <p class="p">
+        You can submit a query to the Impala daemon running on any DataNode, 
and that instance of the daemon serves as the
+        <dfn class="term">coordinator node</dfn> for that query. The other 
nodes transmit partial results back to the
+        coordinator, which constructs the final result set for a query. When experimenting with functionality
+        through the <code class="ph codeph">impala-shell</code> command, you might always connect to the same Impala daemon for
+        convenience. For clusters running production workloads, you might load-balance by
+        submitting each query to a different Impala daemon in round-robin style, using the JDBC or ODBC interfaces.
+      </p>
+
+      <p class="p">
+        The Impala daemons are in constant communication with the <dfn 
class="term">statestore</dfn>, to confirm which nodes
+        are healthy and can accept new work.
+      </p>
+
+      <p class="p">
+        They also receive broadcast messages from the <span class="keyword 
cmdname">catalogd</span> daemon (introduced in Impala 1.2)
+        whenever any Impala node in the cluster creates, alters, or drops any 
type of object, or when an
+        <code class="ph codeph">INSERT</code> or <code class="ph codeph">LOAD 
DATA</code> statement is processed through Impala. This
+        background communication minimizes the need for <code class="ph 
codeph">REFRESH</code> or <code class="ph codeph">INVALIDATE
+        METADATA</code> statements that were needed to coordinate metadata 
across nodes prior to Impala 1.2.
+      </p>
+
+      <p class="p">
+        In <span class="keyword">Impala 2.9</span> and higher, you can control 
which hosts act as query coordinators
+        and which act as query executors, to improve scalability for highly 
concurrent workloads on large clusters.
+        See <a class="xref" href="impala_scalability.html">Scalability 
Considerations for Impala</a> for details.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Related information:</strong> <a class="xref" 
href="impala_config_options.html#config_options">Modifying Impala Startup 
Options</a>,
+        <a class="xref" href="impala_processes.html#processes">Starting 
Impala</a>, <a class="xref" href="impala_timeouts.html#impalad_timeout">Setting 
the Idle Query and Idle Session Timeouts for impalad</a>,
+        <a class="xref" href="impala_ports.html#ports">Ports Used by 
Impala</a>, <a class="xref" href="impala_proxy.html#proxy">Using Impala through 
a Proxy for High Availability</a>
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" 
id="intro_components__intro_statestore">
+
+    <h2 class="title topictitle2" id="ariaid-title3">The Impala Statestore</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        The Impala component known as the <dfn class="term">statestore</dfn> 
checks on the health of Impala daemons on all the
+        DataNodes in a cluster, and continuously relays its findings to each 
of those daemons. It is physically
+        represented by a daemon process named <code class="ph 
codeph">statestored</code>; you only need such a process on one
+        host in the cluster. If an Impala daemon goes offline due to hardware 
failure, network error, software issue,
+        or other reason, the statestore informs all the other Impala daemons 
so that future queries can avoid making
+        requests to the unreachable node.
+      </p>
+
+      <p class="p">
+        Because the statestore's purpose is to help when things go wrong, it 
is not critical to the normal
+        operation of an Impala cluster. If the statestore is not running or 
becomes unreachable, the Impala daemons
+        continue running and distributing work among themselves as usual; the 
cluster just becomes less robust if
+        other Impala daemons fail while the statestore is offline. When the 
statestore comes back online, it re-establishes
+        communication with the Impala daemons and resumes its monitoring 
function.
+      </p>
+
+      <p class="p">
+        Most considerations for load balancing and high availability apply to 
the <span class="keyword cmdname">impalad</span> daemon.
+        The <span class="keyword cmdname">statestored</span> and <span 
class="keyword cmdname">catalogd</span> daemons do not have special
+        requirements for high availability, because problems with those 
daemons do not result in data loss.
+        If those daemons become unavailable due to an outage on a particular
+        host, you can stop the Impala service, delete the <span class="ph 
uicontrol">Impala StateStore</span> and
+        <span class="ph uicontrol">Impala Catalog Server</span> roles, add the 
roles on a different host, and restart the
+        Impala service.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+      <p class="p">
+        <a class="xref" 
href="impala_scalability.html#statestore_scalability">Scalability 
Considerations for the Impala Statestore</a>,
+        <a class="xref" 
href="impala_config_options.html#config_options">Modifying Impala Startup 
Options</a>, <a class="xref" href="impala_processes.html#processes">Starting 
Impala</a>,
+        <a class="xref" 
href="impala_timeouts.html#statestore_timeout">Increasing the Statestore 
Timeout</a>, <a class="xref" href="impala_ports.html#ports">Ports Used by 
Impala</a>
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title4" 
id="intro_components__intro_catalogd">
+
+    <h2 class="title topictitle2" id="ariaid-title4">The Impala Catalog 
Service</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        The Impala component known as the <dfn class="term">catalog 
service</dfn> relays the metadata changes from Impala SQL
+        statements to all the Impala daemons in a cluster. It is physically 
represented by a daemon process named
+        <code class="ph codeph">catalogd</code>; you only need such a process 
on one host in the cluster. Because the requests
+        are passed through the statestore daemon, it makes sense to run the 
<span class="keyword cmdname">statestored</span> and
+        <span class="keyword cmdname">catalogd</span> services on the same 
host.
+      </p>
+
+      <p class="p">
+        The catalog service avoids the need to issue
+        <code class="ph codeph">REFRESH</code> and <code class="ph 
codeph">INVALIDATE METADATA</code> statements when the metadata changes are
+        performed by statements issued through Impala. When you create a 
table, load data, and so on through Hive,
+        you do need to issue <code class="ph codeph">REFRESH</code> or <code 
class="ph codeph">INVALIDATE METADATA</code> on an Impala node
+        before executing a query there.
+      </p>
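+
+      <p class="p">
+        As a minimal sketch of that case, the statements below show what you might run in
+        <code class="ph codeph">impala-shell</code> after changing tables through Hive. The database and table
+        names (<code class="ph codeph">sales_db</code>, <code class="ph codeph">orders</code>,
+        <code class="ph codeph">returns_staging</code>) are hypothetical.
+      </p>
+
+<pre class="pre codeblock"><code>-- Hypothetical names throughout.
+-- New data files were added through Hive to a table that Impala already knows about.
+refresh sales_db.orders;
+
+-- A table was created through Hive and is not yet visible to Impala.
+invalidate metadata sales_db.returns_staging;
+</code></pre>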
+
+      <p class="p">
+        This feature touches a number of aspects of Impala:
+      </p>
+
+
+
+      <ul class="ul" id="intro_catalogd__catalogd_xrefs">
+        <li class="li">
+          <p class="p">
+            See <a class="xref" href="impala_install.html#install">Installing 
Impala</a>, <a class="xref" href="impala_upgrading.html#upgrading">Upgrading 
Impala</a> and
+            <a class="xref" href="impala_processes.html#processes">Starting 
Impala</a>, for usage information for the
+            <span class="keyword cmdname">catalogd</span> daemon.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            The <code class="ph codeph">REFRESH</code> and <code class="ph 
codeph">INVALIDATE METADATA</code> statements are not needed
+            when the <code class="ph codeph">CREATE TABLE</code>, <code 
class="ph codeph">INSERT</code>, or other table-changing or
+            data-changing operation is performed through Impala. These 
statements are still needed if such
+            operations are done through Hive or by manipulating data files 
directly in HDFS, but in those cases the
+            statements only need to be issued on one Impala node rather than 
on all nodes. See
+            <a class="xref" href="impala_refresh.html#refresh">REFRESH 
Statement</a> and
+            <a class="xref" 
href="impala_invalidate_metadata.html#invalidate_metadata">INVALIDATE METADATA 
Statement</a> for the latest usage information for
+            those statements.
+          </p>
+        </li>
+      </ul>
+
+      <div class="p">
+        Use the <code class="ph codeph">--load_catalog_in_background</code> option to control when
+        the metadata of a table is loaded.
+        <ul class="ul">
+          <li class="li">
+            If set to <code class="ph codeph">false</code>, the metadata of a 
table is
+            loaded when it is referenced for the first time. This means that 
the
+            first run of a particular query can be slower than subsequent runs.
+            Starting in Impala 2.2, the default for
+            <code class="ph codeph">load_catalog_in_background</code> is
+            <code class="ph codeph">false</code>.
+          </li>
+          <li class="li">
+            If set to <code class="ph codeph">true</code>, the catalog service attempts to
+            load metadata for a table even if no query needed that metadata, so the
+            metadata might already be loaded when the first query that
+            needs it is run. However, for the following reasons, we
+            recommend that you do not set the option to <code class="ph codeph">true</code>.
+            <ul class="ul">
+              <li class="li">
+                Background load can interfere with query-specific metadata
+                loading. This can happen on startup or after invalidating
+                metadata, with a duration depending on the amount of metadata,
+                and can lead to seemingly random long-running queries that are
+                difficult to diagnose.
+              </li>
+              <li class="li">
+                Impala may load metadata for tables that might never be
+                used, potentially increasing catalog size and consequently memory
+                usage for both the catalog service and the Impala daemons.
+              </li>
+            </ul>
+          </li>
+        </ul>
+      </div>
+
+      <p class="p">
+        Most considerations for load balancing and high availability apply to 
the <span class="keyword cmdname">impalad</span> daemon.
+        The <span class="keyword cmdname">statestored</span> and <span 
class="keyword cmdname">catalogd</span> daemons do not have special
+        requirements for high availability, because problems with those 
daemons do not result in data loss.
+        If those daemons become unavailable due to an outage on a particular
+        host, you can stop the Impala service, delete the <span class="ph 
uicontrol">Impala StateStore</span> and
+        <span class="ph uicontrol">Impala Catalog Server</span> roles, add the 
roles on a different host, and restart the
+        Impala service.
+      </p>
+
+      <div class="note note note_note"><span class="note__title 
notetitle">Note:</span>
+        <p class="p">
+        In Impala 1.2.4 and higher, you can specify a table name with <code 
class="ph codeph">INVALIDATE METADATA</code> after
+        the table is created in Hive, allowing you to make individual tables 
visible to Impala without doing a full
+        reload of the catalog metadata. Impala 1.2.4 also includes other 
changes to make the metadata broadcast
+        mechanism faster and more responsive, especially during Impala 
startup. See
+        <a class="xref" 
href="../shared/../topics/impala_new_features.html#new_features_124">New 
Features in Impala 1.2.4</a> for details.
+      </p>
+      </div>
+
+      <p class="p">
+        <strong class="ph b">Related information:</strong> <a class="xref" 
href="impala_config_options.html#config_options">Modifying Impala Startup 
Options</a>,
+        <a class="xref" href="impala_processes.html#processes">Starting 
Impala</a>, <a class="xref" href="impala_ports.html#ports">Ports Used by 
Impala</a>
+      </p>
+    </div>
+  </article>
+</article></main></body></html>


http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_compression_codec.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_compression_codec.html 
b/docs/build3x/html/topics/impala_compression_codec.html
new file mode 100644
index 0000000..5933efa
--- /dev/null
+++ b/docs/build3x/html/topics/impala_compression_codec.html
@@ -0,0 +1,92 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; 
charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) 
Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta 
name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" 
content="../topics/impala_query_options.html"><meta name="prodname" 
content="Impala"><meta name="version" content="Impala 3.0.x"><meta 
name="DC.Format" content="XHTML"><meta name="DC.Identifier" 
content="compression_codec"><link rel="stylesheet" type="text/css" 
href="../commonltr.css"><title>COMPRESSION_CODEC Query Option (Impala 2.0 or 
higher only)</title></head><body id="compression_codec"><main 
role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">COMPRESSION_CODEC Query 
Option (<span class="keyword">Impala 2.0</span> or higher only)</h1>
+
+
+
+  <div class="body conbody">
+
+
+
+
+
+    <p class="p">
+
+      When Impala writes Parquet data files using the <code class="ph 
codeph">INSERT</code> statement, the underlying compression
+      is controlled by the <code class="ph codeph">COMPRESSION_CODEC</code> 
query option.
+    </p>
+
+    <div class="note note note_note"><span class="note__title 
notetitle">Note:</span>
+      Prior to Impala 2.0, this option was named <code class="ph 
codeph">PARQUET_COMPRESSION_CODEC</code>. In Impala 2.0 and
+      later, the <code class="ph codeph">PARQUET_COMPRESSION_CODEC</code> name 
is not recognized. Use the more general name
+      <code class="ph codeph">COMPRESSION_CODEC</code> for new code.
+    </div>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>SET COMPRESSION_CODEC=<var class="keyword 
varname">codec_name</var>;</code></pre>
+
+    <p class="p">
+      The allowed values for this query option are <code class="ph 
codeph">SNAPPY</code> (the default), <code class="ph codeph">GZIP</code>,
+      and <code class="ph codeph">NONE</code>.
+    </p>
+
+    <div class="note note note_note"><span class="note__title 
notetitle">Note:</span>
+      A Parquet file created with <code class="ph 
codeph">COMPRESSION_CODEC=NONE</code> is still typically smaller than the
+      original data, due to encoding schemes such as run-length encoding and 
dictionary encoding that are applied
+      separately from compression.
+    </div>
+
+    <p class="p">
+      The option value is not case-sensitive.
+    </p>
+
+    <p class="p">
+      If the option is set to an unrecognized value, queries of all kinds fail because of the invalid option
+      setting, not just queries involving Parquet tables. (The value <code class="ph codeph">BZIP2</code> is also recognized, but
+      is not compatible with Parquet tables.)
+
+    <p class="p">
+      <strong class="ph b">Type:</strong> <code class="ph codeph">STRING</code>
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Default:</strong> <code class="ph 
codeph">SNAPPY</code>
+    </p>
+
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>set compression_codec=gzip;
+insert into parquet_table_highly_compressed select * from t1;
+
+set compression_codec=snappy;
+insert into parquet_table_compression_plus_fast_queries select * from t1;
+
+set compression_codec=none;
+insert into parquet_table_no_compression select * from t1;
+
+set compression_codec=foo;
+select * from t1 limit 5;
+ERROR: Invalid compression codec: foo
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      For information about how compressing Parquet data files affects query 
performance, see
+      <a class="xref" href="impala_parquet.html#parquet_compression">Snappy 
and GZip Compression for Parquet Data Files</a>.
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div 
class="parentlink"><strong>Parent topic:</strong> <a class="link" 
href="../topics/impala_query_options.html">Query Options for the SET 
Statement</a></div></div></nav></article></main></body></html>

http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_compute_stats.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_compute_stats.html 
b/docs/build3x/html/topics/impala_compute_stats.html
new file mode 100644
index 0000000..407ba97
--- /dev/null
+++ b/docs/build3x/html/topics/impala_compute_stats.html
@@ -0,0 +1,637 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; 
charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) 
Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta 
name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" 
content="../topics/impala_langref_sql.html"><meta name="prodname" 
content="Impala"><meta name="version" content="Impala 3.0.x"><meta 
name="DC.Format" content="XHTML"><meta name="DC.Identifier" 
content="compute_stats"><link rel="stylesheet" type="text/css" 
href="../commonltr.css"><title>COMPUTE STATS Statement</title></head><body 
id="compute_stats"><main role="main"><article role="article" 
aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">COMPUTE STATS Statement</h1>
+
+
+
+  <div class="body conbody">
+
+    <p class="p">
+      The <code class="ph codeph">COMPUTE STATS</code> statement gathers information about the volume and distribution
+      of data in a table and all associated columns and partitions. The
+      information is stored in the metastore database, and used by Impala to
+      help optimize queries. For example, if Impala can determine that a table
+      is large or small, or has many or few distinct values, it can organize and
+      parallelize the work appropriately for a join query or insert operation.
+      For details about the kinds of information gathered by this statement, 
see
+        <a class="xref" href="impala_perf_stats.html#perf_stats">Table and 
Column Statistics</a>.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code><span class="ph">COMPUTE STATS [<var 
class="keyword varname">db_name</var>.]<var class="keyword 
varname">table_name</var>  [ ( <var class="keyword varname">column_list</var> ) 
] [TABLESAMPLE SYSTEM(<var class="keyword varname">percentage</var>) 
[REPEATABLE(<var class="keyword varname">seed</var>)]]</span>
+
+<var class="keyword varname">column_list</var> ::= <var class="keyword 
varname">column_name</var> [ , <var class="keyword varname">column_name</var>, 
... ]
+
+COMPUTE INCREMENTAL STATS [<var class="keyword varname">db_name</var>.]<var 
class="keyword varname">table_name</var> [PARTITION (<var class="keyword 
varname">partition_spec</var>)]
+
+<var class="keyword varname">partition_spec</var> ::= <var class="keyword 
varname">simple_partition_spec</var> | <span class="ph"><var class="keyword 
varname">complex_partition_spec</var></span>
+
+<var class="keyword varname">simple_partition_spec</var> ::= <var 
class="keyword varname">partition_col</var>=<var class="keyword 
varname">constant_value</var>
+
+<span class="ph"><var class="keyword varname">complex_partition_spec</var> ::= 
<var class="keyword varname">comparison_expression_on_partition_col</var></span>
+</code></pre>
+
+    <p class="p">
+        The <code class="ph codeph">PARTITION</code> clause is only allowed in 
combination with the <code class="ph codeph">INCREMENTAL</code>
+        clause. It is optional for <code class="ph codeph">COMPUTE INCREMENTAL 
STATS</code>, and required for <code class="ph codeph">DROP
+        INCREMENTAL STATS</code>. Whenever you specify partitions through the 
<code class="ph codeph">PARTITION
+        (<var class="keyword varname">partition_spec</var>)</code> clause in a 
<code class="ph codeph">COMPUTE INCREMENTAL STATS</code> or
+        <code class="ph codeph">DROP INCREMENTAL STATS</code> statement, you 
must include all the partitioning columns in the
+        specification, and specify constant values for all the partition key 
columns.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+      Originally, Impala relied on users to run the Hive <code class="ph 
codeph">ANALYZE
+        TABLE</code> statement, but that method of gathering statistics proved
+      unreliable and difficult to use. The Impala <code class="ph 
codeph">COMPUTE STATS</code>
+      statement was built to improve the reliability and user-friendliness of
+      this operation. <code class="ph codeph">COMPUTE STATS</code> does not 
require any setup
+      steps or special configuration. You run only a single Impala
+        <code class="ph codeph">COMPUTE STATS</code> statement to gather both table and column
+      statistics, rather than separate Hive <code class="ph codeph">ANALYZE TABLE</code>
+      statements for each kind of statistics.
+    </p>
+
+    <p class="p">
+      For a non-incremental <code class="ph codeph">COMPUTE STATS</code>
+      statement, the columns for which statistics are computed can be specified
+      with an optional comma-separated list of columns.
+
+    <p class="p">
+      If no column list is given, the <code class="ph codeph">COMPUTE 
STATS</code> statement
+      computes column-level statistics for all columns of the table. This adds
+      potentially unneeded work for columns whose stats are not needed by
+      queries. It can be especially costly for very wide tables and unneeded
+      large string fields.
+    </p>
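+
+    <p class="p">
+      For illustration, the two forms might look like the sketch below. The table
+      <code class="ph codeph">sales_db.orders</code> and its columns <code class="ph codeph">order_id</code> and
+      <code class="ph codeph">customer_id</code> are hypothetical names, not part of any shipped schema.
+    </p>
+
+<pre class="pre codeblock"><code>-- Hypothetical table and column names.
+-- Column-level statistics for only the two listed columns:
+compute stats sales_db.orders (order_id, customer_id);
+
+-- Column-level statistics for every column of the table:
+compute stats sales_db.orders;
+</code></pre>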
+    <p class="p">
+      <code class="ph codeph">COMPUTE STATS</code> returns an error when a specified column
+      cannot be analyzed, such as when the column does not exist, the column is
+      of a type that <code class="ph codeph">COMPUTE STATS</code> does not support (for example, a complex type),
+      or the column is a partitioning column.
+    </p>
+    <p class="p">
+      If an empty column list is given, no column is analyzed by <code 
class="ph codeph">COMPUTE
+        STATS</code>.
+    </p>
+
+    <p class="p">
+      In <span class="keyword">Impala 2.12</span> and
+      higher, an optional <code class="ph codeph">TABLESAMPLE</code> clause 
immediately after
+      a table reference specifies that the <code class="ph codeph">COMPUTE 
STATS</code>
+      operation only processes a specified percentage of the table data. For
+      tables that are so large that a full <code class="ph codeph">COMPUTE 
STATS</code>
+      operation is impractical, you can use <code class="ph codeph">COMPUTE 
STATS</code> with
+      a <code class="ph codeph">TABLESAMPLE</code> clause to extrapolate 
statistics from a
+      sample of the table data. See <a href="impala_perf_stats.html"><span class="keyword">Table and Column Statistics</span></a> for details about the
+      experimental stats extrapolation and sampling features.
+    </p>
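+
+    <p class="p">
+      Following the syntax shown earlier, a sampled run might look like the sketch below. The table name, the
+      percentage, and the seed are all hypothetical values chosen for illustration. Because the sampling and
+      extrapolation features are experimental, additional configuration might be required before the clause
+      takes effect.
+    </p>
+
+<pre class="pre codeblock"><code>-- Hypothetical table name. Process roughly 10 percent of the table data.
+-- REPEATABLE supplies a seed for the sampling; 1234 is an arbitrary value.
+compute stats sales_db.orders tablesample system(10) repeatable(1234);
+</code></pre>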
+
+    <p class="p">
+      The <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> variation 
is a shortcut for partitioned tables that works on a
+      subset of partitions rather than the entire table. The incremental 
nature makes it suitable for large tables
+      with many partitions, where a full <code class="ph codeph">COMPUTE 
STATS</code> operation takes too long to be practical
+      each time a partition is added or dropped. See the incremental statistics section of
+      <a class="xref" href="impala_perf_stats.html#perf_stats_incremental">Table and Column Statistics</a>
+      for full usage details.
+    </p>
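+
+    <p class="p">
+      As a sketch of typical usage, assume a hypothetical table <code class="ph codeph">sales_db.orders</code>
+      partitioned by <code class="ph codeph">year</code> and <code class="ph codeph">month</code>. The names and
+      partition values are made up for illustration.
+    </p>
+
+<pre class="pre codeblock"><code>-- Hypothetical table partitioned by YEAR and MONTH.
+-- Without a PARTITION clause, the statement considers the whole table.
+compute incremental stats sales_db.orders;
+
+-- After one partition is added, process only that partition.
+-- All partition key columns are listed, each with a constant value.
+compute incremental stats sales_db.orders partition (year=2018, month=5);
+</code></pre>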
+
+    <div class="note important note_important"><span class="note__title 
importanttitle">Important:</span>
+      <p class="p">
+        For a particular table, use either <code class="ph codeph">COMPUTE 
STATS</code> or
+        <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>, but never 
combine the two or
+        alternate between them. If you switch from <code class="ph 
codeph">COMPUTE STATS</code> to
+        <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> during the 
lifetime of a table, or
+        vice versa, drop all statistics by running <code class="ph 
codeph">DROP STATS</code> before
+        making the switch.
+      </p>
+      <p class="p">
+        When you run <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> 
on a table for the first time,
+        the statistics are computed again from scratch regardless of whether 
the table already
+        has statistics. Therefore, expect a one-time resource-intensive 
operation
+        for scanning the entire table when running <code class="ph 
codeph">COMPUTE INCREMENTAL STATS</code>
+        for the first time on a given table.
+      </p>
+      <p class="p">
+        For a table with a huge number of partitions and many columns, the 
approximately 400 bytes
+        of metadata per column per partition can add up to significant memory 
overhead, as it must
+        be cached on the <span class="keyword cmdname">catalogd</span> host 
and on every <span class="keyword cmdname">impalad</span> host
+        that is eligible to be a coordinator. If this metadata for all tables 
combined exceeds 2 GB,
+        you might experience service downtime.
+      </p>
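+      <p class="p">
+        As a rough illustration of that arithmetic, with made-up figures: a single table with 200 columns and
+        25,000 partitions would need about 400 * 200 * 25,000 bytes of such metadata, which is already close to
+        the 2 GB mark. The query below only performs the multiplication; the column and partition counts are
+        assumptions, not measurements.
+      </p>
+<pre class="pre codeblock"><code>-- 400 bytes per column per partition, 200 columns, 25,000 partitions (hypothetical figures).
+-- The product is 2,000,000,000 bytes.
+select 400 * 200 * 25000 as approx_metadata_bytes;
+</code></pre>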
+    </div>
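+
+    <p class="p">
+      A minimal sketch of switching a table from <code class="ph codeph">COMPUTE STATS</code> to
+      <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>, following the guidance in the note above.
+      The table name is hypothetical.
+    </p>
+
+<pre class="pre codeblock"><code>-- Hypothetical table that previously had statistics from COMPUTE STATS.
+-- Drop the existing statistics before changing methods.
+drop stats sales_db.orders;
+
+-- The first incremental run computes statistics from scratch, so expect a full scan.
+compute incremental stats sales_db.orders;
+</code></pre>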
+
+    <p class="p">
+      <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> only applies to 
partitioned tables. If you use the
+      <code class="ph codeph">INCREMENTAL</code> clause for an unpartitioned 
table, Impala automatically uses the original
+      <code class="ph codeph">COMPUTE STATS</code> statement. Such tables 
display <code class="ph codeph">false</code> under the
+      <code class="ph codeph">Incremental stats</code> column of the <code 
class="ph codeph">SHOW TABLE STATS</code> output.
+    </p>
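+
+    <p class="p">
+      For example, with a hypothetical unpartitioned table named <code class="ph codeph">sales_db.customers</code>,
+      you could confirm this behavior as sketched below. No output is shown because it depends on your data.
+    </p>
+
+<pre class="pre codeblock"><code>-- Hypothetical unpartitioned table; Impala runs this as a regular COMPUTE STATS.
+compute incremental stats sales_db.customers;
+
+-- Check the "Incremental stats" column in the output.
+show table stats sales_db.customers;
+</code></pre>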
+    <div class="note note note_note"><span class="note__title 
notetitle">Note:</span>
+      <div class="p">
+        Because many of the most performance-critical and resource-intensive
+        operations rely on table and column statistics to construct accurate 
and
+        efficient plans, <code class="ph codeph">COMPUTE STATS</code> is an 
important step at
+        the end of your ETL process. Run <code class="ph codeph">COMPUTE 
STATS</code> on all
+        tables as your first step during performance tuning for slow queries, 
or
+        troubleshooting for out-of-memory conditions:
+        <ul class="ul">
+          <li class="li">
+            Accurate statistics help Impala construct an efficient query plan
+            for join queries, improving performance and reducing memory usage.
+          </li>
+          <li class="li">
+            Accurate statistics help Impala distribute the work effectively
+            for insert operations into Parquet tables, improving performance 
and
+            reducing memory usage.
+          </li>
+          <li class="li">
+            Accurate statistics help Impala estimate the memory
+            required for each query, which is important when you use resource
+            management features, such as admission control and the YARN resource
+            management framework. The statistics help Impala achieve high
+            concurrency, make full use of available memory, and avoid
+            contention with workloads from other Hadoop components.
+          </li>
+          <li class="li">
+            In <span class="keyword">Impala 2.8</span> and
+            higher, when you run the <code class="ph codeph">COMPUTE 
STATS</code> or
+              <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> 
statement against a
+            Parquet table, Impala automatically applies the query option 
setting
+              <code class="ph codeph">MT_DOP=4</code> to increase the amount 
of intra-node
+            parallelism during this CPU-intensive operation. See <a 
class="xref" href="impala_mt_dop.html">MT_DOP Query Option</a> for details 
about what this query option does
+            and how to use it with CPU-intensive <code class="ph 
codeph">SELECT</code>
+            statements.
+          </li>
+        </ul>
+      </div>
+    </div>
+
+    <p class="p">
+      <strong class="ph b">Computing stats for groups of partitions:</strong>
+    </p>
+
+    <p class="p">
+      In <span class="keyword">Impala 2.8</span> and higher, you can run <code 
class="ph codeph">COMPUTE INCREMENTAL STATS</code>
+      on multiple partitions, instead of the entire table or one partition at 
a time. You include
+      comparison operators other than <code class="ph codeph">=</code> in the 
<code class="ph codeph">PARTITION</code> clause,
+      and the <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> 
statement applies to all partitions that
+      match the comparison expression.
+    </p>
+
+    <p class="p">
+      For example, the <code class="ph codeph">INT_PARTITIONS</code> table 
contains 4 partitions.
+      The following <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> 
statements affect some but not all
+      partitions, as indicated by the <code class="ph codeph">Updated <var 
class="keyword varname">n</var> partition(s)</code>
+      messages. The partitions that are affected depend on values in the 
partition key column <code class="ph codeph">X</code>
+      that match the comparison expression in the <code class="ph 
codeph">PARTITION</code> clause.
+    </p>
+
+<pre class="pre codeblock"><code>
+show partitions int_partitions;
++-------+-------+--------+------+--------------+-------------------+---------+...
+| x     | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format  
|...
++-------+-------+--------+------+--------------+-------------------+---------+...
+| 99    | -1    | 0      | 0B   | NOT CACHED   | NOT CACHED        | PARQUET 
|...
+| 120   | -1    | 0      | 0B   | NOT CACHED   | NOT CACHED        | TEXT    
|...
+| 150   | -1    | 0      | 0B   | NOT CACHED   | NOT CACHED        | TEXT    
|...
+| 200   | -1    | 0      | 0B   | NOT CACHED   | NOT CACHED        | TEXT    
|...
+| Total | -1    | 0      | 0B   | 0B           |                   |         
|...
++-------+-------+--------+------+--------------+-------------------+---------+...
+
+compute incremental stats int_partitions partition (x &lt; 100);
++-----------------------------------------+
+| summary                                 |
++-----------------------------------------+
+| Updated 1 partition(s) and 1 column(s). |
++-----------------------------------------+
+
+compute incremental stats int_partitions partition (x in (100, 150, 200));
++-----------------------------------------+
+| summary                                 |
++-----------------------------------------+
+| Updated 2 partition(s) and 1 column(s). |
++-----------------------------------------+
+
+compute incremental stats int_partitions partition (x between 100 and 175);
++-----------------------------------------+
+| summary                                 |
++-----------------------------------------+
+| Updated 2 partition(s) and 1 column(s). |
++-----------------------------------------+
+
+compute incremental stats int_partitions partition (x in (100, 150, 200) or x 
&lt; 100);
++-----------------------------------------+
+| summary                                 |
++-----------------------------------------+
+| Updated 3 partition(s) and 1 column(s). |
++-----------------------------------------+
+
+compute incremental stats int_partitions partition (x != 150);
++-----------------------------------------+
+| summary                                 |
++-----------------------------------------+
+| Updated 3 partition(s) and 1 column(s). |
++-----------------------------------------+
+
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Complex type considerations:</strong>
+      </p>
+
+    <p class="p">
+      Currently, the statistics created by the <code class="ph codeph">COMPUTE 
STATS</code> statement do not include
+      information about complex type columns. The column stats metrics for 
complex columns are always shown
+      as -1. For queries involving complex type columns, Impala uses
+      heuristics to estimate the data distribution within such columns.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">HBase considerations:</strong>
+      </p>
+
+    <p class="p">
+      <code class="ph codeph">COMPUTE STATS</code> also works for HBase tables. The statistics gathered for HBase tables are
+      somewhat different from those for HDFS-backed tables, but that metadata is still used for optimization when HBase
+      tables are involved in join queries.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Amazon S3 considerations:</strong>
+      </p>
+
+    <p class="p">
+      <code class="ph codeph">COMPUTE STATS</code> also works for tables where 
data resides in the Amazon Simple Storage Service (S3).
+      See <a class="xref" href="impala_s3.html#s3">Using Impala with the 
Amazon S3 Filesystem</a> for details.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Performance considerations:</strong>
+      </p>
+
+    <p class="p">
+      The statistics collected by <code class="ph codeph">COMPUTE STATS</code> are used to optimize join queries,
+      <code class="ph codeph">INSERT</code> operations into Parquet tables, and other resource-intensive kinds of SQL statements.
+      See <a class="xref" href="impala_perf_stats.html#perf_stats">Table and 
Column Statistics</a> for details.
+    </p>
+
+    <p class="p">
+      For large tables, the <code class="ph codeph">COMPUTE STATS</code> 
statement itself might take a long time and you
+      might need to tune its performance. The <code class="ph codeph">COMPUTE 
STATS</code> statement does not work with the
+      <code class="ph codeph">EXPLAIN</code> statement, or the <code class="ph 
codeph">SUMMARY</code> command in <span class="keyword 
cmdname">impala-shell</span>.
+      You can use the <code class="ph codeph">PROFILE</code> statement in 
<span class="keyword cmdname">impala-shell</span> to examine timing information
+      for the statement as a whole. If a basic <code class="ph codeph">COMPUTE 
STATS</code> statement takes a long time for a
+      partitioned table, consider switching to the <code class="ph 
codeph">COMPUTE INCREMENTAL STATS</code> syntax so that only
+      newly added partitions are analyzed each time.
+    </p>
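+
+    <p class="p">
+      As an illustrative sketch of the approach described above, you might check the timing of a full
+      <code class="ph codeph">COMPUTE STATS</code> in <span class="keyword cmdname">impala-shell</span>
+      and then switch to the incremental syntax. The table name <code class="ph codeph">sales_by_day</code>
+      is hypothetical; substitute your own partitioned table.
+    </p>
+
+<pre class="pre codeblock"><code>-- Hypothetical partitioned table, used only for illustration.
+compute stats sales_by_day;
+-- Display the profile for the statement that just ran.
+profile;
+
+-- Switch to incremental stats. The first run still scans the whole table;
+-- later runs analyze only partitions that lack incremental stats.
+compute incremental stats sales_by_day;
+</code></pre>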
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <p class="p">
+      This example shows two tables, <code class="ph codeph">T1</code> and <code class="ph codeph">T2</code>, with a small number of distinct
+      values linked by a parent-child relationship between <code class="ph codeph">T1.ID</code> and <code class="ph codeph">T2.PARENT</code>.
+      <code class="ph codeph">T1</code> is tiny, while <code class="ph codeph">T2</code> has approximately 100K rows. Initially, the statistics
+      include physical measurements such as the number of files, the total size, and size measurements for
+      fixed-length columns such as those of the <code class="ph codeph">INT</code> type. Unknown values are represented by -1. After
+      running <code class="ph codeph">COMPUTE STATS</code> for each table, 
much more information is available through the
+      <code class="ph codeph">SHOW STATS</code> statements. If you were 
running a join query involving both of these tables, you
+      would need statistics for both tables to get the most effective 
optimization for the query.
+    </p>
+
+
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; show table stats t1;
+Query: show table stats t1
++-------+--------+------+--------+
+| #Rows | #Files | Size | Format |
++-------+--------+------+--------+
+| -1    | 1      | 33B  | TEXT   |
++-------+--------+------+--------+
+Returned 1 row(s) in 0.02s
+[localhost:21000] &gt; show table stats t2;
+Query: show table stats t2
++-------+--------+----------+--------+
+| #Rows | #Files | Size     | Format |
++-------+--------+----------+--------+
+| -1    | 28     | 960.00KB | TEXT   |
++-------+--------+----------+--------+
+Returned 1 row(s) in 0.01s
+[localhost:21000] &gt; show column stats t1;
+Query: show column stats t1
++--------+--------+------------------+--------+----------+----------+
+| Column | Type   | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+--------+------------------+--------+----------+----------+
+| id     | INT    | -1               | -1     | 4        | 4        |
+| s      | STRING | -1               | -1     | -1       | -1       |
++--------+--------+------------------+--------+----------+----------+
+Returned 2 row(s) in 1.71s
+[localhost:21000] &gt; show column stats t2;
+Query: show column stats t2
++--------+--------+------------------+--------+----------+----------+
+| Column | Type   | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+--------+------------------+--------+----------+----------+
+| parent | INT    | -1               | -1     | 4        | 4        |
+| s      | STRING | -1               | -1     | -1       | -1       |
++--------+--------+------------------+--------+----------+----------+
+Returned 2 row(s) in 0.01s
+[localhost:21000] &gt; compute stats t1;
+Query: compute stats t1
++-----------------------------------------+
+| summary                                 |
++-----------------------------------------+
+| Updated 1 partition(s) and 2 column(s). |
++-----------------------------------------+
+Returned 1 row(s) in 5.30s
+[localhost:21000] &gt; show table stats t1;
+Query: show table stats t1
++-------+--------+------+--------+
+| #Rows | #Files | Size | Format |
++-------+--------+------+--------+
+| 3     | 1      | 33B  | TEXT   |
++-------+--------+------+--------+
+Returned 1 row(s) in 0.01s
+[localhost:21000] &gt; show column stats t1;
+Query: show column stats t1
++--------+--------+------------------+--------+----------+----------+
+| Column | Type   | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+--------+------------------+--------+----------+----------+
+| id     | INT    | 3                | -1     | 4        | 4        |
+| s      | STRING | 3                | -1     | -1       | -1       |
++--------+--------+------------------+--------+----------+----------+
+Returned 2 row(s) in 0.02s
+[localhost:21000] &gt; compute stats t2;
+Query: compute stats t2
++-----------------------------------------+
+| summary                                 |
++-----------------------------------------+
+| Updated 1 partition(s) and 2 column(s). |
++-----------------------------------------+
+Returned 1 row(s) in 5.70s
+[localhost:21000] &gt; show table stats t2;
+Query: show table stats t2
++-------+--------+----------+--------+
+| #Rows | #Files | Size     | Format |
++-------+--------+----------+--------+
+| 98304 | 1      | 960.00KB | TEXT   |
++-------+--------+----------+--------+
+Returned 1 row(s) in 0.03s
+[localhost:21000] &gt; show column stats t2;
+Query: show column stats t2
++--------+--------+------------------+--------+----------+----------+
+| Column | Type   | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+--------+------------------+--------+----------+----------+
+| parent | INT    | 3                | -1     | 4        | 4        |
+| s      | STRING | 6                | -1     | 14       | 9.3      |
++--------+--------+------------------+--------+----------+----------+
+Returned 2 row(s) in 0.01s</code></pre>
+
+    <p class="p">
+      The following example shows how to use the <code class="ph 
codeph">INCREMENTAL</code> clause, available in Impala 2.1.0 and
+      higher. The <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> 
syntax lets you collect statistics for newly added or
+      changed partitions, without rescanning the entire table.
+    </p>
+
+<pre class="pre codeblock"><code>-- Initially the table has no incremental stats, as indicated
+-- by 'false' under Incremental stats.
+show table stats item_partitioned;
++-------------+-------+--------+----------+--------------+---------+------------------
+| i_category  | #Rows | #Files | Size     | Bytes Cached | Format  | 
Incremental stats
++-------------+-------+--------+----------+--------------+---------+------------------
+| Books       | -1    | 1      | 223.74KB | NOT CACHED   | PARQUET | false
+| Children    | -1    | 1      | 230.05KB | NOT CACHED   | PARQUET | false
+| Electronics | -1    | 1      | 232.67KB | NOT CACHED   | PARQUET | false
+| Home        | -1    | 1      | 232.56KB | NOT CACHED   | PARQUET | false
+| Jewelry     | -1    | 1      | 223.72KB | NOT CACHED   | PARQUET | false
+| Men         | -1    | 1      | 231.25KB | NOT CACHED   | PARQUET | false
+| Music       | -1    | 1      | 237.90KB | NOT CACHED   | PARQUET | false
+| Shoes       | -1    | 1      | 234.90KB | NOT CACHED   | PARQUET | false
+| Sports      | -1    | 1      | 227.97KB | NOT CACHED   | PARQUET | false
+| Women       | -1    | 1      | 226.27KB | NOT CACHED   | PARQUET | false
+| Total       | -1    | 10     | 2.25MB   | 0B           |         |
++-------------+-------+--------+----------+--------------+---------+------------------
+
+-- After the first COMPUTE INCREMENTAL STATS,
+-- all partitions have stats. The first
+-- COMPUTE INCREMENTAL STATS scans the whole
+-- table, discarding any previous stats from
+-- a traditional COMPUTE STATS statement.
+compute incremental stats item_partitioned;
++-------------------------------------------+
+| summary                                   |
++-------------------------------------------+
+| Updated 10 partition(s) and 21 column(s). |
++-------------------------------------------+
+show table stats item_partitioned;
++-------------+-------+--------+----------+--------------+---------+------------------
+| i_category  | #Rows | #Files | Size     | Bytes Cached | Format  | 
Incremental stats
++-------------+-------+--------+----------+--------------+---------+------------------
+| Books       | 1733  | 1      | 223.74KB | NOT CACHED   | PARQUET | true
+| Children    | 1786  | 1      | 230.05KB | NOT CACHED   | PARQUET | true
+| Electronics | 1812  | 1      | 232.67KB | NOT CACHED   | PARQUET | true
+| Home        | 1807  | 1      | 232.56KB | NOT CACHED   | PARQUET | true
+| Jewelry     | 1740  | 1      | 223.72KB | NOT CACHED   | PARQUET | true
+| Men         | 1811  | 1      | 231.25KB | NOT CACHED   | PARQUET | true
+| Music       | 1860  | 1      | 237.90KB | NOT CACHED   | PARQUET | true
+| Shoes       | 1835  | 1      | 234.90KB | NOT CACHED   | PARQUET | true
+| Sports      | 1783  | 1      | 227.97KB | NOT CACHED   | PARQUET | true
+| Women       | 1790  | 1      | 226.27KB | NOT CACHED   | PARQUET | true
+| Total       | 17957 | 10     | 2.25MB   | 0B           |         |
++-------------+-------+--------+----------+--------------+---------+------------------
+
+-- Add a new partition...
+alter table item_partitioned add partition (i_category='Camping');
+-- Add or replace files in HDFS outside of Impala,
+-- rendering the stats for a partition obsolete.
+!import_data_into_sports_partition.sh
+refresh item_partitioned;
+drop incremental stats item_partitioned partition (i_category='Sports');
+-- Now some partitions have incremental stats
+-- and some do not.
+show table stats item_partitioned;
++-------------+-------+--------+----------+--------------+---------+------------------
+| i_category  | #Rows | #Files | Size     | Bytes Cached | Format  | 
Incremental stats
++-------------+-------+--------+----------+--------------+---------+------------------
+| Books       | 1733  | 1      | 223.74KB | NOT CACHED   | PARQUET | true
+| Camping     | -1    | 1      | 408.02KB | NOT CACHED   | PARQUET | false
+| Children    | 1786  | 1      | 230.05KB | NOT CACHED   | PARQUET | true
+| Electronics | 1812  | 1      | 232.67KB | NOT CACHED   | PARQUET | true
+| Home        | 1807  | 1      | 232.56KB | NOT CACHED   | PARQUET | true
+| Jewelry     | 1740  | 1      | 223.72KB | NOT CACHED   | PARQUET | true
+| Men         | 1811  | 1      | 231.25KB | NOT CACHED   | PARQUET | true
+| Music       | 1860  | 1      | 237.90KB | NOT CACHED   | PARQUET | true
+| Shoes       | 1835  | 1      | 234.90KB | NOT CACHED   | PARQUET | true
+| Sports      | -1    | 1      | 227.97KB | NOT CACHED   | PARQUET | false
+| Women       | 1790  | 1      | 226.27KB | NOT CACHED   | PARQUET | true
+| Total       | 17957 | 11     | 2.65MB   | 0B           |         |
++-------------+-------+--------+----------+--------------+---------+------------------
+
+-- After another COMPUTE INCREMENTAL STATS,
+-- all partitions have incremental stats, and only the 2
+-- partitions without incremental stats were scanned.
+compute incremental stats item_partitioned;
++------------------------------------------+
+| summary                                  |
++------------------------------------------+
+| Updated 2 partition(s) and 21 column(s). |
++------------------------------------------+
+show table stats item_partitioned;
++-------------+-------+--------+----------+--------------+---------+------------------
+| i_category  | #Rows | #Files | Size     | Bytes Cached | Format  | 
Incremental stats
++-------------+-------+--------+----------+--------------+---------+------------------
+| Books       | 1733  | 1      | 223.74KB | NOT CACHED   | PARQUET | true
+| Camping     | 5328  | 1      | 408.02KB | NOT CACHED   | PARQUET | true
+| Children    | 1786  | 1      | 230.05KB | NOT CACHED   | PARQUET | true
+| Electronics | 1812  | 1      | 232.67KB | NOT CACHED   | PARQUET | true
+| Home        | 1807  | 1      | 232.56KB | NOT CACHED   | PARQUET | true
+| Jewelry     | 1740  | 1      | 223.72KB | NOT CACHED   | PARQUET | true
+| Men         | 1811  | 1      | 231.25KB | NOT CACHED   | PARQUET | true
+| Music       | 1860  | 1      | 237.90KB | NOT CACHED   | PARQUET | true
+| Shoes       | 1835  | 1      | 234.90KB | NOT CACHED   | PARQUET | true
+| Sports      | 1783  | 1      | 227.97KB | NOT CACHED   | PARQUET | true
+| Women       | 1790  | 1      | 226.27KB | NOT CACHED   | PARQUET | true
+| Total       | 17957 | 11     | 2.65MB   | 0B           |         |
++-------------+-------+--------+----------+--------------+---------+------------------
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">File format considerations:</strong>
+      </p>
+
+    <p class="p">
+      The <code class="ph codeph">COMPUTE STATS</code> statement works with 
tables created with any of the file formats supported
+      by Impala. See <a class="xref" 
href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File 
Formats</a> for details about working with the
+      different file formats. The following considerations apply to <code 
class="ph codeph">COMPUTE STATS</code> depending on the
+      file format of the table.
+    </p>
+
+    <p class="p">
+      The <code class="ph codeph">COMPUTE STATS</code> statement works with 
text tables with no restrictions. These tables can be
+      created through either Impala or Hive.
+    </p>
+
+    <p class="p">
+      The <code class="ph codeph">COMPUTE STATS</code> statement works with 
Parquet tables. These tables can be created through
+      either Impala or Hive.
+    </p>
+
+    <p class="p">
+      The <code class="ph codeph">COMPUTE STATS</code> statement works with 
Avro tables without restriction in <span class="keyword">Impala 2.2</span>
+      and higher. In earlier releases, <code class="ph codeph">COMPUTE 
STATS</code> worked only for Avro tables created through Hive,
+      and required the <code class="ph codeph">CREATE TABLE</code> statement 
to use SQL-style column names and types rather than an
+      Avro-style schema specification.
+    </p>
+
+    <p class="p">
+      The <code class="ph codeph">COMPUTE STATS</code> statement works with 
RCFile tables with no restrictions. These tables can
+      be created through either Impala or Hive.
+    </p>
+
+    <p class="p">
+      The <code class="ph codeph">COMPUTE STATS</code> statement works with 
SequenceFile tables with no restrictions. These
+      tables can be created through either Impala or Hive.
+    </p>
+
+    <p class="p">
+      The <code class="ph codeph">COMPUTE STATS</code> statement works with 
partitioned tables, whether all the partitions use
+      the same file format, or some partitions are defined through <code 
class="ph codeph">ALTER TABLE</code> to use different
+      file formats.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Statement type:</strong> DDL
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Cancellation:</strong> Certain multi-stage 
statements (<code class="ph codeph">CREATE TABLE AS SELECT</code> and
+        <code class="ph codeph">COMPUTE STATS</code>) can be cancelled during 
some stages, when running <code class="ph codeph">INSERT</code>
+        or <code class="ph codeph">SELECT</code> operations internally. To 
cancel this statement, use Ctrl-C from the
+        <span class="keyword cmdname">impala-shell</span> interpreter, the 
<span class="ph uicontrol">Cancel</span> button from the
+        <span class="ph uicontrol">Watch</span> page in Hue, or <span 
class="ph uicontrol">Cancel</span> from the list of
+        in-flight queries (for a particular node) on the <span class="ph 
uicontrol">Queries</span> tab in the Impala web UI
+        (port 25000).
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Restrictions:</strong>
+      </p>
+
+    <div class="note note note_note"><span class="note__title 
notetitle">Note:</span>  Prior to Impala 1.4.0,
+          <code class="ph codeph">COMPUTE STATS</code> counted the number of
+          <code class="ph codeph">NULL</code> values in each column and 
recorded that figure
+        in the metastore database. Because Impala does not currently use the
+          <code class="ph codeph">NULL</code> count during query planning, 
Impala 1.4.0 and
+        higher speeds up the <code class="ph codeph">COMPUTE STATS</code> 
statement by
+        skipping this <code class="ph codeph">NULL</code> counting. </div>
+
+    <p class="p">
+        <strong class="ph b">Internal details:</strong>
+      </p>
+    <p class="p">
+      Behind the scenes, the <code class="ph codeph">COMPUTE STATS</code> 
statement
+      executes two statements: one to count the rows of each partition
+      in the table (or the entire table if unpartitioned) through the
+      <code class="ph codeph">COUNT(*)</code> function,
+      and another to count the approximate number of distinct values
+      in each column through the <code class="ph codeph">NDV()</code> function.
+      You might see these queries in your monitoring and diagnostic displays.
+      The same factors that affect the performance, scalability, and
+      execution of other queries (such as parallel execution, memory usage,
+      admission control, and timeouts) also apply to the queries run by the
+      <code class="ph codeph">COMPUTE STATS</code> statement.
+    </p>
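+
+    <p class="p">
+      As a rough sketch only, the two internal queries resemble the following statements, shown here for the
+      unpartitioned <code class="ph codeph">T1</code> table from the earlier examples. The exact SQL that Impala
+      generates is an assumption here and might differ in form.
+    </p>
+
+<pre class="pre codeblock"><code>-- Row count for the (unpartitioned) table.
+select count(*) from t1;
+-- Approximate number of distinct values in each column.
+select ndv(id), ndv(s) from t1;
+</code></pre>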
+
+    <p class="p">
+        <strong class="ph b">HDFS permissions:</strong>
+      </p>
+    <p class="p">
+      The user ID that the <span class="keyword cmdname">impalad</span> daemon 
runs under,
+      typically the <code class="ph codeph">impala</code> user, must have read
+      permission for all affected files in the source directory.
+      For <code class="ph codeph">COMPUTE STATS</code>, that means all files in the table,
+      whether the table is partitioned or unpartitioned.
+      For <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>, that means
+      all the files in partitions without incremental stats.
+      It must also have read and execute permissions for all
+      relevant directories holding the data files.
+      (Essentially, <code class="ph codeph">COMPUTE STATS</code> requires the
+      same permissions as the underlying <code class="ph codeph">SELECT</code> 
queries it runs
+      against the table.)
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Kudu considerations:</strong>
+      </p>
+
+    <p class="p">
+      The <code class="ph codeph">COMPUTE STATS</code> statement applies to 
Kudu tables.
+      Impala does not compute the number of rows for each partition for
+      Kudu tables. Therefore, you do not need to re-run the operation when
+      you see -1 in the <code class="ph codeph">#Rows</code> column of the output from
+      <code class="ph codeph">SHOW TABLE STATS</code>. That column always 
shows -1 for
+      all Kudu tables.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      <a class="xref" href="impala_drop_stats.html#drop_stats">DROP STATS 
Statement</a>, <a class="xref" href="impala_show.html#show_table_stats">SHOW 
TABLE STATS Statement</a>,
+      <a class="xref" href="impala_show.html#show_column_stats">SHOW COLUMN 
STATS Statement</a>, <a class="xref" 
href="impala_perf_stats.html#perf_stats">Table and Column Statistics</a>
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div 
class="parentlink"><strong>Parent topic:</strong> <a class="link" 
href="../topics/impala_langref_sql.html">Impala SQL 
Statements</a></div></div></nav></article></main></body></html>

http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_compute_stats_min_sample_size.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_compute_stats_min_sample_size.html 
b/docs/build3x/html/topics/impala_compute_stats_min_sample_size.html
new file mode 100644
index 0000000..03d21e2
--- /dev/null
+++ b/docs/build3x/html/topics/impala_compute_stats_min_sample_size.html
@@ -0,0 +1,23 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; 
charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) 
Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta 
name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" 
content="../topics/impala_query_options.html"><meta name="prodname" 
content="Impala"><meta name="prodname" content="Impala"><meta name="version" 
content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta 
name="DC.Format" content="XHTML"><meta name="DC.Identifier" 
content="compute_stats_sample_min_sample_size"><link rel="stylesheet" 
type="text/css" href="../commonltr.css"><title>COMPUTE_STATS_MIN_SAMPLE_SIZE 
Query Option</title></head><body 
id="compute_stats_sample_min_sample_size"><main role="main"><article 
role="article" aria-labelledby="ariaid-title1">
+  <h1 class="title topictitle1" 
id="ariaid-title1">COMPUTE_STATS_MIN_SAMPLE_SIZE Query Option</h1>
+
+
+  <div class="body conbody">
+    <p class="p">The <code class="ph 
codeph">COMPUTE_STATS_MIN_SAMPLE_SIZE</code> query option specifies
+      the minimum number of bytes that will be scanned in <code class="ph 
codeph">COMPUTE STATS
+        TABLESAMPLE</code>, regardless of the user-supplied sampling percent.
+      Because a minimum amount of data is needed to produce meaningful stats, this query option
+      prevents sampling for very small tables, where accurate stats can be
+      obtained cheaply without sampling.</p>
+    <p class="p">
+        <strong class="ph b">Type:</strong> integer
+      </p>
+    <p class="p"><strong class="ph b">Default:</strong> 1GB</p>
+    <p class="p"><strong class="ph b">Added in</strong>: <span 
class="keyword">Impala 2.12</span></p>
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
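+    <p class="p">
+      A minimal sketch of how the option might be used, assuming a hypothetical table named
+      <code class="ph codeph">big_table</code>, a value given as a number of bytes, and the
+      <code class="ph codeph">TABLESAMPLE SYSTEM(<var class="keyword varname">percent</var>)</code> form of the clause:
+    </p>
+
+<pre class="pre codeblock"><code>-- Hypothetical value: scan at least 512 MB (536870912 bytes) when sampling.
+set compute_stats_min_sample_size=536870912;
+-- Request a 10 percent sample; the minimum above still applies.
+compute stats big_table tablesample system(10);
+</code></pre>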
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div 
class="parentlink"><strong>Parent topic:</strong> <a class="link" 
href="../topics/impala_query_options.html">Query Options for the SET 
Statement</a></div></div></nav></article></main></body></html>

http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_concepts.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_concepts.html 
b/docs/build3x/html/topics/impala_concepts.html
new file mode 100644
index 0000000..b98e4ce
--- /dev/null
+++ b/docs/build3x/html/topics/impala_concepts.html
@@ -0,0 +1,48 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; 
charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) 
Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta 
name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" 
content="../topics/impala_components.html"><meta name="DC.Relation" 
scheme="URI" content="../topics/impala_development.html"><meta 
name="DC.Relation" scheme="URI" content="../topics/impala_hadoop.html"><meta 
name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta 
name="version" content="Impala 3.0.x"><meta name="version" content="Impala 
3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" 
content="concepts"><link rel="stylesheet" type="text/css" 
href="../commonltr.css"><title>Impala Concepts and 
Architecture</title></head><body id="concepts"><main role="main"><article 
role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Impala Concepts and 
Architecture</h1>
+
+
+
+  <div class="body conbody">
+
+    <p class="p">
+      The following sections provide background information to help you become 
productive using Impala and
+      its features. Where appropriate, the explanations include context to 
help understand how aspects of Impala
+      relate to other technologies you might already be familiar with, such as 
relational database management
+      systems and data warehouses, or other Hadoop components such as Hive, 
HDFS, and HBase.
+    </p>
+
+    <p class="p toc"></p>
+  </div>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+<nav role="navigation" class="related-links"><ul class="ullinks"><li 
class="link ulchildlink"><strong><a 
href="../topics/impala_components.html">Components of the Impala 
Server</a></strong><br></li><li class="link ulchildlink"><strong><a 
href="../topics/impala_development.html">Developing Impala 
Applications</a></strong><br></li><li class="link ulchildlink"><strong><a 
href="../topics/impala_hadoop.html">How Impala Fits Into the Hadoop 
Ecosystem</a></strong><br></li></ul></nav></article></main></body></html>
