[14/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

jbapple Wed, 12 Apr 2017 11:25:56 -0700

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_perf_stats.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_perf_stats.html 
b/docs/build/html/topics/impala_perf_stats.html
new file mode 100644
index 0000000..7ad1fb0
--- /dev/null
+++ b/docs/build/html/topics/impala_perf_stats.html
@@ -0,0 +1,996 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; 
charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) 
Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta 
name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" 
content="../topics/impala_performance.html"><meta name="prodname" 
content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" 
content="Impala"><meta name="version" content="Impala 2.8.x"><meta 
name="version" content="Impala 2.8.x"><meta name="version" content="Impala 
2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" 
content="perf_stats"><link rel="stylesheet" type="text/css" 
href="../commonltr.css"><title>Table and Column Statistics</title></head><body 
id="perf_stats"><main role="main"><article role="article" 
aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Table and Column 
Statistics</h1>
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      Impala can do better optimization for complex or multi-table queries 
when it has access to statistics about
+      the volume of data and how the values are distributed. Impala uses this 
information to help parallelize and
+      distribute the work for a query. For example, optimizing join queries 
requires a way of determining if one
+      table is <span class="q">"bigger"</span> than another, which is a 
function of the number of rows and the average row size
+      for each table. The following sections describe the categories of 
statistics Impala can work
+      with, and how to produce them and keep them up to date.
+    </p>
+
+    <div class="note note note_note"><span class="note__title 
notetitle">Note:</span> 
+      <p class="p">
+        Originally, Impala relied on the Hive mechanism for collecting 
statistics, through the Hive <code class="ph codeph">ANALYZE
+        TABLE</code> statement which initiates a MapReduce job. For better 
user-friendliness and reliability,
+        Impala implements its own <code class="ph codeph">COMPUTE STATS</code> 
statement in Impala 1.2.2 and higher, along with the
+        <code class="ph codeph">DROP STATS</code>, <code class="ph 
codeph">SHOW TABLE STATS</code>, and <code class="ph codeph">SHOW COLUMN 
STATS</code>
+        statements.
+      </p>
+    </div>
+
+    <p class="p toc inpage"></p>
+  </div>
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div 
class="parentlink"><strong>Parent topic:</strong> <a class="link" 
href="../topics/impala_performance.html">Tuning Impala for 
Performance</a></div></div></nav><article class="topic concept nested1" 
aria-labelledby="perf_table_stats__table_stats" 
id="perf_stats__perf_table_stats">
+
+    <h2 class="title topictitle2" id="perf_table_stats__table_stats">Overview 
of Table Statistics</h2>
+  
+
+    <div class="body conbody">
+
+
+
+      <p class="p">
+        The Impala query planner can make use of statistics about entire 
tables and partitions.
+        This information includes physical characteristics such as the number 
of rows, number of data files,
+        the total size of the data files, and the file format. For partitioned 
tables, the numbers
+        are calculated per partition, and as totals for the whole table.
+        This metadata is stored in the metastore database, and can be updated 
by either Impala or Hive.
+        If a number is not available, the value -1 is used as a placeholder.
+        Some numbers, such as number and total sizes of data files, are always 
kept up to date because
+        they can be calculated cheaply, as part of gathering HDFS block 
metadata.
+      </p>
+
+      <p class="p">
+        The following example shows table stats for an unpartitioned Parquet 
table.
+        The values for the number and sizes of files are always available.
+        Initially, the number of rows is not known, because it requires a 
potentially expensive
+        scan through the entire table, and so that value is displayed as -1.
+        The <code class="ph codeph">COMPUTE STATS</code> statement fills in 
any unknown table stats values.
+      </p>
+
+<pre class="pre codeblock"><code>
+show table stats parquet_snappy;
++-------+--------+---------+--------------+-------------------+---------+-------------------+...
+| #Rows | #Files | Size    | Bytes Cached | Cache Replication | Format  | 
Incremental stats |...
++-------+--------+---------+--------------+-------------------+---------+-------------------+...
+| -1    | 96     | 23.35GB | NOT CACHED   | NOT CACHED        | PARQUET | 
false             |...
++-------+--------+---------+--------------+-------------------+---------+-------------------+...
+
+compute stats parquet_snappy;
++-----------------------------------------+
+| summary                                 |
++-----------------------------------------+
+| Updated 1 partition(s) and 6 column(s). |
++-----------------------------------------+
+
+
+show table stats parquet_snappy;
++------------+--------+---------+--------------+-------------------+---------+-------------------+...
+| #Rows      | #Files | Size    | Bytes Cached | Cache Replication | Format  | 
Incremental stats |...
++------------+--------+---------+--------------+-------------------+---------+-------------------+...
+| 1000000000 | 96     | 23.35GB | NOT CACHED   | NOT CACHED        | PARQUET | 
false             |...
++------------+--------+---------+--------------+-------------------+---------+-------------------+...
+</code></pre>
+
+      <p class="p">
+        Impala performs some optimizations using this metadata on its own, and 
other optimizations by
+        using a combination of table and column statistics.
+      </p>
+
+      <p class="p">
+        To check that table statistics are available for a table, and see the 
details of those statistics, use the
+        statement <code class="ph codeph">SHOW TABLE STATS <var class="keyword 
varname">table_name</var></code>. See
+        <a class="xref" href="impala_show.html#show">SHOW Statement</a> for 
details.
+      </p>
+
+      <p class="p">
+        If you use the Hive-based methods of gathering statistics, see
+        <a class="xref" 
href="https://cwiki.apache.org/confluence/display/Hive/StatsDev"; 
target="_blank">the
+        Hive wiki</a> for information about the required configuration on the 
Hive side. Where practical,
+        use the Impala <code class="ph codeph">COMPUTE STATS</code> statement 
to avoid potential configuration and scalability
+        issues with the statistics-gathering process.
+      </p>
+
+      <p class="p">
+        If you run the Hive statement <code class="ph codeph">ANALYZE TABLE 
COMPUTE STATISTICS FOR COLUMNS</code>,
+        Impala can only use the resulting column statistics if the table is 
unpartitioned.
+        Impala cannot use Hive-generated column statistics for a partitioned 
table.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" 
aria-labelledby="perf_column_stats__column_stats" 
id="perf_stats__perf_column_stats">
+
+    <h2 class="title topictitle2" 
id="perf_column_stats__column_stats">Overview of Column Statistics</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        The Impala query planner can make use of statistics about individual 
columns when that metadata is
+        available in the metastore database. This technique is most valuable 
for columns compared across tables in
+        <a class="xref" href="impala_perf_joins.html#perf_joins">join 
queries</a>, to help estimate how many rows the query
+        will retrieve from each table. <span class="ph"> These statistics are 
also important for correlated
+        subqueries using the <code class="ph codeph">EXISTS()</code> or <code 
class="ph codeph">IN()</code> operators, which are processed
+        internally the same way as join queries.</span>
+      </p>
+
+      <p class="p">
+        The following example shows column stats for an unpartitioned Parquet 
table.
+        The values for the maximum and average sizes of some types are always 
available,
+        because those figures are constant for numeric and other fixed-size 
types.
+        Initially, the number of distinct values is not known, because it 
requires a potentially expensive
+        scan through the entire table, and so that value is displayed as -1.
+        The same applies to maximum and average sizes of variable-sized types, 
such as <code class="ph codeph">STRING</code>.
+        The <code class="ph codeph">COMPUTE STATS</code> statement fills in 
most unknown column stats values.
+        (It does not record the number of <code class="ph codeph">NULL</code> 
values, because currently Impala
+        does not use that figure for query optimization.)
+      </p>
+
+<pre class="pre codeblock"><code>
+show column stats parquet_snappy;
++-------------+----------+------------------+--------+----------+----------+
+| Column      | Type     | #Distinct Values | #Nulls | Max Size | Avg Size |
++-------------+----------+------------------+--------+----------+----------+
+| id          | BIGINT   | -1               | -1     | 8        | 8        |
+| val         | INT      | -1               | -1     | 4        | 4        |
+| zerofill    | STRING   | -1               | -1     | -1       | -1       |
+| name        | STRING   | -1               | -1     | -1       | -1       |
+| assertion   | BOOLEAN  | -1               | -1     | 1        | 1        |
+| location_id | SMALLINT | -1               | -1     | 2        | 2        |
++-------------+----------+------------------+--------+----------+----------+
+
+compute stats parquet_snappy;
++-----------------------------------------+
+| summary                                 |
++-----------------------------------------+
+| Updated 1 partition(s) and 6 column(s). |
++-----------------------------------------+
+
+show column stats parquet_snappy;
++-------------+----------+------------------+--------+----------+-------------------+
+| Column      | Type     | #Distinct Values | #Nulls | Max Size | Avg Size     
     |
++-------------+----------+------------------+--------+----------+-------------------+
+| id          | BIGINT   | 183861280        | -1     | 8        | 8            
     |
+| val         | INT      | 139017           | -1     | 4        | 4            
     |
+| zerofill    | STRING   | 101761           | -1     | 6        | 6            
     |
+| name        | STRING   | 145636240        | -1     | 22       | 
13.00020027160645 |
+| assertion   | BOOLEAN  | 2                | -1     | 1        | 1            
     |
+| location_id | SMALLINT | 339              | -1     | 2        | 2            
     |
++-------------+----------+------------------+--------+----------+-------------------+
+</code></pre>
+
+      <div class="note note note_note"><span class="note__title 
notetitle">Note:</span> 
+        <p class="p">
+          For column statistics to be effective in Impala, you also need to 
have table statistics for the
+          applicable tables, as described in <a class="xref" 
href="impala_perf_stats.html#perf_table_stats">Overview of Table 
Statistics</a>. When you use
+          the Impala <code class="ph codeph">COMPUTE STATS</code> statement, 
both table and column statistics are automatically
+          gathered at the same time, for all columns in the table.
+        </p>
+      </div>
+
+      <div class="note note note_note"><span class="note__title 
notetitle">Note:</span>  Prior to Impala 1.4.0,
+          <code class="ph codeph">COMPUTE STATS</code> counted the number of
+          <code class="ph codeph">NULL</code> values in each column and 
recorded that figure
+        in the metastore database. Because Impala does not currently use the
+          <code class="ph codeph">NULL</code> count during query planning, 
Impala 1.4.0 and
+        higher speeds up the <code class="ph codeph">COMPUTE STATS</code> 
statement by
+        skipping this <code class="ph codeph">NULL</code> counting. </div>
+
+
+
+      <p class="p">
+        To check whether column statistics are available for a particular set 
of columns, use the <code class="ph codeph">SHOW
+        COLUMN STATS <var class="keyword varname">table_name</var></code> 
statement, or check the extended
+        <code class="ph codeph">EXPLAIN</code> output for a query against that 
table that refers to those columns. See
+        <a class="xref" href="impala_show.html#show">SHOW Statement</a> and <a 
class="xref" href="impala_explain.html#explain">EXPLAIN Statement</a> for 
details.
+      </p>
+
+      <p class="p">
+        If you run the Hive statement <code class="ph codeph">ANALYZE TABLE 
COMPUTE STATISTICS FOR COLUMNS</code>,
+        Impala can only use the resulting column statistics if the table is 
unpartitioned.
+        Impala cannot use Hive-generated column statistics for a partitioned 
table.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" 
aria-labelledby="perf_stats_partitions__stats_partitions" 
id="perf_stats__perf_stats_partitions">
+    <h2 class="title topictitle2" 
id="perf_stats_partitions__stats_partitions">How Table and Column Statistics 
Work for Partitioned Tables</h2>
+    <div class="body conbody">
+
+      <p class="p">
+        When you use Impala for <span class="q">"big data"</span>, you are 
highly likely to use partitioning
+        for your biggest tables, the ones representing data that can be 
logically divided
+        based on dates, geographic regions, or similar criteria. The table and 
column statistics
+        are especially useful for optimizing queries on such tables. For 
example, a query involving
+        one year might involve substantially more or less data than a query 
involving a different year,
+        or a range of several years. Each query might be optimized differently 
as a result.
+      </p>
+
+      <p class="p">
+        The following examples show how table and column stats work with a 
partitioned table.
+        The table for this example is partitioned by year, month, and day.
+        For simplicity, the sample data consists of 5 partitions, all from the 
same year and month.
+        Table stats are collected independently for each partition. (In fact, 
the
+        <code class="ph codeph">SHOW PARTITIONS</code> statement displays 
exactly the same information as
+        <code class="ph codeph">SHOW TABLE STATS</code> for a partitioned 
table.) Column stats apply to
+        the entire table, not to individual partitions. Because the partition 
key column values
+        are represented as HDFS directories, their characteristics are 
typically known in advance,
+        even when the values for non-key columns are shown as -1.
+      </p>
+
+<pre class="pre codeblock"><code>
+show partitions year_month_day;
++-------+-------+-----+-------+--------+---------+--------------+-------------------+---------+...
+| year  | month | day | #Rows | #Files | Size    | Bytes Cached | Cache 
Replication | Format  |...
++-------+-------+-----+-------+--------+---------+--------------+-------------------+---------+...
+| 2013  | 12    | 1   | -1    | 1      | 2.51MB  | NOT CACHED   | NOT CACHED   
     | PARQUET |...
+| 2013  | 12    | 2   | -1    | 1      | 2.53MB  | NOT CACHED   | NOT CACHED   
     | PARQUET |...
+| 2013  | 12    | 3   | -1    | 1      | 2.52MB  | NOT CACHED   | NOT CACHED   
     | PARQUET |...
+| 2013  | 12    | 4   | -1    | 1      | 2.51MB  | NOT CACHED   | NOT CACHED   
     | PARQUET |...
+| 2013  | 12    | 5   | -1    | 1      | 2.52MB  | NOT CACHED   | NOT CACHED   
     | PARQUET |...
+| Total |       |     | -1    | 5      | 12.58MB | 0B           |              
     |         |...
++-------+-------+-----+-------+--------+---------+--------------+-------------------+---------+...
+
+show table stats year_month_day;
++-------+-------+-----+-------+--------+---------+--------------+-------------------+---------+...
+| year  | month | day | #Rows | #Files | Size    | Bytes Cached | Cache 
Replication | Format  |...
++-------+-------+-----+-------+--------+---------+--------------+-------------------+---------+...
+| 2013  | 12    | 1   | -1    | 1      | 2.51MB  | NOT CACHED   | NOT CACHED   
     | PARQUET |...
+| 2013  | 12    | 2   | -1    | 1      | 2.53MB  | NOT CACHED   | NOT CACHED   
     | PARQUET |...
+| 2013  | 12    | 3   | -1    | 1      | 2.52MB  | NOT CACHED   | NOT CACHED   
     | PARQUET |...
+| 2013  | 12    | 4   | -1    | 1      | 2.51MB  | NOT CACHED   | NOT CACHED   
     | PARQUET |...
+| 2013  | 12    | 5   | -1    | 1      | 2.52MB  | NOT CACHED   | NOT CACHED   
     | PARQUET |...
+| Total |       |     | -1    | 5      | 12.58MB | 0B           |              
     |         |...
++-------+-------+-----+-------+--------+---------+--------------+-------------------+---------+...
+
+show column stats year_month_day;
++-----------+---------+------------------+--------+----------+----------+
+| Column    | Type    | #Distinct Values | #Nulls | Max Size | Avg Size |
++-----------+---------+------------------+--------+----------+----------+
+| id        | INT     | -1               | -1     | 4        | 4        |
+| val       | INT     | -1               | -1     | 4        | 4        |
+| zfill     | STRING  | -1               | -1     | -1       | -1       |
+| name      | STRING  | -1               | -1     | -1       | -1       |
+| assertion | BOOLEAN | -1               | -1     | 1        | 1        |
+| year      | INT     | 1                | 0      | 4        | 4        |
+| month     | INT     | 1                | 0      | 4        | 4        |
+| day       | INT     | 5                | 0      | 4        | 4        |
++-----------+---------+------------------+--------+----------+----------+
+
+compute stats year_month_day;
++-----------------------------------------+
+| summary                                 |
++-----------------------------------------+
+| Updated 5 partition(s) and 5 column(s). |
++-----------------------------------------+
+
+show table stats year_month_day;
++-------+-------+-----+--------+--------+---------+--------------+-------------------+---------+...
+| year  | month | day | #Rows  | #Files | Size    | Bytes Cached | Cache 
Replication | Format  |...
++-------+-------+-----+--------+--------+---------+--------------+-------------------+---------+...
+| 2013  | 12    | 1   | 93606  | 1      | 2.51MB  | NOT CACHED   | NOT CACHED  
      | PARQUET |...
+| 2013  | 12    | 2   | 94158  | 1      | 2.53MB  | NOT CACHED   | NOT CACHED  
      | PARQUET |...
+| 2013  | 12    | 3   | 94122  | 1      | 2.52MB  | NOT CACHED   | NOT CACHED  
      | PARQUET |...
+| 2013  | 12    | 4   | 93559  | 1      | 2.51MB  | NOT CACHED   | NOT CACHED  
      | PARQUET |...
+| 2013  | 12    | 5   | 93845  | 1      | 2.52MB  | NOT CACHED   | NOT CACHED  
      | PARQUET |...
+| Total |       |     | 469290 | 5      | 12.58MB | 0B           |             
      |         |...
++-------+-------+-----+--------+--------+---------+--------------+-------------------+---------+...
+
+show column stats year_month_day;
++-----------+---------+------------------+--------+----------+-------------------+
+| Column    | Type    | #Distinct Values | #Nulls | Max Size | Avg Size        
  |
++-----------+---------+------------------+--------+----------+-------------------+
+| id        | INT     | 511129           | -1     | 4        | 4               
  |
+| val       | INT     | 364853           | -1     | 4        | 4               
  |
+| zfill     | STRING  | 311430           | -1     | 6        | 6               
  |
+| name      | STRING  | 471975           | -1     | 22       | 
13.00160026550293 |
+| assertion | BOOLEAN | 2                | -1     | 1        | 1               
  |
+| year      | INT     | 1                | 0      | 4        | 4               
  |
+| month     | INT     | 1                | 0      | 4        | 4               
  |
+| day       | INT     | 5                | 0      | 4        | 4               
  |
++-----------+---------+------------------+--------+----------+-------------------+
+</code></pre>
+
+      <div class="note note note_note"><span class="note__title 
notetitle">Note:</span> 
+        Partitioned tables can grow so large that scanning the entire table, 
as the <code class="ph codeph">COMPUTE STATS</code>
+        statement does, is impractical just to update the statistics for a new 
partition. The standard
+        <code class="ph codeph">COMPUTE STATS</code> statement might take 
hours, or even days. That situation is where you switch
+        to using incremental statistics, a feature available in <span 
class="keyword">Impala 2.1</span> and higher.
+        See <a class="xref" 
href="impala_perf_stats.html#perf_stats_incremental">Overview of Incremental 
Statistics</a> for details about this feature
+        and the <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> 
syntax.
+      </div>
+
+      <p class="p">
+        If you run the Hive statement <code class="ph codeph">ANALYZE TABLE 
COMPUTE STATISTICS FOR COLUMNS</code>,
+        Impala can only use the resulting column statistics if the table is 
unpartitioned.
+        Impala cannot use Hive-generated column statistics for a partitioned 
table.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" 
aria-labelledby="perf_stats_incremental__incremental_stats" 
id="perf_stats__perf_stats_incremental">
+
+    <h2 class="title topictitle2" 
id="perf_stats_incremental__incremental_stats">Overview of Incremental 
Statistics</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        In Impala 2.1.0 and higher, you can use the syntax <code class="ph 
codeph">COMPUTE INCREMENTAL STATS</code> and
+        <code class="ph codeph">DROP INCREMENTAL STATS</code>. The <code 
class="ph codeph">INCREMENTAL</code> clauses work with incremental
+        statistics, a specialized feature for partitioned tables that are 
large or frequently updated with new
+        partitions.
+      </p>
+
+      <p class="p">
+        When you compute incremental statistics for a partitioned table, by 
default Impala only processes those
+        partitions that do not yet have incremental statistics. By processing 
only newly added partitions, you can
+        keep statistics up to date for large partitioned tables, without 
incurring the overhead of reprocessing the
+        entire table each time.
+      </p>
+
+      <p class="p">
+        You can also compute or drop statistics for a single partition by 
including a <code class="ph codeph">PARTITION</code>
+        clause in the <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> 
or <code class="ph codeph">DROP INCREMENTAL STATS</code>
+        statement.
+      </p>
+
+      <p class="p">
+        The metadata for incremental statistics is handled differently from 
the original style of statistics:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            If you have an existing partitioned table for which you have 
already computed statistics, issuing
+            <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> without a 
partition clause causes Impala to rescan the
+            entire table. Once the incremental statistics are computed, any 
future <code class="ph codeph">COMPUTE INCREMENTAL
+            STATS</code> statements only scan any new partitions and any 
partitions where you performed
+            <code class="ph codeph">DROP INCREMENTAL STATS</code>.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            The <code class="ph codeph">SHOW TABLE STATS</code> and <code 
class="ph codeph">SHOW PARTITIONS</code> statements now include an
+            additional column showing whether incremental statistics are 
available for each column. A partition
+            could already be covered by the original type of statistics based 
on a prior <code class="ph codeph">COMPUTE
+            STATS</code> statement, as indicated by a value other than <code 
class="ph codeph">-1</code> under the
+            <code class="ph codeph">#Rows</code> column. Impala query planning 
uses either kind of statistics when available.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> takes 
more time than <code class="ph codeph">COMPUTE STATS</code> for the
+            same volume of data. Therefore it is most suitable for tables with 
large data volume where new
+            partitions are added frequently, making it impractical to run a 
full <code class="ph codeph">COMPUTE STATS</code>
+            operation for each new partition. For unpartitioned tables, or 
partitioned tables that are loaded once
+            and not updated with new partitions, use the original <code 
class="ph codeph">COMPUTE STATS</code> syntax.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> uses some 
memory in the <span class="keyword cmdname">catalogd</span> process,
+            proportional to the number of partitions and number of columns in 
the applicable table. The memory
+            overhead is approximately 400 bytes for each column in each 
partition. This memory is reserved in the
+            <span class="keyword cmdname">catalogd</span> daemon, the <span 
class="keyword cmdname">statestored</span> daemon, and in each instance of
+            the <span class="keyword cmdname">impalad</span> daemon.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            In cases where new files are added to an existing partition, issue 
a <code class="ph codeph">REFRESH</code> statement
+            for the table, followed by a <code class="ph codeph">DROP 
INCREMENTAL STATS</code> and <code class="ph codeph">COMPUTE INCREMENTAL
+            STATS</code> sequence for the changed partition.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            The <code class="ph codeph">DROP INCREMENTAL STATS</code> 
statement operates only on a single partition at a time. To
+            remove statistics (whether incremental or not) from all partitions 
of a table, issue a <code class="ph codeph">DROP
+            STATS</code> statement with no <code class="ph 
codeph">INCREMENTAL</code> or <code class="ph codeph">PARTITION</code> clauses.
+          </p>
+        </li>
+      </ul>
+
+      <p class="p">
+        The following considerations apply to incremental statistics when the 
structure of an existing table is
+        changed (known as <dfn class="term">schema evolution</dfn>):
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            If you use an <code class="ph codeph">ALTER TABLE</code> statement 
to drop a column, the existing statistics remain
+            valid and <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> 
does not rescan any partitions.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            If you use an <code class="ph codeph">ALTER TABLE</code> statement 
to add a column, Impala rescans all partitions and
+            fills in the appropriate column-level values the next time you run 
<code class="ph codeph">COMPUTE INCREMENTAL
+            STATS</code>.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            If you use an <code class="ph codeph">ALTER TABLE</code> statement 
to change the data type of a column, Impala
+            rescans all partitions and fills in the appropriate column-level 
values the next time you run
+            <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            If you use an <code class="ph codeph">ALTER TABLE</code> statement 
to change the file format of a table, the existing
+            statistics remain valid and a subsequent <code class="ph 
codeph">COMPUTE INCREMENTAL STATS</code> does not rescan any
+            partitions.
+          </p>
+        </li>
+      </ul>
+
+      <p class="p">
+        See <a class="xref" 
href="impala_compute_stats.html#compute_stats">COMPUTE STATS Statement</a> and
+        <a class="xref" href="impala_drop_stats.html#drop_stats">DROP STATS 
Statement</a> for syntax details.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title6" 
id="perf_stats__perf_stats_computing">
+    <h2 class="title topictitle2" id="ariaid-title6">Generating Table and 
Column Statistics (COMPUTE STATS Statement)</h2>
+    <div class="body conbody">
+
+      <p class="p">
+        To gather table statistics after loading data into a table or 
partition, you typically use the
+        <code class="ph codeph">COMPUTE STATS</code> statement. This statement 
is available in Impala 1.2.2 and higher.
+        It gathers both table statistics and column statistics for all columns 
in a single operation.
+        For large partitioned tables, where you frequently need to update 
statistics and it is impractical
+        to scan the entire table each time, use the syntax <code class="ph 
codeph">COMPUTE INCREMENTAL STATS</code>,
+        which is available in <span class="keyword">Impala 2.1</span> and 
higher.
+      </p>
+
+      <p class="p">
+        If you use Hive as part of your ETL workflow, you can also use Hive to 
generate table and
+        column statistics. You might need to do extra configuration within 
Hive itself, the metastore,
+        or even set up a separate database to hold Hive-generated statistics. 
You might need to run
+        multiple statements to generate all the necessary statistics. 
Therefore, prefer the
+        Impala <code class="ph codeph">COMPUTE STATS</code> statement where 
that technique is practical.
+        For details about collecting statistics through Hive, see
+        <a class="xref" 
href="https://cwiki.apache.org/confluence/display/Hive/StatsDev"; 
target="_blank">the Hive wiki</a>.
+      </p>
+
+      <p class="p">
+        If you run the Hive statement <code class="ph codeph">ANALYZE TABLE 
COMPUTE STATISTICS FOR COLUMNS</code>,
+        Impala can only use the resulting column statistics if the table is 
unpartitioned.
+        Impala cannot use Hive-generated column statistics for a partitioned 
table.
+      </p>
+
+
+
+      <p class="p">
+
+
+        For your very largest tables, you might find that <code class="ph 
codeph">COMPUTE STATS</code> or even <code class="ph codeph">COMPUTE 
INCREMENTAL STATS</code>
+        take so long to scan the data that it is impractical to use them 
regularly. In such a case, after adding a partition or inserting new data,
+        you can update just the number of rows property through an <code 
class="ph codeph">ALTER TABLE</code> statement.
+        See <a class="xref" 
href="impala_perf_stats.html#perf_table_stats_manual">Setting the NUMROWS Value 
Manually through ALTER TABLE</a> for details.
+        Because the column statistics might be left in a stale state, do not 
use this technique as a replacement
+        for <code class="ph codeph">COMPUTE STATS</code>. Only use this 
technique if all other means of collecting statistics are impractical, or as a
+        low-overhead operation that you run in between periodic <code 
class="ph codeph">COMPUTE STATS</code> or <code class="ph codeph">COMPUTE 
INCREMENTAL STATS</code> operations.
+      </p>
+
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title7" 
id="perf_stats__perf_stats_checking">
+
+    <h2 class="title topictitle2" id="ariaid-title7">Detecting Missing 
Statistics</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        You can check whether a specific table has statistics using the <code 
class="ph codeph">SHOW TABLE STATS</code> statement
+        (for any table) or the <code class="ph codeph">SHOW PARTITIONS</code> 
statement (for a partitioned table). Both
+        statements display the same information. If a table or a partition 
does not have any statistics, the
+        <code class="ph codeph">#Rows</code> field contains <code class="ph 
codeph">-1</code>. Once you compute statistics for the table or
+        partition, the <code class="ph codeph">#Rows</code> field changes to 
an accurate value.
+      </p>
+
+      <p class="p">
+        The following example shows a table that initially does not have any 
statistics. The <code class="ph codeph">SHOW TABLE
+        STATS</code> statement displays different values for <code class="ph 
codeph">#Rows</code> before and after the
+        <code class="ph codeph">COMPUTE STATS</code> operation.
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create table no_stats 
(x int);
+[localhost:21000] &gt; show table stats no_stats;
++-------+--------+------+--------------+--------+-------------------+
+| #Rows | #Files | Size | Bytes Cached | Format | Incremental stats |
++-------+--------+------+--------------+--------+-------------------+
+| -1    | 0      | 0B   | NOT CACHED   | TEXT   | false             |
++-------+--------+------+--------------+--------+-------------------+
+[localhost:21000] &gt; compute stats no_stats;
++-----------------------------------------+
+| summary                                 |
++-----------------------------------------+
+| Updated 1 partition(s) and 1 column(s). |
++-----------------------------------------+
+[localhost:21000] &gt; show table stats no_stats;
++-------+--------+------+--------------+--------+-------------------+
+| #Rows | #Files | Size | Bytes Cached | Format | Incremental stats |
++-------+--------+------+--------------+--------+-------------------+
+| 0     | 0      | 0B   | NOT CACHED   | TEXT   | false             |
++-------+--------+------+--------------+--------+-------------------+
+</code></pre>
+
+      <p class="p">
+        The following example shows a similar progression with a partitioned 
table. Initially,
+        <code class="ph codeph">#Rows</code> is <code class="ph 
codeph">-1</code>. After a <code class="ph codeph">COMPUTE STATS</code> 
operation,
+        <code class="ph codeph">#Rows</code> changes to an accurate value. Any 
newly added partition starts with no statistics,
+        meaning that you must collect statistics after adding a new partition.
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create table 
no_stats_partitioned (x int) partitioned by (year smallint);
+[localhost:21000] &gt; show table stats no_stats_partitioned;
++-------+-------+--------+------+--------------+--------+-------------------+
+| year  | #Rows | #Files | Size | Bytes Cached | Format | Incremental stats |
++-------+-------+--------+------+--------------+--------+-------------------+
+| Total | -1    | 0      | 0B   | 0B           |        |                   |
++-------+-------+--------+------+--------------+--------+-------------------+
+[localhost:21000] &gt; show partitions no_stats_partitioned;
++-------+-------+--------+------+--------------+--------+-------------------+
+| year  | #Rows | #Files | Size | Bytes Cached | Format | Incremental stats |
++-------+-------+--------+------+--------------+--------+-------------------+
+| Total | -1    | 0      | 0B   | 0B           |        |                   |
++-------+-------+--------+------+--------------+--------+-------------------+
+[localhost:21000] &gt; alter table no_stats_partitioned add partition 
(year=2013);
+[localhost:21000] &gt; compute stats no_stats_partitioned;
++-----------------------------------------+
+| summary                                 |
++-----------------------------------------+
+| Updated 1 partition(s) and 1 column(s). |
++-----------------------------------------+
+[localhost:21000] &gt; alter table no_stats_partitioned add partition 
(year=2014);
+[localhost:21000] &gt; show partitions no_stats_partitioned;
++-------+-------+--------+------+--------------+--------+-------------------+
+| year  | #Rows | #Files | Size | Bytes Cached | Format | Incremental stats |
++-------+-------+--------+------+--------------+--------+-------------------+
+| 2013  | 0     | 0      | 0B   | NOT CACHED   | TEXT   | false             |
+| 2014  | -1    | 0      | 0B   | NOT CACHED   | TEXT   | false             |
+| Total | 0     | 0      | 0B   | 0B           |        |                   |
++-------+-------+--------+------+--------------+--------+-------------------+
+</code></pre>
+
+      <div class="note note note_note"><span class="note__title 
notetitle">Note:</span> 
+        Because the default <code class="ph codeph">COMPUTE STATS</code> 
statement creates and updates statistics for all
+        partitions in a table, if you expect to frequently add new partitions, 
use the <code class="ph codeph">COMPUTE INCREMENTAL
+        STATS</code> syntax instead, which lets you compute stats for a single 
specified partition, or only for
+        those partitions that do not already have incremental stats.
+      </div>
+
+      <p class="p">
+        If checking each individual table is impractical, due to a large 
number of tables or views that hide the
+        underlying base tables, you can also check for missing statistics for 
a particular query. Use the
+        <code class="ph codeph">EXPLAIN</code> statement to preview query 
efficiency before actually running the query. Use the
+        query profile output available through the <code class="ph 
codeph">PROFILE</code> command in
+        <span class="keyword cmdname">impala-shell</span> or the web UI to 
verify query execution and timing after running the query.
+        Both the <code class="ph codeph">EXPLAIN</code> plan and the <code 
class="ph codeph">PROFILE</code> output display a warning if any
+        tables or partitions involved in the query do not have statistics.
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create table no_stats 
(x int);
+[localhost:21000] &gt; explain select count(*) from no_stats;
++------------------------------------------------------------------------------------+
+| Explain String                                                               
      |
++------------------------------------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=10.00MB VCores=1                     
      |
+| WARNING: The following tables are missing relevant table and/or column 
statistics. |
+| incremental_stats.no_stats                                                   
      |
+|                                                                              
      |
+| 03:AGGREGATE [FINALIZE]                                                      
      |
+| |  output: count:merge(*)                                                    
      |
+| |                                                                            
      |
+| 02:EXCHANGE [UNPARTITIONED]                                                  
      |
+| |                                                                            
      |
+| 01:AGGREGATE                                                                 
      |
+| |  output: count(*)                                                          
      |
+| |                                                                            
      |
+| 00:SCAN HDFS [incremental_stats.no_stats]                                    
      |
+|    partitions=1/1 files=0 size=0B                                            
      |
++------------------------------------------------------------------------------------+
+</code></pre>
+
+      <p class="p">
+        Because Impala uses the <dfn class="term">partition pruning</dfn> 
technique when possible to only evaluate certain
+        partitions, if you have a partitioned table with statistics for some 
partitions and not others, whether or
+        not the <code class="ph codeph">EXPLAIN</code> statement shows the 
warning depends on the actual partitions used by the
+        query. For example, you might see warnings or not for different 
queries against the same table:
+      </p>
+
+<pre class="pre codeblock"><code>-- No warning because all the partitions for 
the year 2012 have stats.
+EXPLAIN SELECT ... FROM t1 WHERE year = 2012;
+
+-- Missing stats warning because one or more partitions in this range
+-- do not have stats.
+EXPLAIN SELECT ... FROM t1 WHERE year BETWEEN 2006 AND 2009;
+</code></pre>
+
+      <p class="p">
+        To confirm if any partitions at all in the table are missing 
statistics, you might explain a query that
+        scans the entire table, such as <code class="ph codeph">SELECT 
COUNT(*) FROM <var class="keyword varname">table_name</var></code>.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title8" 
id="perf_stats__perf_stats_collecting">
+
+    <h2 class="title topictitle2" id="ariaid-title8">Keeping Statistics Up to 
Date</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        When the contents of a table or partition change significantly, 
recompute the stats for the relevant table
+        or partition. The degree of change that qualifies as <span 
class="q">"significant"</span> varies, depending on the absolute
+        and relative sizes of the tables. Typically, if you add more than 30% 
more data to a table, it is
+        worthwhile to recompute stats, because the differences in number of 
rows and number of distinct values
+        might cause Impala to choose a different join order when that table is 
used in join queries. This guideline
+        is most important for the largest tables. For example, adding 30% new 
data to a table containing 1 TB has a
+        greater effect on join order than adding 30% to a table containing 
only a few megabytes, and the larger
+        table has a greater effect on query performance if Impala chooses a 
suboptimal join order as a result of
+        outdated statistics.
+      </p>
+
+      <p class="p">
+        If you reload a complete new set of data for a table, but the number 
of rows and number of distinct values
+        for each column is relatively unchanged from before, you do not need 
to recompute stats for the table.
+      </p>
+
+      <p class="p">
+        If the statistics for a table are out of date, and the table's large 
size makes it impractical to recompute
+        new stats immediately, you can use the <code class="ph codeph">DROP 
STATS</code> statement to remove the obsolete
+        statistics, making it easier to identify tables that need a new <code 
class="ph codeph">COMPUTE STATS</code> operation.
+      </p>
+
+      <p class="p">
+        For a large partitioned table, consider using the incremental stats 
feature available in Impala 2.1.0 and
+        higher, as explained in <a class="xref" 
href="impala_perf_stats.html#perf_stats_incremental">Overview of Incremental 
Statistics</a>. If you add a new
+        partition to a table, it is worthwhile to recompute incremental stats, 
because the operation only scans the
+        data for that one new partition.
+      </p>
+    </div>
+  </article>
+
+
+
+  
+
+
+
+  
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title9" 
id="perf_stats__perf_table_stats_manual">
+
+    <h2 class="title topictitle2" id="ariaid-title9">Setting the NUMROWS Value 
Manually through ALTER TABLE</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        The most crucial piece of data in all the statistics is the number of 
rows in the table (for an
+        unpartitioned or partitioned table) and for each partition (for a 
partitioned table). The <code class="ph codeph">COMPUTE STATS</code>
+        statement always gathers statistics about all columns, as well as 
overall table statistics. If it is not
+        practical to do a full <code class="ph codeph">COMPUTE STATS</code> or 
<code class="ph codeph">COMPUTE INCREMENTAL STATS</code>
+        operation after adding a partition or inserting data, or if you can 
see that Impala would produce a more
+        efficient plan if the number of rows was different, you can manually 
set the number of rows through an
+        <code class="ph codeph">ALTER TABLE</code> statement:
+      </p>
+
+<pre class="pre codeblock"><code>
+-- Set total number of rows. Applies to both unpartitioned and partitioned 
tables.
+alter table <var class="keyword varname">table_name</var> set 
tblproperties('numRows'='<var class="keyword varname">new_value</var>', 
'STATS_GENERATED_VIA_STATS_TASK'='true');
+
+-- Set total number of rows for a specific partition. Applies to partitioned 
tables only.
+-- You must specify all the partition key columns in the PARTITION clause.
+alter table <var class="keyword varname">table_name</var> partition (<var 
class="keyword varname">keycol1</var>=<var class="keyword 
varname">val1</var>,<var class="keyword varname">keycol2</var>=<var 
class="keyword varname">val2</var>...) set tblproperties('numRows'='<var 
class="keyword varname">new_value</var>', 
'STATS_GENERATED_VIA_STATS_TASK'='true');
+</code></pre>
+
+      <p class="p">
+        This statement avoids re-scanning any data files. (The requirement to 
include the <code class="ph codeph">STATS_GENERATED_VIA_STATS_TASK</code> 
property is relatively new, as a
+        result of the issue <a class="xref" 
href="https://issues.apache.org/jira/browse/HIVE-8648"; 
target="_blank">HIVE-8648</a>
+        for the Hive metastore.)
+      </p>
+
+<pre class="pre codeblock"><code>create table analysis_data stored as parquet 
as select * from raw_data;
+Inserted 1000000000 rows in 181.98s
+compute stats analysis_data;
+insert into analysis_data select * from smaller_table_we_forgot_before;
+Inserted 1000000 rows in 15.32s
+-- Now there are 1001000000 rows. We can update this single data point in the 
stats.
+alter table analysis_data set tblproperties('numRows'='1001000000', 
'STATS_GENERATED_VIA_STATS_TASK'='true');</code></pre>
+
+      <p class="p">
+        For a partitioned table, update both the per-partition number of rows 
and the number of rows for the whole
+        table:
+      </p>
+
+<pre class="pre codeblock"><code>-- If the table originally contained 1 
million rows, and we add another partition with 30 thousand rows,
+-- change the numRows property for the partition and the overall table.
+alter table partitioned_data partition(year=2009, month=4) set tblproperties 
('numRows'='30000', 'STATS_GENERATED_VIA_STATS_TASK'='true');
+alter table partitioned_data set tblproperties ('numRows'='1030000', 
'STATS_GENERATED_VIA_STATS_TASK'='true');</code></pre>
+
+      <p class="p">
+        In practice, the <code class="ph codeph">COMPUTE STATS</code> 
statement, or <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>
+        for a partitioned table, should be fast and convenient enough that 
this technique is only useful for the very
+        largest partitioned tables.
+        
+        
+        Because the column statistics might be left in a stale state, do not 
use this technique as a replacement
+        for <code class="ph codeph">COMPUTE STATS</code>. Only use this 
technique if all other means of collecting statistics are impractical, or as a
+        low-overhead operation that you run in between periodic <code 
class="ph codeph">COMPUTE STATS</code> or <code class="ph codeph">COMPUTE 
INCREMENTAL STATS</code> operations.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title10" 
id="perf_stats__perf_column_stats_manual">
+    <h2 class="title topictitle2" id="ariaid-title10">Setting Column Stats 
Manually through ALTER TABLE</h2>
+    <div class="body conbody">
+      <p class="p">
+        In <span class="keyword">Impala 2.6</span> and higher, you can also 
use the <code class="ph codeph">SET COLUMN STATS</code>
+        clause of <code class="ph codeph">ALTER TABLE</code> to manually set 
or change column statistics.
+        Only use this technique in cases where it is impractical to run
+        <code class="ph codeph">COMPUTE STATS</code> or <code class="ph 
codeph">COMPUTE INCREMENTAL STATS</code>
+        frequently enough to keep up with data changes for a huge table.
+      </p>
+      <div class="p">
+        You specify a case-insensitive symbolic name for the kind of 
statistics:
+        <code class="ph codeph">numDVs</code>, <code class="ph 
codeph">numNulls</code>, <code class="ph codeph">avgSize</code>, <code 
class="ph codeph">maxSize</code>.
+        The key names and values are both quoted. This operation applies to an 
entire table,
+        not a specific partition. For example:
+<pre class="pre codeblock"><code>
+create table t1 (x int, s string);
+insert into t1 values (1, 'one'), (2, 'two'), (2, 'deux');
+show column stats t1;
++--------+--------+------------------+--------+----------+----------+
+| Column | Type   | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+--------+------------------+--------+----------+----------+
+| x      | INT    | -1               | -1     | 4        | 4        |
+| s      | STRING | -1               | -1     | -1       | -1       |
++--------+--------+------------------+--------+----------+----------+
+alter table t1 set column stats x ('numDVs'='2','numNulls'='0');
+alter table t1 set column stats s ('numdvs'='3','maxsize'='4');
+show column stats t1;
++--------+--------+------------------+--------+----------+----------+
+| Column | Type   | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+--------+------------------+--------+----------+----------+
+| x      | INT    | 2                | 0      | 4        | 4        |
+| s      | STRING | 3                | -1     | 4        | -1       |
++--------+--------+------------------+--------+----------+----------+
+</code></pre>
+      </div>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title11" 
id="perf_stats__perf_stats_examples">
+
+    <h2 class="title topictitle2" id="ariaid-title11">Examples of Using Table 
and Column Statistics with Impala</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        The following examples walk through a sequence of <code class="ph 
codeph">SHOW TABLE STATS</code>, <code class="ph codeph">SHOW COLUMN
+        STATS</code>, <code class="ph codeph">ALTER TABLE</code>, and <code 
class="ph codeph">SELECT</code> and <code class="ph codeph">INSERT</code>
+        statements to illustrate various aspects of how Impala uses statistics 
to help optimize queries.
+      </p>
+
+      <p class="p">
+        This example shows table and column statistics for the <code class="ph 
codeph">STORE</code> column used in the
+        <a class="xref" href="http://www.tpc.org/tpcds/"; 
target="_blank">TPC-DS benchmarks for decision
+        support</a> systems. It is a tiny table holding data for 12 stores. 
Initially, before any statistics are
+        gathered by a <code class="ph codeph">COMPUTE STATS</code> statement, 
most of the numeric fields show placeholder values
+        of -1, indicating that the figures are unknown. The figures that are 
filled in are values that are easily
+        countable or deducible at the physical level, such as the number of 
files, total data size of the files,
+        and the maximum and average sizes for data types that have a constant 
size such as <code class="ph codeph">INT</code>,
+        <code class="ph codeph">FLOAT</code>, and <code class="ph 
codeph">TIMESTAMP</code>.
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; show table stats store;
++-------+--------+--------+--------+
+| #Rows | #Files | Size   | Format |
++-------+--------+--------+--------+
+| -1    | 1      | 3.08KB | TEXT   |
++-------+--------+--------+--------+
+Returned 1 row(s) in 0.03s
+[localhost:21000] &gt; show column stats store;
++--------------------+-----------+------------------+--------+----------+----------+
+| Column             | Type      | #Distinct Values | #Nulls | Max Size | Avg 
Size |
++--------------------+-----------+------------------+--------+----------+----------+
+| s_store_sk         | INT       | -1               | -1     | 4        | 4    
    |
+| s_store_id         | STRING    | -1               | -1     | -1       | -1   
    |
+| s_rec_start_date   | TIMESTAMP | -1               | -1     | 16       | 16   
    |
+| s_rec_end_date     | TIMESTAMP | -1               | -1     | 16       | 16   
    |
+| s_closed_date_sk   | INT       | -1               | -1     | 4        | 4    
    |
+| s_store_name       | STRING    | -1               | -1     | -1       | -1   
    |
+| s_number_employees | INT       | -1               | -1     | 4        | 4    
    |
+| s_floor_space      | INT       | -1               | -1     | 4        | 4    
    |
+| s_hours            | STRING    | -1               | -1     | -1       | -1   
    |
+| s_manager          | STRING    | -1               | -1     | -1       | -1   
    |
+| s_market_id        | INT       | -1               | -1     | 4        | 4    
    |
+| s_geography_class  | STRING    | -1               | -1     | -1       | -1   
    |
+| s_market_desc      | STRING    | -1               | -1     | -1       | -1   
    |
+| s_market_manager   | STRING    | -1               | -1     | -1       | -1   
    |
+| s_division_id      | INT       | -1               | -1     | 4        | 4    
    |
+| s_division_name    | STRING    | -1               | -1     | -1       | -1   
    |
+| s_company_id       | INT       | -1               | -1     | 4        | 4    
    |
+| s_company_name     | STRING    | -1               | -1     | -1       | -1   
    |
+| s_street_number    | STRING    | -1               | -1     | -1       | -1   
    |
+| s_street_name      | STRING    | -1               | -1     | -1       | -1   
    |
+| s_street_type      | STRING    | -1               | -1     | -1       | -1   
    |
+| s_suite_number     | STRING    | -1               | -1     | -1       | -1   
    |
+| s_city             | STRING    | -1               | -1     | -1       | -1   
    |
+| s_county           | STRING    | -1               | -1     | -1       | -1   
    |
+| s_state            | STRING    | -1               | -1     | -1       | -1   
    |
+| s_zip              | STRING    | -1               | -1     | -1       | -1   
    |
+| s_country          | STRING    | -1               | -1     | -1       | -1   
    |
+| s_gmt_offset       | FLOAT     | -1               | -1     | 4        | 4    
    |
+| s_tax_percentage   | FLOAT     | -1               | -1     | 4        | 4    
    |
++--------------------+-----------+------------------+--------+----------+----------+
+Returned 29 row(s) in 0.04s</code></pre>
+
+      <p class="p">
+        With the Hive <code class="ph codeph">ANALYZE TABLE</code> statement 
for column statistics, you had to specify each
+        column for which to gather statistics. The Impala <code class="ph 
codeph">COMPUTE STATS</code> statement automatically
+        gathers statistics for all columns, because it reads through the 
entire table relatively quickly and can
+        efficiently compute the values for all the columns. This example shows 
how after running the
+        <code class="ph codeph">COMPUTE STATS</code> statement, statistics are 
filled in for both the table and all its columns:
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; compute stats store;
++------------------------------------------+
+| summary                                  |
++------------------------------------------+
+| Updated 1 partition(s) and 29 column(s). |
++------------------------------------------+
+Returned 1 row(s) in 1.88s
+[localhost:21000] &gt; show table stats store;
++-------+--------+--------+--------+
+| #Rows | #Files | Size   | Format |
++-------+--------+--------+--------+
+| 12    | 1      | 3.08KB | TEXT   |
++-------+--------+--------+--------+
+Returned 1 row(s) in 0.02s
+[localhost:21000] &gt; show column stats store;
++--------------------+-----------+------------------+--------+----------+-------------------+
+| Column             | Type      | #Distinct Values | #Nulls | Max Size | Avg 
Size          |
++--------------------+-----------+------------------+--------+----------+-------------------+
+| s_store_sk         | INT       | 12               | -1     | 4        | 4    
             |
+| s_store_id         | STRING    | 6                | -1     | 16       | 16   
             |
+| s_rec_start_date   | TIMESTAMP | 4                | -1     | 16       | 16   
             |
+| s_rec_end_date     | TIMESTAMP | 3                | -1     | 16       | 16   
             |
+| s_closed_date_sk   | INT       | 3                | -1     | 4        | 4    
             |
+| s_store_name       | STRING    | 8                | -1     | 5        | 4.25 
             |
+| s_number_employees | INT       | 9                | -1     | 4        | 4    
             |
+| s_floor_space      | INT       | 10               | -1     | 4        | 4    
             |
+| s_hours            | STRING    | 2                | -1     | 8        | 
7.083300113677979 |
+| s_manager          | STRING    | 7                | -1     | 15       | 12   
             |
+| s_market_id        | INT       | 7                | -1     | 4        | 4    
             |
+| s_geography_class  | STRING    | 1                | -1     | 7        | 7    
             |
+| s_market_desc      | STRING    | 10               | -1     | 94       | 55.5 
             |
+| s_market_manager   | STRING    | 7                | -1     | 16       | 14   
             |
+| s_division_id      | INT       | 1                | -1     | 4        | 4    
             |
+| s_division_name    | STRING    | 1                | -1     | 7        | 7    
             |
+| s_company_id       | INT       | 1                | -1     | 4        | 4    
             |
+| s_company_name     | STRING    | 1                | -1     | 7        | 7    
             |
+| s_street_number    | STRING    | 9                | -1     | 3        | 
2.833300113677979 |
+| s_street_name      | STRING    | 12               | -1     | 11       | 
6.583300113677979 |
+| s_street_type      | STRING    | 8                | -1     | 9        | 
4.833300113677979 |
+| s_suite_number     | STRING    | 11               | -1     | 9        | 8.25 
             |
+| s_city             | STRING    | 2                | -1     | 8        | 6.5  
             |
+| s_county           | STRING    | 1                | -1     | 17       | 17   
             |
+| s_state            | STRING    | 1                | -1     | 2        | 2    
             |
+| s_zip              | STRING    | 2                | -1     | 5        | 5    
             |
+| s_country          | STRING    | 1                | -1     | 13       | 13   
             |
+| s_gmt_offset       | FLOAT     | 1                | -1     | 4        | 4    
             |
+| s_tax_percentage   | FLOAT     | 5                | -1     | 4        | 4    
             |
++--------------------+-----------+------------------+--------+----------+-------------------+
+Returned 29 row(s) in 0.04s</code></pre>
+
+      <p class="p">
+        The following example shows how statistics are represented for a 
partitioned table. In this case, we have
+        set up a table to hold the world's most trivial census data, a single 
<code class="ph codeph">STRING</code> field,
+        partitioned by a <code class="ph codeph">YEAR</code> column. The table 
statistics include a separate entry for each
+        partition, plus final totals for the numeric fields. The column 
statistics include some easily deducible
+        facts for the partitioning column, such as the number of distinct 
values (the number of partition
+        subdirectories).
+
+      </p>
+
+<pre class="pre codeblock"><code>localhost:21000] &gt; describe census;
++------+----------+---------+
+| name | type     | comment |
++------+----------+---------+
+| name | string   |         |
+| year | smallint |         |
++------+----------+---------+
+Returned 2 row(s) in 0.02s
+[localhost:21000] &gt; show table stats census;
++-------+-------+--------+------+---------+
+| year  | #Rows | #Files | Size | Format  |
++-------+-------+--------+------+---------+
+| 2000  | -1    | 0      | 0B   | TEXT    |
+| 2004  | -1    | 0      | 0B   | TEXT    |
+| 2008  | -1    | 0      | 0B   | TEXT    |
+| 2010  | -1    | 0      | 0B   | TEXT    |
+| 2011  | 0     | 1      | 22B  | TEXT    |
+| 2012  | -1    | 1      | 22B  | TEXT    |
+| 2013  | -1    | 1      | 231B | PARQUET |
+| Total | 0     | 3      | 275B |         |
++-------+-------+--------+------+---------+
+Returned 8 row(s) in 0.02s
+[localhost:21000] &gt; show column stats census;
++--------+----------+------------------+--------+----------+----------+
+| Column | Type     | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+----------+------------------+--------+----------+----------+
+| name   | STRING   | -1               | -1     | -1       | -1       |
+| year   | SMALLINT | 7                | -1     | 2        | 2        |
++--------+----------+------------------+--------+----------+----------+
+Returned 2 row(s) in 0.02s</code></pre>
+
+      <p class="p">
+        The following example shows how the statistics are filled in by a 
<code class="ph codeph">COMPUTE STATS</code> statement
+        in Impala.
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; compute stats census;
++-----------------------------------------+
+| summary                                 |
++-----------------------------------------+
+| Updated 3 partition(s) and 1 column(s). |
++-----------------------------------------+
+Returned 1 row(s) in 2.16s
+[localhost:21000] &gt; show table stats census;
++-------+-------+--------+------+---------+
+| year  | #Rows | #Files | Size | Format  |
++-------+-------+--------+------+---------+
+| 2000  | -1    | 0      | 0B   | TEXT    |
+| 2004  | -1    | 0      | 0B   | TEXT    |
+| 2008  | -1    | 0      | 0B   | TEXT    |
+| 2010  | -1    | 0      | 0B   | TEXT    |
+| 2011  | 4     | 1      | 22B  | TEXT    |
+| 2012  | 4     | 1      | 22B  | TEXT    |
+| 2013  | 1     | 1      | 231B | PARQUET |
+| Total | 9     | 3      | 275B |         |
++-------+-------+--------+------+---------+
+Returned 8 row(s) in 0.02s
+[localhost:21000] &gt; show column stats census;
++--------+----------+------------------+--------+----------+----------+
+| Column | Type     | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+----------+------------------+--------+----------+----------+
+| name   | STRING   | 4                | -1     | 5        | 4.5      |
+| year   | SMALLINT | 7                | -1     | 2        | 2        |
++--------+----------+------------------+--------+----------+----------+
+Returned 2 row(s) in 0.02s</code></pre>
+
+      <p class="p">
+        For examples showing how some queries work differently when statistics 
are available, see
+        <a class="xref" 
href="impala_perf_joins.html#perf_joins_examples">Examples of Join Order 
Optimization</a>. You can see how Impala executes a query
+        differently in each case by observing the <code class="ph 
codeph">EXPLAIN</code> output before and after collecting
+        statistics. Measure the before and after query times, and examine the 
throughput numbers in before and
+        after <code class="ph codeph">SUMMARY</code> or <code class="ph 
codeph">PROFILE</code> output, to verify how much the improved plan
+        speeds up performance.
+      </p>
+    </div>
+  </article>
+</article></main></body></html>
\ No newline at end of file


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_perf_testing.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_perf_testing.html 
b/docs/build/html/topics/impala_perf_testing.html
new file mode 100644
index 0000000..0663ae9
--- /dev/null
+++ b/docs/build/html/topics/impala_perf_testing.html
@@ -0,0 +1,152 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; 
charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) 
Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta 
name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" 
content="../topics/impala_performance.html"><meta name="prodname" 
content="Impala"><meta name="prodname" content="Impala"><meta name="version" 
content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta 
name="DC.Format" content="XHTML"><meta name="DC.Identifier" 
content="performance_testing"><link rel="stylesheet" type="text/css" 
href="../commonltr.css"><title>Testing Impala Performance</title></head><body 
id="performance_testing"><main role="main"><article role="article" 
aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Testing Impala 
Performance</h1>
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      Test to ensure that Impala is configured for optimal performance. If you 
have installed Impala with cluster
+      management software, complete the processes described in this topic to 
help ensure a proper
+      configuration. These procedures can be used to verify that Impala is set 
up correctly.
+    </p>
+
+    <section class="section" 
id="performance_testing__checking_config_performance"><h2 class="title 
sectiontitle">Checking Impala Configuration Values</h2>
+
+      
+
+      <p class="p">
+        You can inspect Impala configuration values by connecting to your 
Impala server using a browser.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">To check Impala configuration values:</strong>
+      </p>
+
+      <ol class="ol">
+        <li class="li">
+          Use a browser to connect to one of the hosts running <code class="ph 
codeph">impalad</code> in your environment.
+          Connect using an address of the form
+          <code class="ph codeph">http://<var class="keyword 
varname">hostname</var>:<var class="keyword varname">port</var>/varz</code>.
+          <div class="note note note_note"><span class="note__title 
notetitle">Note:</span> 
+            In the preceding example, replace <code class="ph 
codeph">hostname</code> and <code class="ph codeph">port</code> with the name 
and
+            port of your Impala server. The default port is 25000.
+          </div>
+        </li>
+
+        <li class="li">
+          Review the configured values.
+          <p class="p">
+            For example, to check that your system is configured to use block 
locality tracking information, you
+            would check that the value for <code class="ph 
codeph">dfs.datanode.hdfs-blocks-metadata.enabled</code> is
+            <code class="ph codeph">true</code>.
+          </p>
+        </li>
+      </ol>
+
+      <p class="p" id="performance_testing__p_31">
+        <strong class="ph b">To check data locality:</strong>
+      </p>
+
+      <ol class="ol">
+        <li class="li">
+          Execute a query on a dataset that is available across multiple 
nodes. For example, for a table named
+          <code class="ph codeph">MyTable</code> that has a reasonable chance 
of being spread across multiple DataNodes:
+<pre class="pre codeblock"><code>[impalad-host:21000] &gt; SELECT COUNT (*) 
FROM MyTable</code></pre>
+        </li>
+
+        <li class="li">
+          After the query completes, review the contents of the Impala logs. 
You should find a recent message
+          similar to the following:
+<pre class="pre codeblock"><code>Total remote scan volume = 0</code></pre>
+        </li>
+      </ol>
+
+      <p class="p">
+        The presence of remote scans may indicate <code class="ph 
codeph">impalad</code> is not running on the correct nodes.
+        This can be because some DataNodes do not have <code class="ph 
codeph">impalad</code> running or it can be because the
+        <code class="ph codeph">impalad</code> instance that is starting the 
query is unable to contact one or more of the
+        <code class="ph codeph">impalad</code> instances.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">To understand the causes of this issue:</strong>
+      </p>
+
+      <ol class="ol">
+        <li class="li">
+          Connect to the debugging web server. By default, this server runs on 
port 25000. This page lists all
+          <code class="ph codeph">impalad</code> instances running in your 
cluster. If there are fewer instances than you expect,
+          this often indicates some DataNodes are not running <code class="ph 
codeph">impalad</code>. Ensure
+          <code class="ph codeph">impalad</code> is started on all DataNodes.
+        </li>
+
+        <li class="li">
+          
+          If you are using multi-homed hosts, ensure that the Impala daemon's 
hostname resolves to the interface on
+          which <code class="ph codeph">impalad</code> is running. The 
hostname Impala is using is displayed when
+          <code class="ph codeph">impalad</code> starts. To explicitly set the 
hostname, use the <code class="ph codeph">--hostname</code>&nbsp;flag.
+        </li>
+
+        <li class="li">
+          Check that <code class="ph codeph">statestored</code> is running as 
expected. Review the contents of the state store
+          log to ensure all instances of <code class="ph 
codeph">impalad</code> are listed as having connected to the state
+          store.
+        </li>
+      </ol>
+    </section>
+
+    <section class="section" 
id="performance_testing__checking_config_logs"><h2 class="title 
sectiontitle">Reviewing Impala Logs</h2>
+
+      
+
+      <p class="p">
+        You can review the contents of the Impala logs for signs that 
short-circuit reads or block location
+        tracking are not functioning. Before checking logs, execute a simple 
query against a small HDFS dataset.
+        Completing a query task generates log messages using current settings. 
Information on starting Impala and
+        executing queries can be found in <a class="xref" 
href="impala_processes.html#processes">Starting Impala</a> and
+        <a class="xref" href="impala_impala_shell.html#impala_shell">Using the 
Impala Shell (impala-shell Command)</a>. Information on logging can be found in
+        <a class="xref" href="impala_logging.html#logging">Using Impala 
Logging</a>. Log messages and their interpretations are as follows:
+      </p>
+
+      <table class="table"><caption></caption><colgroup><col 
style="width:75%"><col style="width:25%"></colgroup><thead class="thead">
+            <tr class="row">
+              <th class="entry nocellnorowborder" 
id="performance_testing__checking_config_logs__entry__1">
+                Log Message
+              </th>
+              <th class="entry nocellnorowborder" 
id="performance_testing__checking_config_logs__entry__2">
+                Interpretation
+              </th>
+            </tr>
+          </thead><tbody class="tbody">
+            <tr class="row">
+              <td class="entry nocellnorowborder" 
headers="performance_testing__checking_config_logs__entry__1 ">
+                <div class="p">
+<pre class="pre">Unknown disk id. This will negatively affect performance. 
Check your hdfs settings to enable block location metadata
+</pre>
+                </div>
+              </td>
+              <td class="entry nocellnorowborder" 
headers="performance_testing__checking_config_logs__entry__2 ">
+                <p class="p">
+                  Tracking block locality is not enabled.
+                </p>
+              </td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" 
headers="performance_testing__checking_config_logs__entry__1 ">
+                <div class="p">
+<pre class="pre">Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable</pre>
+                </div>
+              </td>
+              <td class="entry nocellnorowborder" 
headers="performance_testing__checking_config_logs__entry__2 ">
+                <p class="p">
+                  Native checksumming is not enabled.
+                </p>
+              </td>
+            </tr>
+          </tbody></table>
+    </section>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div 
class="parentlink"><strong>Parent topic:</strong> <a class="link" 
href="../topics/impala_performance.html">Tuning Impala for 
Performance</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_performance.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_performance.html 
b/docs/build/html/topics/impala_performance.html
new file mode 100644
index 0000000..13d44b3
--- /dev/null
+++ b/docs/build/html/topics/impala_performance.html
@@ -0,0 +1,116 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; 
charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) 
Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta 
name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" 
content="../topics/impala_perf_cookbook.html"><meta name="DC.Relation" 
scheme="URI" content="../topics/impala_perf_joins.html"><meta 
name="DC.Relation" scheme="URI" 
content="../topics/impala_perf_stats.html"><meta name="DC.Relation" 
scheme="URI" content="../topics/impala_perf_benchmarking.html"><meta 
name="DC.Relation" scheme="URI" 
content="../topics/impala_perf_resources.html"><meta name="DC.Relation" 
scheme="URI" content="../topics/impala_runtime_filtering.html"><meta 
name="DC.Relation" scheme="URI" 
content="../topics/impala_perf_hdfs_caching.html"><meta name="DC.Relation" 
scheme="URI" content="../topics/impala_perf_testing.html"><meta 
name="DC.Relation" scheme="URI" content="../topics/im
 pala_explain_plan.html"><meta name="DC.Relation" scheme="URI" 
content="../topics/impala_perf_skew.html"><meta name="prodname" 
content="Impala"><meta name="prodname" content="Impala"><meta name="version" 
content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta 
name="DC.Format" content="XHTML"><meta name="DC.Identifier" 
content="performance"><link rel="stylesheet" type="text/css" 
href="../commonltr.css"><title>Tuning Impala for 
Performance</title></head><body id="performance"><main role="main"><article 
role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Tuning Impala for 
Performance</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      The following sections explain the factors affecting the performance of 
Impala features, and procedures for
+      tuning, monitoring, and benchmarking Impala queries and other SQL 
operations.
+    </p>
+
+    <p class="p">
+      This section also describes techniques for maximizing Impala 
scalability. Scalability is tied to performance:
+      it means that performance remains high as the system workload increases. 
For example, reducing the disk I/O
+      performed by a query can speed up an individual query, and at the same 
time improve scalability by making it
+      practical to run more queries simultaneously. Sometimes, an optimization 
technique improves scalability more
+      than performance. For example, reducing memory usage for a query might 
not change the query performance much,
+      but might improve scalability by allowing more Impala queries or other 
kinds of jobs to run at the same time
+      without running out of memory.
+    </p>
+
+    <div class="note note note_note"><span class="note__title 
notetitle">Note:</span> 
+      <p class="p">
+        Before starting any performance tuning or benchmarking, make sure your 
system is configured with all the
+        recommended minimum hardware requirements from <a class="xref" 
href="impala_prereqs.html#prereqs_hardware">Hardware Requirements</a> and
+        software settings from <a class="xref" 
href="impala_config_performance.html#config_performance">Post-Installation 
Configuration for Impala</a>.
+      </p>
+    </div>
+
+    <ul class="ul">
+      <li class="li">
+        <a class="xref" 
href="impala_partitioning.html#partitioning">Partitioning for Impala 
Tables</a>. This technique physically divides the data based on
+        the different values in frequently queried columns, allowing queries 
to skip reading a large percentage of
+        the data in a table.
+      </li>
+
+      <li class="li">
+        <a class="xref" href="impala_perf_joins.html#perf_joins">Performance 
Considerations for Join Queries</a>. Joins are the main class of queries that 
you can tune at
+        the SQL level, as opposed to changing physical factors such as the 
file format or the hardware
+        configuration. The related topics <a class="xref" 
href="impala_perf_stats.html#perf_column_stats">Overview of Column 
Statistics</a> and
+        <a class="xref" 
href="impala_perf_stats.html#perf_table_stats">Overview of Table Statistics</a> 
are also important primarily for join performance.
+      </li>
+
+      <li class="li">
+        <a class="xref" 
href="impala_perf_stats.html#perf_table_stats">Overview of Table Statistics</a> 
and
+        <a class="xref" 
href="impala_perf_stats.html#perf_column_stats">Overview of Column 
Statistics</a>. Gathering table and column statistics, using the
+        <code class="ph codeph">COMPUTE STATS</code> statement, helps Impala 
automatically optimize the performance for join
+        queries, without requiring changes to SQL query statements. (This 
process is greatly simplified in Impala
+        1.2.2 and higher, because the <code class="ph codeph">COMPUTE 
STATS</code> statement gathers both kinds of statistics in
+        one operation, and does not require any setup and configuration as was 
previously necessary for the
+        <code class="ph codeph">ANALYZE TABLE</code> statement in Hive.)
+      </li>
+
+      <li class="li">
+        <a class="xref" 
href="impala_perf_testing.html#performance_testing">Testing Impala 
Performance</a>. Do some post-setup testing to ensure Impala is
+        using optimal settings for performance, before conducting any 
benchmark tests.
+      </li>
+
+      <li class="li">
+        <a class="xref" 
href="impala_perf_benchmarking.html#perf_benchmarks">Benchmarking Impala 
Queries</a>. The configuration and sample data that you use
+        for initial experiments with Impala is often not appropriate for doing 
performance tests.
+      </li>
+
+      <li class="li">
+        <a class="xref" 
href="impala_perf_resources.html#mem_limits">Controlling Impala Resource 
Usage</a>. The more memory Impala can utilize, the better query
+        performance you can expect. In a cluster running other kinds of 
workloads as well, you must make tradeoffs
+        to make sure all Hadoop components have enough memory to perform well, 
so you might cap the memory that
+        Impala can use.
+      </li>
+
+      
+
+      <li class="li">
+        <a class="xref" href="impala_s3.html#s3">Using Impala with the Amazon 
S3 Filesystem</a>. Queries against data stored in the Amazon Simple Storage 
Service (S3)
+        have different performance characteristics than when the data is 
stored in HDFS.
+      </li>
+    </ul>
+
+    <p class="p toc"></p>
+
+    <p class="p">
+        A good source of tips related to scalability and performance tuning is 
the
+        <a class="xref" 
href="http://www.slideshare.net/cloudera/the-impala-cookbook-42530186"; 
target="_blank">Impala Cookbook</a>
+        presentation. These slides are updated periodically as new features 
come out and new benchmarks are performed.
+      </p>
+
+  </div>
+
+
+
+  
+
+  
+
+  
+
+  
+
+  
+
+  
+
+  
+
+  
+
+  
+
+  
+
+  
+<nav role="navigation" class="related-links"><ul class="ullinks"><li 
class="link ulchildlink"><strong><a 
href="../topics/impala_perf_cookbook.html">Impala Performance Guidelines and 
Best Practices</a></strong><br></li><li class="link ulchildlink"><strong><a 
href="../topics/impala_perf_joins.html">Performance Considerations for Join 
Queries</a></strong><br></li><li class="link ulchildlink"><strong><a 
href="../topics/impala_perf_stats.html">Table and Column 
Statistics</a></strong><br></li><li class="link ulchildlink"><strong><a 
href="../topics/impala_perf_benchmarking.html">Benchmarking Impala 
Queries</a></strong><br></li><li class="link ulchildlink"><strong><a 
href="../topics/impala_perf_resources.html">Controlling Impala Resource 
Usage</a></strong><br></li><li class="link ulchildlink"><strong><a 
href="../topics/impala_runtime_filtering.html">Runtime Filtering for Impala 
Queries (Impala 2.5 or higher only)</a></strong><br></li><li class="link 
ulchildlink"><strong><a href="../topics/i
 mpala_perf_hdfs_caching.html">Using HDFS Caching with Impala (Impala 2.1 or 
higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a 
href="../topics/impala_perf_testing.html">Testing Impala 
Performance</a></strong><br></li><li class="link ulchildlink"><strong><a 
href="../topics/impala_explain_plan.html">Understanding Impala Query 
Performance - EXPLAIN Plans and Query Profiles</a></strong><br></li><li 
class="link ulchildlink"><strong><a 
href="../topics/impala_perf_skew.html">Detecting and Correcting HDFS Block Skew 
Conditions</a></strong><br></li></ul></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_planning.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_planning.html 
b/docs/build/html/topics/impala_planning.html
new file mode 100644
index 0000000..8707957
--- /dev/null
+++ b/docs/build/html/topics/impala_planning.html
@@ -0,0 +1,20 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; 
charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) 
Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta 
name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" 
content="../topics/impala_prereqs.html#prereqs"><meta name="DC.Relation" 
scheme="URI" content="../topics/impala_cluster_sizing.html"><meta 
name="DC.Relation" scheme="URI" 
content="../topics/impala_schema_design.html"><meta name="prodname" 
content="Impala"><meta name="prodname" content="Impala"><meta name="version" 
content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta 
name="DC.Format" content="XHTML"><meta name="DC.Identifier" 
content="planning"><link rel="stylesheet" type="text/css" 
href="../commonltr.css"><title>Planning for Impala 
Deployment</title></head><body id="planning"><main role="main"><article 
role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Planning for Impala 
Deployment</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Before you set up Impala in production, do some planning to make sure 
that your hardware setup has sufficient
+      capacity, that your cluster topology is optimal for Impala queries, and 
that your schema design and ETL
+      processes follow the best practices for Impala.
+    </p>
+
+    <p class="p toc"></p>
+  </div>
+<nav role="navigation" class="related-links"><ul class="ullinks"><li 
class="link ulchildlink"><strong><a 
href="../topics/impala_prereqs.html#prereqs">Impala 
Requirements</a></strong><br></li><li class="link ulchildlink"><strong><a 
href="../topics/impala_cluster_sizing.html">Cluster Sizing Guidelines for 
Impala</a></strong><br></li><li class="link ulchildlink"><strong><a 
href="../topics/impala_schema_design.html">Guidelines for Designing Impala 
Schemas</a></strong><br></li></ul></nav></article></main></body></html>
\ No newline at end of file

[14/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

Reply via email to