http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_components.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_components.xml b/docs/topics/impala_components.xml new file mode 100644 index 0000000..44e5c34 --- /dev/null +++ b/docs/topics/impala_components.xml @@ -0,0 +1,180 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="intro_components"> + + <title>Components of the Impala Server</title> + <titlealts audience="PDF"><navtitle>Components</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Concepts"/> + <data name="Category" value="Administrators"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> + </metadata> + </prolog> + + <conbody> + + <p> + The Impala server is a distributed, massively parallel processing (MPP) database engine. It consists of + different daemon processes that run on specific hosts within your CDH cluster. + </p> + + <p outputclass="toc inpage"/> + </conbody> + + <concept id="intro_impalad"> + + <title>The Impala Daemon</title> + + <conbody> + + <p> + The core Impala component is a daemon process that runs on each DataNode of the cluster, physically represented + by the <codeph>impalad</codeph> process. It reads and writes to data files; accepts queries transmitted + from the <codeph>impala-shell</codeph> command, Hue, JDBC, or ODBC; parallelizes the queries and + distributes work across the cluster; and transmits intermediate query results back to the + central coordinator node. + </p> + + <p> + You can submit a query to the Impala daemon running on any DataNode, and that instance of the daemon serves as the + <term>coordinator node</term> for that query. 
The other nodes transmit partial results back to the
+ coordinator, which constructs the final result set for a query. When experimenting with functionality
+ through the <codeph>impala-shell</codeph> command, you might always connect to the same Impala daemon for
+ convenience. For clusters running production workloads, you might load-balance by
+ submitting each query to a different Impala daemon in round-robin style, using the JDBC or ODBC interfaces.
+ </p>
+
+ <p>
+ The Impala daemons are in constant communication with the <term>statestore</term>, to confirm which nodes
+ are healthy and can accept new work.
+ </p>
+
+ <p rev="1.2">
+ They also receive broadcast messages from the <cmdname>catalogd</cmdname> daemon (introduced in Impala 1.2)
+ whenever any Impala node in the cluster creates, alters, or drops any type of object, or when an
+ <codeph>INSERT</codeph> or <codeph>LOAD DATA</codeph> statement is processed through Impala. This
+ background communication minimizes the need for <codeph>REFRESH</codeph> or <codeph>INVALIDATE
+ METADATA</codeph> statements that were needed to coordinate metadata across nodes prior to Impala 1.2.
+ </p>
+
+ <p>
+ <b>Related information:</b> <xref href="impala_config_options.xml#config_options"/>,
+ <xref href="impala_processes.xml#processes"/>, <xref href="impala_timeouts.xml#impalad_timeout"/>,
+ <xref href="impala_ports.xml#ports"/>, <xref href="impala_proxy.xml#proxy"/>
+ </p>
+ </conbody>
+ </concept>
+
+ <concept id="intro_statestore">
+
+ <title>The Impala Statestore</title>
+
+ <conbody>
+
+ <p>
+ The Impala component known as the <term>statestore</term> checks on the health of Impala daemons on all the
+ DataNodes in a cluster, and continuously relays its findings to each of those daemons. It is physically
+ represented by a daemon process named <codeph>statestored</codeph>; you only need such a process on one
+ host in the cluster.
If an Impala daemon goes offline due to hardware failure, network error, software issue, + or other reason, the statestore informs all the other Impala daemons so that future queries can avoid making + requests to the unreachable node. + </p> + + <p> + Because the statestore's purpose is to help when things go wrong, it is not critical to the normal + operation of an Impala cluster. If the statestore is not running or becomes unreachable, the Impala daemons + continue running and distributing work among themselves as usual; the cluster just becomes less robust if + other Impala daemons fail while the statestore is offline. When the statestore comes back online, it re-establishes + communication with the Impala daemons and resumes its monitoring function. + </p> + + <p conref="../shared/impala_common.xml#common/statestored_catalogd_ha_blurb"/> + + <p> + <b>Related information:</b> + </p> + + <p> + <xref href="impala_scalability.xml#statestore_scalability"/>, + <xref href="impala_config_options.xml#config_options"/>, <xref href="impala_processes.xml#processes"/>, + <xref href="impala_timeouts.xml#statestore_timeout"/>, <xref href="impala_ports.xml#ports"/> + </p> + </conbody> + </concept> + + <concept rev="1.2" id="intro_catalogd"> + + <title>The Impala Catalog Service</title> + + <conbody> + + <p> + The Impala component known as the <term>catalog service</term> relays the metadata changes from Impala SQL + statements to all the DataNodes in a cluster. It is physically represented by a daemon process named + <codeph>catalogd</codeph>; you only need such a process on one host in the cluster. Because the requests + are passed through the statestore daemon, it makes sense to run the <cmdname>statestored</cmdname> and + <cmdname>catalogd</cmdname> services on the same host. 
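When metadata does change through Hive or by direct file manipulation, you can bring an Impala node
+ up to date with the <codeph>REFRESH</codeph> and <codeph>INVALIDATE METADATA</codeph> statements. A minimal
+ sketch, using hypothetical table names:
+ </p>
+
+<codeblock>-- After a table is created through Hive:
+invalidate metadata new_table;
+-- After data files are added to an existing table outside of Impala:
+refresh existing_table;</codeblock>
+
+ <p>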
+ </p>
+
+ <p>
+ The catalog service avoids the need to issue
+ <codeph>REFRESH</codeph> and <codeph>INVALIDATE METADATA</codeph> statements when the metadata changes are
+ performed by statements issued through Impala. When you create a table, load data, and so on through Hive,
+ you do need to issue <codeph>REFRESH</codeph> or <codeph>INVALIDATE METADATA</codeph> on an Impala node
+ before executing a query there.
+ </p>
+
+ <p>
+ This feature touches a number of aspects of Impala:
+ </p>
+
+<!-- This was formerly a conref, but since the list of links also included a link
+ to this same topic, materializing the list here and removing that
+ circular link. (The conref is still used in Incompatible Changes.)
+
+ <ul conref="../shared/impala_common.xml#common/catalogd_xrefs">
+ <li/>
+ </ul>
+-->
+
+ <ul id="catalogd_xrefs">
+ <li>
+ <p>
+ See <xref href="impala_install.xml#install"/>, <xref href="impala_upgrading.xml#upgrading"/>, and
+ <xref href="impala_processes.xml#processes"/> for usage information about the
+ <cmdname>catalogd</cmdname> daemon.
+ </p>
+ </li>
+
+ <li>
+ <p>
+ The <codeph>REFRESH</codeph> and <codeph>INVALIDATE METADATA</codeph> statements are not needed
+ when the <codeph>CREATE TABLE</codeph>, <codeph>INSERT</codeph>, or other table-changing or
+ data-changing operation is performed through Impala. These statements are still needed if such
+ operations are done through Hive or by manipulating data files directly in HDFS, but in those cases the
+ statements only need to be issued on one Impala node rather than on all nodes. See
+ <xref href="impala_refresh.xml#refresh"/> and
+ <xref href="impala_invalidate_metadata.xml#invalidate_metadata"/> for the latest usage information for
+ those statements.
+ </p> + </li> + </ul> + + <p conref="../shared/impala_common.xml#common/load_catalog_in_background"/> + + <p conref="../shared/impala_common.xml#common/statestored_catalogd_ha_blurb"/> + + <note> + <p conref="../shared/impala_common.xml#common/catalog_server_124"/> + </note> + + <p> + <b>Related information:</b> <xref href="impala_config_options.xml#config_options"/>, + <xref href="impala_processes.xml#processes"/>, <xref href="impala_ports.xml#ports"/> + </p> + </conbody> + </concept> +</concept>
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_compression_codec.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_compression_codec.xml b/docs/topics/impala_compression_codec.xml new file mode 100644 index 0000000..739c651 --- /dev/null +++ b/docs/topics/impala_compression_codec.xml @@ -0,0 +1,98 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept rev="2.0.0" id="compression_codec"> + + <title>COMPRESSION_CODEC Query Option (<keyword keyref="impala20"/> or higher only)</title> + <titlealts audience="PDF"><navtitle>COMPRESSION_CODEC</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Impala Query Options"/> + <data name="Category" value="Compression"/> + <data name="Category" value="File Formats"/> + <data name="Category" value="Parquet"/> + <data name="Category" value="Snappy"/> + <data name="Category" value="Gzip"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> + </metadata> + </prolog> + + <conbody> + +<!-- The initial part of this paragraph is copied straight from the #parquet_compression topic. --> + +<!-- Could turn into a conref. --> + + <p rev="2.0.0"> + <indexterm audience="Cloudera">COMPRESSION_CODEC query option</indexterm> + When Impala writes Parquet data files using the <codeph>INSERT</codeph> statement, the underlying compression + is controlled by the <codeph>COMPRESSION_CODEC</codeph> query option. + </p> + + <note> + Prior to Impala 2.0, this option was named <codeph>PARQUET_COMPRESSION_CODEC</codeph>. In Impala 2.0 and + later, the <codeph>PARQUET_COMPRESSION_CODEC</codeph> name is not recognized. Use the more general name + <codeph>COMPRESSION_CODEC</codeph> for new code. 
+ </note> + + <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + +<codeblock>SET COMPRESSION_CODEC=<varname>codec_name</varname>;</codeblock> + + <p> + The allowed values for this query option are <codeph>SNAPPY</codeph> (the default), <codeph>GZIP</codeph>, + and <codeph>NONE</codeph>. + </p> + + <note> + A Parquet file created with <codeph>COMPRESSION_CODEC=NONE</codeph> is still typically smaller than the + original data, due to encoding schemes such as run-length encoding and dictionary encoding that are applied + separately from compression. + </note> + + <p></p> + + <p> + The option value is not case-sensitive. + </p> + + <p> + If the option is set to an unrecognized value, all kinds of queries will fail due to the invalid option + setting, not just queries involving Parquet tables. (The value <codeph>BZIP2</codeph> is also recognized, but + is not compatible with Parquet tables.) + </p> + + <p> + <b>Type:</b> <codeph>STRING</codeph> + </p> + + <p> + <b>Default:</b> <codeph>SNAPPY</codeph> + </p> + + + <p conref="../shared/impala_common.xml#common/example_blurb"/> + +<codeblock>set compression_codec=gzip; +insert into parquet_table_highly_compressed select * from t1; + +set compression_codec=snappy; +insert into parquet_table_compression_plus_fast_queries select * from t1; + +set compression_codec=none; +insert into parquet_table_no_compression select * from t1; + +set compression_codec=foo; +select * from t1 limit 5; +ERROR: Invalid compression codec: foo +</codeblock> + + <p conref="../shared/impala_common.xml#common/related_info"/> + + <p> + For information about how compressing Parquet data files affects query performance, see + <xref href="impala_parquet.xml#parquet_compression"/>. 
+ </p>
+ </conbody>
+</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_compute_stats.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_compute_stats.xml b/docs/topics/impala_compute_stats.xml new file mode 100644 index 0000000..b915b77 --- /dev/null +++ b/docs/topics/impala_compute_stats.xml @@ -0,0 +1,432 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept rev="1.2.2" id="compute_stats">
+
+ <title>COMPUTE STATS Statement</title>
+ <titlealts audience="PDF"><navtitle>COMPUTE STATS</navtitle></titlealts>
+ <prolog>
+ <metadata>
+ <data name="Category" value="Impala"/>
+ <data name="Category" value="Performance"/>
+ <data name="Category" value="Scalability"/>
+ <data name="Category" value="ETL"/>
+ <data name="Category" value="Ingest"/>
+ <data name="Category" value="SQL"/>
+ <data name="Category" value="Tables"/>
+ <data name="Category" value="Developers"/>
+ <data name="Category" value="Data Analysts"/>
+ </metadata>
+ </prolog>
+
+ <conbody>
+
+ <p>
+ <indexterm audience="Cloudera">COMPUTE STATS statement</indexterm>
+ Gathers information about the volume and distribution of data in a table and all associated columns and
+ partitions. The information is stored in the metastore database and used by Impala to help optimize queries.
+ For example, if Impala can determine that a table is large or small, or has many or few distinct values, it
+ can organize and parallelize the work appropriately for a join query or insert operation. For details about the
+ kinds of information gathered by this statement, see <xref href="impala_perf_stats.xml#perf_stats"/>.
+ </p> + + <p conref="../shared/impala_common.xml#common/syntax_blurb"/> + +<codeblock rev="2.1.0">COMPUTE STATS [<varname>db_name</varname>.]<varname>table_name</varname> +COMPUTE INCREMENTAL STATS [<varname>db_name</varname>.]<varname>table_name</varname> [PARTITION (<varname>partition_spec</varname>)] + +<varname>partition_spec</varname> ::= <varname>partition_col</varname>=<varname>constant_value</varname> +</codeblock> + + <p conref="../shared/impala_common.xml#common/incremental_partition_spec"/> + + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + + <p> + Originally, Impala relied on users to run the Hive <codeph>ANALYZE TABLE</codeph> statement, but that method + of gathering statistics proved unreliable and difficult to use. The Impala <codeph>COMPUTE STATS</codeph> + statement is built from the ground up to improve the reliability and user-friendliness of this operation. + <codeph>COMPUTE STATS</codeph> does not require any setup steps or special configuration. You only run a + single Impala <codeph>COMPUTE STATS</codeph> statement to gather both table and column statistics, rather + than separate Hive <codeph>ANALYZE TABLE</codeph> statements for each kind of statistics. + </p> + + <p rev="2.1.0"> + The <codeph>COMPUTE INCREMENTAL STATS</codeph> variation is a shortcut for partitioned tables that works on a + subset of partitions rather than the entire table. The incremental nature makes it suitable for large tables + with many partitions, where a full <codeph>COMPUTE STATS</codeph> operation takes too long to be practical + each time a partition is added or dropped. See <xref href="impala_perf_stats.xml#perf_stats_incremental"/> + for full usage details. + </p> + + <p> + <codeph>COMPUTE INCREMENTAL STATS</codeph> only applies to partitioned tables. If you use the + <codeph>INCREMENTAL</codeph> clause for an unpartitioned table, Impala automatically uses the original + <codeph>COMPUTE STATS</codeph> statement. 
Such tables display <codeph>false</codeph> under the
+ <codeph>Incremental stats</codeph> column of the <codeph>SHOW TABLE STATS</codeph> output.
+ </p>
+
+ <note>
+ Because many of the most performance-critical and resource-intensive operations rely on table and column
+ statistics to construct accurate and efficient plans, <codeph>COMPUTE STATS</codeph> is an important step at
+ the end of your ETL process. Run <codeph>COMPUTE STATS</codeph> on all tables as your first step during
+ performance tuning for slow queries, or when troubleshooting out-of-memory conditions:
+ <ul>
+ <li>
+ Accurate statistics help Impala construct an efficient query plan for join queries, improving performance
+ and reducing memory usage.
+ </li>
+
+ <li>
+ Accurate statistics help Impala distribute the work effectively for insert operations into Parquet
+ tables, improving performance and reducing memory usage.
+ </li>
+
+ <li rev="1.3.0">
+ Accurate statistics help Impala estimate the memory required for each query, which is important when you
+ use resource management features, such as admission control and the YARN resource management framework.
+ The statistics help Impala to achieve high concurrency, make full use of available memory, and avoid
+ contention with workloads from other Hadoop components.
+ </li>
+ </ul>
+ </note>
+
+ <p conref="../shared/impala_common.xml#common/complex_types_blurb"/>
+
+ <p rev="2.3.0">
+ Currently, the statistics created by the <codeph>COMPUTE STATS</codeph> statement do not include
+ information about complex type columns. The column stats metrics for complex columns are always shown
+ as -1. For queries involving complex type columns, Impala uses
+ heuristics to estimate the data distribution within such columns.
+ </p>
+
+ <p conref="../shared/impala_common.xml#common/hbase_blurb"/>
+
+ <p>
+ <codeph>COMPUTE STATS</codeph> works for HBase tables also.
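The syntax is the same as for tables stored in HDFS; for example, with a hypothetical
+ HBase-backed table named <codeph>hbase_events</codeph>:
+ </p>
+
+<codeblock>compute stats hbase_events;
+show table stats hbase_events;
+show column stats hbase_events;</codeblock>
+
+ <p>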
The statistics gathered for HBase tables are
+ somewhat different from those for HDFS-backed tables, but that metadata is still used for optimization when HBase
+ tables are involved in join queries.
+ </p>
+
+ <p conref="../shared/impala_common.xml#common/s3_blurb"/>
+
+ <p rev="2.2.0">
+ <codeph>COMPUTE STATS</codeph> also works for tables where data resides in the Amazon Simple Storage Service (S3).
+ See <xref href="impala_s3.xml#s3"/> for details.
+ </p>
+
+ <p conref="../shared/impala_common.xml#common/performance_blurb"/>
+
+ <p>
+ The statistics collected by <codeph>COMPUTE STATS</codeph> are used to optimize join queries,
+ <codeph>INSERT</codeph> operations into Parquet tables, and other resource-intensive kinds of SQL statements.
+ See <xref href="impala_perf_stats.xml#perf_stats"/> for details.
+ </p>
+
+ <p>
+ For large tables, the <codeph>COMPUTE STATS</codeph> statement itself might take a long time and you
+ might need to tune its performance. The <codeph>COMPUTE STATS</codeph> statement does not work with the
+ <codeph>EXPLAIN</codeph> statement or the <codeph>SUMMARY</codeph> command in <cmdname>impala-shell</cmdname>.
+ You can use the <codeph>PROFILE</codeph> statement in <cmdname>impala-shell</cmdname> to examine timing information
+ for the statement as a whole. If a basic <codeph>COMPUTE STATS</codeph> statement takes a long time for a
+ partitioned table, consider switching to the <codeph>COMPUTE INCREMENTAL STATS</codeph> syntax so that only
+ newly added partitions are analyzed each time.
+ </p>
+
+ <p conref="../shared/impala_common.xml#common/example_blurb"/>
+
+ <p>
+ This example shows two tables, <codeph>T1</codeph> and <codeph>T2</codeph>, with a small number of distinct
+ values, linked by a parent-child relationship between <codeph>T1.ID</codeph> and <codeph>T2.PARENT</codeph>.
+ <codeph>T1</codeph> is tiny, while <codeph>T2</codeph> has approximately 100K rows.
Initially, the statistics
+ include physical measurements such as the number of files, the total size, and size measurements for
+ fixed-length columns such as those of <codeph>INT</codeph> type. Unknown values are represented by -1. After
+ running <codeph>COMPUTE STATS</codeph> for each table, much more information is available through the
+ <codeph>SHOW STATS</codeph> statements. If you were running a join query involving both of these tables, you
+ would need statistics for both tables to get the most effective optimization for the query.
+ </p>
+
+<!-- Note: chopped off any excess characters at position 87 and after,
+ to avoid weird wrapping in PDF.
+ Applies to any subsequent examples with output from SHOW ... STATS too. -->
+
+<codeblock>[localhost:21000] > show table stats t1;
+Query: show table stats t1
++-------+--------+------+--------+
+| #Rows | #Files | Size | Format |
++-------+--------+------+--------+
+| -1 | 1 | 33B | TEXT |
++-------+--------+------+--------+
+Returned 1 row(s) in 0.02s
+[localhost:21000] > show table stats t2;
+Query: show table stats t2
++-------+--------+----------+--------+
+| #Rows | #Files | Size | Format |
++-------+--------+----------+--------+
+| -1 | 28 | 960.00KB | TEXT |
++-------+--------+----------+--------+
+Returned 1 row(s) in 0.01s
+[localhost:21000] > show column stats t1;
+Query: show column stats t1
++--------+--------+------------------+--------+----------+----------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+--------+------------------+--------+----------+----------+
+| id | INT | -1 | -1 | 4 | 4 |
+| s | STRING | -1 | -1 | -1 | -1 |
++--------+--------+------------------+--------+----------+----------+
+Returned 2 row(s) in 1.71s
+[localhost:21000] > show column stats t2;
+Query: show column stats t2
++--------+--------+------------------+--------+----------+----------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+--------+------------------+--------+----------+----------+ +| parent | INT | -1 | -1 | 4 | 4 | +| s | STRING | -1 | -1 | -1 | -1 | ++--------+--------+------------------+--------+----------+----------+ +Returned 2 row(s) in 0.01s +[localhost:21000] > compute stats t1; +Query: compute stats t1 ++-----------------------------------------+ +| summary | ++-----------------------------------------+ +| Updated 1 partition(s) and 2 column(s). | ++-----------------------------------------+ +Returned 1 row(s) in 5.30s +[localhost:21000] > show table stats t1; +Query: show table stats t1 ++-------+--------+------+--------+ +| #Rows | #Files | Size | Format | ++-------+--------+------+--------+ +| 3 | 1 | 33B | TEXT | ++-------+--------+------+--------+ +Returned 1 row(s) in 0.01s +[localhost:21000] > show column stats t1; +Query: show column stats t1 ++--------+--------+------------------+--------+----------+----------+ +| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size | ++--------+--------+------------------+--------+----------+----------+ +| id | INT | 3 | -1 | 4 | 4 | +| s | STRING | 3 | -1 | -1 | -1 | ++--------+--------+------------------+--------+----------+----------+ +Returned 2 row(s) in 0.02s +[localhost:21000] > compute stats t2; +Query: compute stats t2 ++-----------------------------------------+ +| summary | ++-----------------------------------------+ +| Updated 1 partition(s) and 2 column(s). 
| ++-----------------------------------------+ +Returned 1 row(s) in 5.70s +[localhost:21000] > show table stats t2; +Query: show table stats t2 ++-------+--------+----------+--------+ +| #Rows | #Files | Size | Format | ++-------+--------+----------+--------+ +| 98304 | 1 | 960.00KB | TEXT | ++-------+--------+----------+--------+ +Returned 1 row(s) in 0.03s +[localhost:21000] > show column stats t2; +Query: show column stats t2 ++--------+--------+------------------+--------+----------+----------+ +| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size | ++--------+--------+------------------+--------+----------+----------+ +| parent | INT | 3 | -1 | 4 | 4 | +| s | STRING | 6 | -1 | 14 | 9.3 | ++--------+--------+------------------+--------+----------+----------+ +Returned 2 row(s) in 0.01s</codeblock> + + <p rev="2.1.0"> + The following example shows how to use the <codeph>INCREMENTAL</codeph> clause, available in Impala 2.1.0 and + higher. The <codeph>COMPUTE INCREMENTAL STATS</codeph> syntax lets you collect statistics for newly added or + changed partitions, without rescanning the entire table. + </p> + +<codeblock>-- Initially the table has no incremental stats, as indicated +-- by -1 under #Rows and false under Incremental stats. 
+show table stats item_partitioned; ++-------------+-------+--------+----------+--------------+---------+------------------ +| i_category | #Rows | #Files | Size | Bytes Cached | Format | Incremental stats ++-------------+-------+--------+----------+--------------+---------+------------------ +| Books | -1 | 1 | 223.74KB | NOT CACHED | PARQUET | false +| Children | -1 | 1 | 230.05KB | NOT CACHED | PARQUET | false +| Electronics | -1 | 1 | 232.67KB | NOT CACHED | PARQUET | false +| Home | -1 | 1 | 232.56KB | NOT CACHED | PARQUET | false +| Jewelry | -1 | 1 | 223.72KB | NOT CACHED | PARQUET | false +| Men | -1 | 1 | 231.25KB | NOT CACHED | PARQUET | false +| Music | -1 | 1 | 237.90KB | NOT CACHED | PARQUET | false +| Shoes | -1 | 1 | 234.90KB | NOT CACHED | PARQUET | false +| Sports | -1 | 1 | 227.97KB | NOT CACHED | PARQUET | false +| Women | -1 | 1 | 226.27KB | NOT CACHED | PARQUET | false +| Total | -1 | 10 | 2.25MB | 0B | | ++-------------+-------+--------+----------+--------------+---------+------------------ + +-- After the first COMPUTE INCREMENTAL STATS, +-- all partitions have stats. +compute incremental stats item_partitioned; ++-------------------------------------------+ +| summary | ++-------------------------------------------+ +| Updated 10 partition(s) and 21 column(s). 
| ++-------------------------------------------+ +show table stats item_partitioned; ++-------------+-------+--------+----------+--------------+---------+------------------ +| i_category | #Rows | #Files | Size | Bytes Cached | Format | Incremental stats ++-------------+-------+--------+----------+--------------+---------+------------------ +| Books | 1733 | 1 | 223.74KB | NOT CACHED | PARQUET | true +| Children | 1786 | 1 | 230.05KB | NOT CACHED | PARQUET | true +| Electronics | 1812 | 1 | 232.67KB | NOT CACHED | PARQUET | true +| Home | 1807 | 1 | 232.56KB | NOT CACHED | PARQUET | true +| Jewelry | 1740 | 1 | 223.72KB | NOT CACHED | PARQUET | true +| Men | 1811 | 1 | 231.25KB | NOT CACHED | PARQUET | true +| Music | 1860 | 1 | 237.90KB | NOT CACHED | PARQUET | true +| Shoes | 1835 | 1 | 234.90KB | NOT CACHED | PARQUET | true +| Sports | 1783 | 1 | 227.97KB | NOT CACHED | PARQUET | true +| Women | 1790 | 1 | 226.27KB | NOT CACHED | PARQUET | true +| Total | 17957 | 10 | 2.25MB | 0B | | ++-------------+-------+--------+----------+--------------+---------+------------------ + +-- Add a new partition... +alter table item_partitioned add partition (i_category='Camping'); +-- Add or replace files in HDFS outside of Impala, +-- rendering the stats for a partition obsolete. +!import_data_into_sports_partition.sh +refresh item_partitioned; +drop incremental stats item_partitioned partition (i_category='Sports'); +-- Now some partitions have incremental stats +-- and some do not. 
+show table stats item_partitioned; ++-------------+-------+--------+----------+--------------+---------+------------------ +| i_category | #Rows | #Files | Size | Bytes Cached | Format | Incremental stats ++-------------+-------+--------+----------+--------------+---------+------------------ +| Books | 1733 | 1 | 223.74KB | NOT CACHED | PARQUET | true +| Camping | -1 | 1 | 408.02KB | NOT CACHED | PARQUET | false +| Children | 1786 | 1 | 230.05KB | NOT CACHED | PARQUET | true +| Electronics | 1812 | 1 | 232.67KB | NOT CACHED | PARQUET | true +| Home | 1807 | 1 | 232.56KB | NOT CACHED | PARQUET | true +| Jewelry | 1740 | 1 | 223.72KB | NOT CACHED | PARQUET | true +| Men | 1811 | 1 | 231.25KB | NOT CACHED | PARQUET | true +| Music | 1860 | 1 | 237.90KB | NOT CACHED | PARQUET | true +| Shoes | 1835 | 1 | 234.90KB | NOT CACHED | PARQUET | true +| Sports | -1 | 1 | 227.97KB | NOT CACHED | PARQUET | false +| Women | 1790 | 1 | 226.27KB | NOT CACHED | PARQUET | true +| Total | 17957 | 11 | 2.65MB | 0B | | ++-------------+-------+--------+----------+--------------+---------+------------------ + +-- After another COMPUTE INCREMENTAL STATS, +-- all partitions have incremental stats, and only the 2 +-- partitions without incremental stats were scanned. +compute incremental stats item_partitioned; ++------------------------------------------+ +| summary | ++------------------------------------------+ +| Updated 2 partition(s) and 21 column(s). 
| ++------------------------------------------+ +show table stats item_partitioned; ++-------------+-------+--------+----------+--------------+---------+------------------ +| i_category | #Rows | #Files | Size | Bytes Cached | Format | Incremental stats ++-------------+-------+--------+----------+--------------+---------+------------------ +| Books | 1733 | 1 | 223.74KB | NOT CACHED | PARQUET | true +| Camping | 5328 | 1 | 408.02KB | NOT CACHED | PARQUET | true +| Children | 1786 | 1 | 230.05KB | NOT CACHED | PARQUET | true +| Electronics | 1812 | 1 | 232.67KB | NOT CACHED | PARQUET | true +| Home | 1807 | 1 | 232.56KB | NOT CACHED | PARQUET | true +| Jewelry | 1740 | 1 | 223.72KB | NOT CACHED | PARQUET | true +| Men | 1811 | 1 | 231.25KB | NOT CACHED | PARQUET | true +| Music | 1860 | 1 | 237.90KB | NOT CACHED | PARQUET | true +| Shoes | 1835 | 1 | 234.90KB | NOT CACHED | PARQUET | true +| Sports | 1783 | 1 | 227.97KB | NOT CACHED | PARQUET | true +| Women | 1790 | 1 | 226.27KB | NOT CACHED | PARQUET | true +| Total | 17957 | 11 | 2.65MB | 0B | | ++-------------+-------+--------+----------+--------------+---------+------------------ +</codeblock> + + <p conref="../shared/impala_common.xml#common/file_format_blurb"/> + + <p> + The <codeph>COMPUTE STATS</codeph> statement works with tables created with any of the file formats supported + by Impala. See <xref href="impala_file_formats.xml#file_formats"/> for details about working with the + different file formats. The following considerations apply to <codeph>COMPUTE STATS</codeph> depending on the + file format of the table. + </p> + + <p> + The <codeph>COMPUTE STATS</codeph> statement works with text tables with no restrictions. These tables can be + created through either Impala or Hive. + </p> + + <p> + The <codeph>COMPUTE STATS</codeph> statement works with Parquet tables. These tables can be created through + either Impala or Hive. 
+ </p> + + <p> + The <codeph>COMPUTE STATS</codeph> statement works with Avro tables without restriction in CDH 5.4 / Impala 2.2 + and higher. In earlier releases, <codeph>COMPUTE STATS</codeph> worked only for Avro tables created through Hive, + and required the <codeph>CREATE TABLE</codeph> statement to use SQL-style column names and types rather than an + Avro-style schema specification. + </p> + + <p> + The <codeph>COMPUTE STATS</codeph> statement works with RCFile tables with no restrictions. These tables can + be created through either Impala or Hive. + </p> + + <p> + The <codeph>COMPUTE STATS</codeph> statement works with SequenceFile tables with no restrictions. These + tables can be created through either Impala or Hive. + </p> + + <p> + The <codeph>COMPUTE STATS</codeph> statement works with partitioned tables, whether all the partitions use + the same file format, or some partitions are defined through <codeph>ALTER TABLE</codeph> to use different + file formats. + </p> + + <p conref="../shared/impala_common.xml#common/ddl_blurb"/> + + <p conref="../shared/impala_common.xml#common/cancel_blurb_maybe"/> + + <p conref="../shared/impala_common.xml#common/restrictions_blurb"/> + + <p conref="../shared/impala_common.xml#common/decimal_no_stats"/> + + <note conref="../shared/impala_common.xml#common/compute_stats_nulls"/> + + <p conref="../shared/impala_common.xml#common/internals_blurb"/> + <p> + Behind the scenes, the <codeph>COMPUTE STATS</codeph> statement + executes two statements: one to count the rows of each partition + in the table (or the entire table if unpartitioned) through the + <codeph>COUNT(*)</codeph> function, + and another to count the approximate number of distinct values + in each column through the <codeph>NDV()</codeph> function. + You might see these queries in your monitoring and diagnostic displays. 
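For a hypothetical table <codeph>t1</codeph> with columns <codeph>id</codeph> and <codeph>s</codeph>, the
+ statements that <codeph>COMPUTE STATS</codeph> runs are roughly equivalent to:
+ </p>
+
+<codeblock>-- One query counts the rows (per partition, for a partitioned table):
+select count(*) from t1;
+-- Another query estimates the number of distinct values in each column:
+select ndv(id), ndv(s) from t1;</codeblock>
+
+ <p>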
The same factors that affect the performance, scalability, and
+ execution of other queries (such as parallel execution, memory usage,
+ admission control, and timeouts) also apply to the queries run by the
+ <codeph>COMPUTE STATS</codeph> statement.
+ </p>
+
+ <p conref="../shared/impala_common.xml#common/permissions_blurb"/>
+ <p rev="CDH-19187">
+ The user ID that the <cmdname>impalad</cmdname> daemon runs under,
+ typically the <codeph>impala</codeph> user, must have read
+ permission for all affected files in the source directory:
+ all the files in the table, whether it is partitioned or not, in the
+ case of <codeph>COMPUTE STATS</codeph>; or all the files in partitions
+ without incremental stats in the case of
+ <codeph>COMPUTE INCREMENTAL STATS</codeph>.
+ It must also have read and execute permissions for all
+ relevant directories holding the data files.
+ (Essentially, <codeph>COMPUTE STATS</codeph> requires the
+ same permissions as the underlying <codeph>SELECT</codeph> queries it runs
+ against the table.)
+ </p> + + <p conref="../shared/impala_common.xml#common/related_info"/> + + <p> + <xref href="impala_drop_stats.xml#drop_stats"/>, <xref href="impala_show.xml#show_table_stats"/>, + <xref href="impala_show.xml#show_column_stats"/>, <xref href="impala_perf_stats.xml#perf_stats"/> + </p> + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_concepts.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_concepts.xml b/docs/topics/impala_concepts.xml new file mode 100644 index 0000000..74c1016 --- /dev/null +++ b/docs/topics/impala_concepts.xml @@ -0,0 +1,296 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="concepts"> + + <title>Impala Concepts and Architecture</title> + <titlealts audience="PDF"><navtitle>Concepts and Architecture</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Concepts"/> + <data name="Category" value="Data Analysts"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Stub Pages"/> + </metadata> + </prolog> + + <conbody> + <draft-comment author="-dita-use-conref-target" audience="integrated" + conref="../shared/cdh_cm_common.xml#id_dgz_rhr_kv/draft-comment-test"/> + + <p> + The following sections provide background information to help you become productive using Impala and + its features. Where appropriate, the explanations include context to help understand how aspects of Impala + relate to other technologies you might already be familiar with, such as relational database management + systems and data warehouses, or other Hadoop components such as Hive, HDFS, and HBase. + </p> + + <p outputclass="toc"/> + </conbody> + +<!-- These other topics are waiting to be filled in. Could become subtopics or top-level topics depending on the depth of coverage in each case. 
--> + + <concept id="intro_data_lifecycle" audience="Cloudera"> + + <title>Overview of the Data Lifecycle for Impala</title> + + <conbody/> + </concept> + + <concept id="intro_etl" audience="Cloudera"> + + <title>Overview of the Extract, Transform, Load (ETL) Process for Impala</title> + <prolog> + <metadata> + <data name="Category" value="ETL"/> + <data name="Category" value="Ingest"/> + <data name="Category" value="Concepts"/> + </metadata> + </prolog> + + <conbody/> + </concept> + + <concept id="intro_hadoop_data" audience="Cloudera"> + + <title>How Impala Works with Hadoop Data Files</title> + + <conbody/> + </concept> + + <concept id="intro_web_ui" audience="Cloudera"> + + <title>Overview of the Impala Web Interface</title> + + <conbody/> + </concept> + + <concept id="intro_bi" audience="Cloudera"> + + <title>Using Impala with Business Intelligence Tools</title> + + <conbody/> + </concept> + + <concept id="intro_ha" audience="Cloudera"> + + <title>Overview of Impala Availability and Fault Tolerance</title> + + <conbody/> + </concept> + +<!-- This is pretty much ready to go. Decide if it should go under "Concepts" or "Performance", + and if it should be split out into a separate file, and then take out the audience= attribute + to make it visible. +--> + + <concept id="intro_llvm" audience="Cloudera"> + + <title>Overview of Impala Runtime Code Generation</title> + + <conbody> + +<!-- Adapted from the CIDR15 paper written by the Impala team. --> + + <p> + Impala uses <term>LLVM</term> (a compiler library and collection of related tools) to perform just-in-time + (JIT) compilation within the running <cmdname>impalad</cmdname> process. This runtime code generation + technique improves query execution times by generating native code optimized for the architecture of each + host in your particular cluster. Performance gains of 5 times or more are typical for representative + workloads. 
+      </p>
+
+      <p>
+        Impala uses runtime code generation to produce query-specific versions of functions that are critical to
+        performance. In particular, code generation is applied to <term>inner loop</term> functions, that is, those
+        that are executed many times (for every tuple) in a given query, and thus constitute a large portion of the
+        total time the query takes to execute. For example, when Impala scans a data file, it calls a function to
+        parse each record into Impala's in-memory tuple format. For queries scanning large tables, billions of
+        records could result in billions of function calls. This function must therefore be extremely efficient for
+        good query performance, and removing even a few instructions from each function call can result in large
+        query speedups.
+      </p>
+
+      <p>
+        Overall, JIT compilation has an effect similar to writing custom code to process a query. For example, it
+        eliminates branches, unrolls loops, propagates constants, offsets and pointers, and inlines functions.
+        Inlining is especially valuable for functions used internally to evaluate expressions, where the function
+        call itself is more expensive than the function body (for example, a function that adds two numbers).
+        Inlining functions also increases instruction-level parallelism, and allows the compiler to make further
+        optimizations such as subexpression elimination across expressions.
+      </p>
+
+      <p>
+        Impala generates runtime query code automatically, so you do not need to do anything special to get this
+        performance benefit. This technique is most effective for complex and long-running queries that process
+        large numbers of rows. If you need to issue a series of short, small queries, you might turn off this
+        feature to avoid the overhead of compilation time for each query. In this case, issue the statement
+        <codeph>SET DISABLE_CODEGEN=true</codeph> to turn off runtime code generation for the duration of the
+        current session.
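+      </p>
+
+      <p>
+        For example (a sketch, assuming a hypothetical table <codeph>t1</codeph>),
+        you might bracket a burst of short queries in an interactive session
+        like this:
+      </p>
+<codeblock>SET DISABLE_CODEGEN=true;  -- avoid JIT compilation overhead for tiny queries
+SELECT count(*) FROM t1 WHERE x = 1;
+SELECT count(*) FROM t1 WHERE x = 2;
+SET DISABLE_CODEGEN=false; -- restore code generation for long-running queries
+</codeblock>
+      <p>
+        The option remains in effect only for the current session.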
+ </p> + +<!-- + <p> + Without code generation, + functions tend to be suboptimal + to handle situations that cannot be predicted in advance. + For example, + a record-parsing function that + only handles integer types will be faster at parsing an integer-only file + than a function that handles other data types + such as strings and floating-point numbers. + However, the schemas of the files to + be scanned are unknown at compile time, + and so a general-purpose function must be used, even if at runtime + it is known that more limited functionality is sufficient. + </p> + + <p> + A source of large runtime overheads are virtual functions. Virtual function calls incur a large performance + penalty, particularly when the called function is very simple, as the calls cannot be inlined. + If the type of the object instance is known at runtime, we can use code generation to replace the virtual + function call with a call directly to the correct function, which can then be inlined. This is especially + valuable when evaluating expression trees. In Impala (as in many systems), expressions are composed of a + tree of individual operators and functions. + </p> + + <p> + Each type of expression that can appear in a query is implemented internally by overriding a virtual function. + Many of these expression functions are quite simple, for example, adding two numbers. + The virtual function call can be more expensive than the function body itself. By resolving the virtual + function calls with code generation and then inlining the resulting function calls, Impala can evaluate expressions + directly with no function call overhead. Inlining functions also increases + instruction-level parallelism, and allows the compiler to make further optimizations such as subexpression + elimination across expressions. + </p> +--> + </conbody> + </concept> + +<!-- Same as the previous section: adapted from CIDR paper, ready to externalize after deciding where to go. 
--> + + <concept audience="Cloudera" id="intro_io"> + + <title>Overview of Impala I/O</title> + + <conbody> + + <p> + Efficiently retrieving data from HDFS is a challenge for all SQL-on-Hadoop systems. To perform + data scans from both disk and memory at or near hardware speed, Impala uses an HDFS feature called + <term>short-circuit local reads</term> to bypass the DataNode protocol when reading from local disk. Impala + can read at almost disk bandwidth (approximately 100 MB/s per disk) and is typically able to saturate all + available disks. For example, with 12 disks, Impala is typically capable of sustaining I/O at 1.2 GB/sec. + Furthermore, <term>HDFS caching</term> allows Impala to access memory-resident data at memory bus speed, + and saves CPU cycles as there is no need to copy or checksum data blocks within memory. + </p> + + <p> + The I/O manager component interfaces with storage devices to read and write data. I/O manager assigns a + fixed number of worker threads per physical disk (currently one thread per rotational disk and eight per + SSD), providing an asynchronous interface to clients (<term>scanner threads</term>). + </p> + </conbody> + </concept> + +<!-- Same as the previous section: adapted from CIDR paper, ready to externalize after deciding where to go. --> + +<!-- Although good idea to get some answers from Henry first. --> + + <concept audience="Cloudera" id="intro_state_distribution"> + + <title>State distribution</title> + + <conbody> + + <p> + As a massively parallel database that can run on hundreds of nodes, Impala must coordinate and synchronize + its metadata across the entire cluster. Impala's symmetric-node architecture means that any node can accept + and execute queries, and thus each node needs up-to-date versions of the system catalog and a knowledge of + which hosts the <cmdname>impalad</cmdname> daemons run on. 
To avoid the overhead of TCP connections and + remote procedure calls to retrieve metadata during query planning, Impala implements a simple + publish-subscribe service called the <term>statestore</term> to push metadata changes to a set of + subscribers (the <cmdname>impalad</cmdname> daemons running on all the DataNodes). + </p> + + <p> + The statestore maintains a set of topics, which are arrays of <codeph>(<varname>key</varname>, + <varname>value</varname>, <varname>version</varname>)</codeph> triplets called <term>entries</term> where + <varname>key</varname> and <varname>value</varname> are byte arrays, and <varname>version</varname> is a + 64-bit integer. A topic is defined by an application, and so the statestore has no understanding of the + contents of any topic entry. Topics are persistent through the lifetime of the statestore, but are not + persisted across service restarts. Processes that receive updates to any topic are called + <term>subscribers</term>, and express their interest by registering with the statestore at startup and + providing a list of topics. The statestore responds to registration by sending the subscriber an initial + topic update for each registered topic, which consists of all the entries currently in that topic. + </p> + +<!-- Henry: OK, but in practice, what is in these topic messages for Impala? --> + + <p> + After registration, the statestore periodically sends two kinds of messages to each subscriber. The first + kind of message is a topic update, and consists of all changes to a topic (new entries, modified entries + and deletions) since the last update was successfully sent to the subscriber. Each subscriber maintains a + per-topic most-recent-version identifier which allows the statestore to only send the delta between + updates. In response to a topic update, each subscriber sends a list of changes it intends to make to its + subscribed topics. 
Those changes are guaranteed to have been applied by the time the next update is + received. + </p> + + <p> + The second kind of statestore message is a <term>heartbeat</term>, formerly sometimes called + <term>keepalive</term>. The statestore uses heartbeat messages to maintain the connection to each + subscriber, which would otherwise time out its subscription and attempt to re-register. + </p> + + <p> + Prior to Impala 2.0, both kinds of communication were combined in a single kind of message. Because these + messages could be very large in instances with thousands of tables, partitions, data files, and so on, + Impala 2.0 and higher divides the types of messages so that the small heartbeat pings can be transmitted + and acknowledged quickly, increasing the reliability of the statestore mechanism that detects when Impala + nodes become unavailable. + </p> + + <p> + If the statestore detects a failed subscriber (for example, by repeated failed heartbeat deliveries), it + stops sending updates to that node. +<!-- Henry: what are examples of these transient topic entries? --> + Some topic entries are marked as transient, meaning that if their owning subscriber fails, they are + removed. + </p> + + <p> + Although the asynchronous nature of this mechanism means that metadata updates might take some time to + propagate across the entire cluster, that does not affect the consistency of query planning or results. + Each query is planned and coordinated by a particular node, so as long as the coordinator node is aware of + the existence of the relevant tables, data files, and so on, it can distribute the query work to other + nodes even if those other nodes have not received the latest metadata updates. +<!-- Henry: need another example here of what's in a topic, e.g. is it the list of available tables? 
+-->
+<!--
+        For example, query planning is performed on a single node based on the
+        catalog metadata topic, and once a full plan has been computed, all information required to execute that
+        plan is distributed directly to the executing nodes.
+        There is no requirement that an executing node should
+        know about the same version of the catalog metadata topic.
+-->
+      </p>
+
+      <p>
+        We have found that the statestore process with default settings scales well to medium-sized clusters, and
+        can serve our largest deployments with some configuration changes.
+<!-- Henry: elaborate on the configuration changes. -->
+      </p>
+
+      <p>
+<!-- Henry: other examples like load information? How is load information used? -->
+        The statestore does not persist any metadata to disk: all current metadata is pushed to the statestore by
+        its subscribers (for example, load information). Therefore, should a statestore restart, its state can be
+        recovered during the initial subscriber registration phase. If the machine that the statestore is
+        running on fails, a new statestore process can be started elsewhere, and subscribers can fail over to it.
+        There is no built-in failover mechanism in Impala; instead, deployments commonly use a retargetable DNS
+        entry to force subscribers to automatically move to the new process instance.
+<!-- Henry: translate that last sentence into instructions / guidelines.
+-->
+      </p>
+    </conbody>
+  </concept>
+</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_conditional_functions.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_conditional_functions.xml b/docs/topics/impala_conditional_functions.xml new file mode 100644 index 0000000..23de779 --- /dev/null +++ b/docs/topics/impala_conditional_functions.xml @@ -0,0 +1,443 @@ +<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="conditional_functions">
+
+  <title>Impala Conditional Functions</title>
+  <titlealts audience="PDF"><navtitle>Conditional Functions</navtitle></titlealts>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Impala Functions"/>
+      <data name="Category" value="SQL"/>
+      <data name="Category" value="Data Analysts"/>
+      <data name="Category" value="Developers"/>
+      <data name="Category" value="Querying"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      Impala supports the following conditional functions for testing equality, comparison operators, and nullity:
+    </p>
+
+    <dl>
+      <dlentry id="case">
+
+        <dt>
+          <codeph>CASE a WHEN b THEN c [WHEN d THEN e]... [ELSE f] END</codeph>
+        </dt>
+
+        <dd>
+          <indexterm audience="Cloudera">CASE expression</indexterm>
+          <b>Purpose:</b> Compares an expression to one or more possible values, and returns a corresponding result
+          when a match is found.
+          <p conref="../shared/impala_common.xml#common/return_same_type"/>
+          <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
+          <p>
+            In this form of the <codeph>CASE</codeph> expression, the initial value <codeph>a</codeph>
+            being evaluated for each row is typically a column reference, or an expression involving
+            a column.
This form can only compare against a set of specified values, not ranges, + multi-value comparisons such as <codeph>BETWEEN</codeph> or <codeph>IN</codeph>, + regular expressions, or <codeph>NULL</codeph>. + </p> + <p conref="../shared/impala_common.xml#common/example_blurb"/> + <p> + Although this example is split across multiple lines, you can put any or all parts of a <codeph>CASE</codeph> expression + on a single line, with no punctuation or other separators between the <codeph>WHEN</codeph>, + <codeph>ELSE</codeph>, and <codeph>END</codeph> clauses. + </p> +<codeblock>select case x + when 1 then 'one' + when 2 then 'two' + when 0 then 'zero' + else 'out of range' + end + from t1; +</codeblock> + </dd> + + </dlentry> + + <dlentry id="case2"> + + <dt> + <codeph>CASE WHEN a THEN b [WHEN c THEN d]... [ELSE e] END</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">CASE expression</indexterm> + <b>Purpose:</b> Tests whether any of a sequence of expressions is true, and returns a corresponding + result for the first true expression. + <p conref="../shared/impala_common.xml#common/return_same_type"/> + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + <p> + <codeph>CASE</codeph> expressions without an initial test value have more flexibility. + For example, they can test different columns in different <codeph>WHEN</codeph> clauses, + or use comparison operators such as <codeph>BETWEEN</codeph>, <codeph>IN</codeph> and <codeph>IS NULL</codeph> + rather than comparing against discrete values. + </p> + <p> + <codeph>CASE</codeph> expressions are often the foundation of long queries that + summarize and format results for easy-to-read reports. For example, you might + use a <codeph>CASE</codeph> function call to turn values from a numeric column + into category strings corresponding to integer values, or labels such as <q>Small</q>, + <q>Medium</q> and <q>Large</q> based on ranges. 
Then subsequent parts of the + query might aggregate based on the transformed values, such as how many + values are classified as small, medium, or large. You can also use <codeph>CASE</codeph> + to signal problems with out-of-bounds values, <codeph>NULL</codeph> values, + and so on. + </p> + <p> + By using operators such as <codeph>OR</codeph>, <codeph>IN</codeph>, + <codeph>REGEXP</codeph>, and so on in <codeph>CASE</codeph> expressions, + you can build extensive tests and transformations into a single query. + Therefore, applications that construct SQL statements often rely heavily on <codeph>CASE</codeph> + calls in the generated SQL code. + </p> + <p> + Because this flexible form of the <codeph>CASE</codeph> expressions allows you to perform + many comparisons and call multiple functions when evaluating each row, be careful applying + elaborate <codeph>CASE</codeph> expressions to queries that process large amounts of data. + For example, when practical, evaluate and transform values through <codeph>CASE</codeph> + after applying operations such as aggregations that reduce the size of the result set; + transform numbers to strings after performing joins with the original numeric values. + </p> + <p conref="../shared/impala_common.xml#common/example_blurb"/> + <p> + Although this example is split across multiple lines, you can put any or all parts of a <codeph>CASE</codeph> expression + on a single line, with no punctuation or other separators between the <codeph>WHEN</codeph>, + <codeph>ELSE</codeph>, and <codeph>END</codeph> clauses. 
+          </p>
+<codeblock>select case
+         when dayname(now()) in ('Saturday','Sunday') then 'result undefined on weekends'
+         when x > y then 'x greater than y'
+         when x = y then 'x and y are equal'
+         when x is null or y is null then 'one of the columns is null'
+         else null
+       end
+  from t1;
+</codeblock>
+        </dd>
+
+      </dlentry>
+
+      <dlentry id="coalesce">
+
+        <dt>
+          <codeph>coalesce(type v1, type v2, ...)</codeph>
+        </dt>
+
+        <dd>
+          <indexterm audience="Cloudera">coalesce() function</indexterm>
+          <b>Purpose:</b> Returns the first specified argument that is not <codeph>NULL</codeph>, or
+          <codeph>NULL</codeph> if all arguments are <codeph>NULL</codeph>.
+          <p conref="../shared/impala_common.xml#common/return_same_type"/>
+        </dd>
+
+      </dlentry>
+
+      <dlentry rev="2.0.0" id="decode">
+
+        <dt>
+          <codeph>decode(type expression, type search1, type result1 [, type search2, type result2 ...] [, type
+          default] )</codeph>
+        </dt>
+
+        <dd>
+          <indexterm audience="Cloudera">decode() function</indexterm>
+          <b>Purpose:</b> Compares an expression to one or more possible values, and returns a corresponding result
+          when a match is found.
+          <p conref="../shared/impala_common.xml#common/return_same_type"/>
+          <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
+          <p>
+            Can be used as shorthand for a <codeph>CASE</codeph> expression.
+          </p>
+          <p>
+            The original expression and the search expressions must be of the same type or convertible types. The
+            result expression can be a different type, but all result expressions must be of the same type.
+          </p>
+          <p>
+            Returns a successful match if the original expression is <codeph>NULL</codeph> and a search expression
+            is also <codeph>NULL</codeph>.
+          </p>
+          <p>
+            Returns <codeph>NULL</codeph> if the final <codeph>default</codeph> value is omitted and none of the
+            search expressions match the original expression.
+ </p> + <p conref="../shared/impala_common.xml#common/example_blurb"/> + <p> + The following example translates numeric day values into descriptive names: + </p> +<codeblock>SELECT event, decode(day_of_week, 1, "Monday", 2, "Tuesday", 3, "Wednesday", + 4, "Thursday", 5, "Friday", 6, "Saturday", 7, "Sunday", "Unknown day") + FROM calendar; +</codeblock> + </dd> + + </dlentry> + + <dlentry id="if"> + + <dt> + <codeph>if(boolean condition, type ifTrue, type ifFalseOrNull)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">if() function</indexterm> + <b>Purpose:</b> Tests an expression and returns a corresponding result depending on whether the result is + true, false, or <codeph>NULL</codeph>. + <p> + <b>Return type:</b> Same as the <codeph>ifTrue</codeph> argument value + </p> + </dd> + + </dlentry> + + <dlentry rev="1.3.0" id="ifnull"> + + <dt> + <codeph>ifnull(type a, type ifNull)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">isnull() function</indexterm> + <b>Purpose:</b> Alias for the <codeph>isnull()</codeph> function, with the same behavior. To simplify + porting SQL with vendor extensions to Impala. + <p conref="../shared/impala_common.xml#common/added_in_130"/> + </dd> + + </dlentry> + + <dlentry id="isfalse" rev="2.2.0"> + + <dt> + <codeph>isfalse(<varname>boolean</varname>)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">isfalse() function</indexterm> + <b>Purpose:</b> Tests if a Boolean expression is <codeph>false</codeph> or not. + Returns <codeph>true</codeph> if so. + If the argument is <codeph>NULL</codeph>, returns <codeph>false</codeph>. + Identical to <codeph>isnottrue()</codeph>, except it returns the opposite value for a <codeph>NULL</codeph> argument. 
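+          <p>
+            For example (hypothetical literal arguments):
+          </p>
+<codeblock>SELECT isfalse(false);  -- returns true
+SELECT isfalse(true);   -- returns false
+SELECT isfalse(null);   -- returns false
+</codeblock>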
+ <p conref="../shared/impala_common.xml#common/return_type_boolean"/> + <p conref="../shared/impala_common.xml#common/added_in_220"/> + </dd> + + </dlentry> + + <dlentry id="isnotfalse" rev="2.2.0"> + + <dt> + <codeph>isnotfalse(<varname>boolean</varname>)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">isnotfalse() function</indexterm> + <b>Purpose:</b> Tests if a Boolean expression is not <codeph>false</codeph> (that is, either <codeph>true</codeph> or <codeph>NULL</codeph>). + Returns <codeph>true</codeph> if so. + If the argument is <codeph>NULL</codeph>, returns <codeph>true</codeph>. + Identical to <codeph>istrue()</codeph>, except it returns the opposite value for a <codeph>NULL</codeph> argument. + <p conref="../shared/impala_common.xml#common/return_type_boolean"/> + <p conref="../shared/impala_common.xml#common/for_compatibility_only"/> + <p conref="../shared/impala_common.xml#common/added_in_220"/> + </dd> + + </dlentry> + + <dlentry id="isnottrue" rev="2.2.0"> + + <dt> + <codeph>isnottrue(<varname>boolean</varname>)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">isnottrue() function</indexterm> + <b>Purpose:</b> Tests if a Boolean expression is not <codeph>true</codeph> (that is, either <codeph>false</codeph> or <codeph>NULL</codeph>). + Returns <codeph>true</codeph> if so. + If the argument is <codeph>NULL</codeph>, returns <codeph>true</codeph>. + Identical to <codeph>isfalse()</codeph>, except it returns the opposite value for a <codeph>NULL</codeph> argument. + <p conref="../shared/impala_common.xml#common/return_type_boolean"/> + <p conref="../shared/impala_common.xml#common/added_in_220"/> + </dd> + + </dlentry> + + <dlentry id="isnull"> + + <dt> + <codeph>isnull(type a, type ifNull)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">isnull() function</indexterm> + <b>Purpose:</b> Tests if an expression is <codeph>NULL</codeph>, and returns the expression result value + if not. 
If the first argument is <codeph>NULL</codeph>, returns the second argument. + <p> + <b>Compatibility notes:</b> Equivalent to the <codeph>nvl()</codeph> function from Oracle Database or + <codeph>ifnull()</codeph> from MySQL. The <codeph>nvl()</codeph> and <codeph>ifnull()</codeph> + functions are also available in Impala. + </p> + <p> + <b>Return type:</b> Same as the first argument value + </p> + </dd> + + </dlentry> + + <dlentry id="istrue" rev="2.2.0"> + + <dt> + <codeph>istrue(<varname>boolean</varname>)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">istrue() function</indexterm> + <b>Purpose:</b> Tests if a Boolean expression is <codeph>true</codeph> or not. + Returns <codeph>true</codeph> if so. + If the argument is <codeph>NULL</codeph>, returns <codeph>false</codeph>. + Identical to <codeph>isnotfalse()</codeph>, except it returns the opposite value for a <codeph>NULL</codeph> argument. + <p conref="../shared/impala_common.xml#common/return_type_boolean"/> + <p conref="../shared/impala_common.xml#common/for_compatibility_only"/> + <p conref="../shared/impala_common.xml#common/added_in_220"/> + </dd> + + </dlentry> + + <dlentry id="nonnullvalue" rev="2.2.0"> + + <dt> + <codeph>nonnullvalue(<varname>expression</varname>)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">function</indexterm> + <b>Purpose:</b> Tests if an expression (of any type) is <codeph>NULL</codeph> or not. + Returns <codeph>false</codeph> if so. + The converse of <codeph>nullvalue()</codeph>. 
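+          <p>
+            For example (hypothetical literal arguments):
+          </p>
+<codeblock>SELECT nonnullvalue(5);     -- returns true
+SELECT nonnullvalue(null);  -- returns false
+</codeblock>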
+ <p conref="../shared/impala_common.xml#common/return_type_boolean"/> + <p conref="../shared/impala_common.xml#common/for_compatibility_only"/> + <p conref="../shared/impala_common.xml#common/added_in_220"/> + </dd> + + </dlentry> + + <dlentry rev="1.3.0" id="nullif"> + + <dt> + <codeph>nullif(<varname>expr1</varname>,<varname>expr2</varname>)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">nullif() function</indexterm> + <b>Purpose:</b> Returns <codeph>NULL</codeph> if the two specified arguments are equal. If the specified + arguments are not equal, returns the value of <varname>expr1</varname>. The data types of the expressions + must be compatible, according to the conversion rules from <xref href="impala_datatypes.xml#datatypes"/>. + You cannot use an expression that evaluates to <codeph>NULL</codeph> for <varname>expr1</varname>; that + way, you can distinguish a return value of <codeph>NULL</codeph> from an argument value of + <codeph>NULL</codeph>, which would never match <varname>expr2</varname>. + <p> + <b>Usage notes:</b> This function is effectively shorthand for a <codeph>CASE</codeph> expression of + the form: + </p> +<codeblock>CASE + WHEN <varname>expr1</varname> = <varname>expr2</varname> THEN NULL + ELSE <varname>expr1</varname> +END</codeblock> + <p> + It is commonly used in division expressions, to produce a <codeph>NULL</codeph> result instead of a + divide-by-zero error when the divisor is equal to zero: + </p> +<codeblock>select 1.0 / nullif(c1,0) as reciprocal from t1;</codeblock> + <p> + You might also use it for compatibility with other database systems that support the same + <codeph>NULLIF()</codeph> function. 
+ </p> + <p conref="../shared/impala_common.xml#common/return_same_type"/> + <p conref="../shared/impala_common.xml#common/added_in_130"/> + </dd> + + </dlentry> + + <dlentry rev="1.3.0" id="nullifzero"> + + <dt> + <codeph>nullifzero(<varname>numeric_expr</varname>)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">nullifzero() function</indexterm> + <b>Purpose:</b> Returns <codeph>NULL</codeph> if the numeric expression evaluates to 0, otherwise returns + the result of the expression. + <p> + <b>Usage notes:</b> Used to avoid error conditions such as divide-by-zero in numeric calculations. + Serves as shorthand for a more elaborate <codeph>CASE</codeph> expression, to simplify porting SQL with + vendor extensions to Impala. + </p> + <p conref="../shared/impala_common.xml#common/return_same_type"/> + <p conref="../shared/impala_common.xml#common/added_in_130"/> + </dd> + + </dlentry> + + <dlentry id="nullvalue" rev="2.2.0"> + + <dt> + <codeph>nullvalue(<varname>expression</varname>)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">function</indexterm> + <b>Purpose:</b> Tests if an expression (of any type) is <codeph>NULL</codeph> or not. + Returns <codeph>true</codeph> if so. + The converse of <codeph>nonnullvalue()</codeph>. + <p conref="../shared/impala_common.xml#common/return_type_boolean"/> + <p conref="../shared/impala_common.xml#common/for_compatibility_only"/> + <p conref="../shared/impala_common.xml#common/added_in_220"/> + </dd> + + </dlentry> + + <dlentry id="nvl" rev="1.1"> + + <dt> + <codeph>nvl(type a, type ifNull)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">nvl() function</indexterm> + <b>Purpose:</b> Alias for the <codeph>isnull()</codeph> function. Tests if an expression is + <codeph>NULL</codeph>, and returns the expression result value if not. If the first argument is + <codeph>NULL</codeph>, returns the second argument. 
Equivalent to the <codeph>nvl()</codeph> function + from Oracle Database or <codeph>ifnull()</codeph> from MySQL. + <p> + <b>Return type:</b> Same as the first argument value + </p> + <p conref="../shared/impala_common.xml#common/added_in_11"/> + </dd> + + </dlentry> + + <dlentry rev="1.3.0" id="zeroifnull"> + + <dt> + <codeph>zeroifnull(<varname>numeric_expr</varname>)</codeph> + </dt> + + <dd> + <indexterm audience="Cloudera">zeroifnull() function</indexterm> + <b>Purpose:</b> Returns 0 if the numeric expression evaluates to <codeph>NULL</codeph>, otherwise returns + the result of the expression. + <p> + <b>Usage notes:</b> Used to avoid unexpected results due to unexpected propagation of + <codeph>NULL</codeph> values in numeric calculations. Serves as shorthand for a more elaborate + <codeph>CASE</codeph> expression, to simplify porting SQL with vendor extensions to Impala. + </p> + <p conref="../shared/impala_common.xml#common/return_same_type"/> + <p conref="../shared/impala_common.xml#common/added_in_130"/> + </dd> + + </dlentry> + </dl> + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_config.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_config.xml b/docs/topics/impala_config.xml new file mode 100644 index 0000000..7ea82e5 --- /dev/null +++ b/docs/topics/impala_config.xml @@ -0,0 +1,57 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="config"> + + <title>Managing Impala</title> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Administrators"/> + <data name="Category" value="Configuring"/> + <data name="Category" value="JDBC"/> + <data name="Category" value="ODBC"/> + <data name="Category" value="Stub Pages"/> + </metadata> + </prolog> + + <conbody> + + <p> + This section explains how to configure Impala to 
accept connections from applications that use popular + programming APIs: + </p> + + <ul> + <li> + <xref href="impala_config_performance.xml#config_performance"/> + </li> + + <li> + <xref href="impala_odbc.xml#impala_odbc"/> + </li> + + <li> + <xref href="impala_jdbc.xml#impala_jdbc"/> + </li> + </ul> + + <p> + This type of configuration is especially useful when using Impala in combination with Business Intelligence + tools, which use these standard interfaces to query different kinds of database and Big Data systems. + </p> + + <p> + You can also configure these other aspects of Impala: + </p> + + <ul> + <li> + <xref href="impala_security.xml#security"/> + </li> + + <li> + <xref href="impala_config_options.xml#config_options"/> + </li> + </ul> + </conbody> +</concept>
