[18/51] [partial] incubator-impala git commit: IMPALA-3398: Add docs to main Impala branch.

jbapple Thu, 17 Nov 2016 15:12:27 -0800

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_new_features.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_new_features.xml 
b/docs/topics/impala_new_features.xml
new file mode 100644
index 0000000..4da811f
--- /dev/null
+++ b/docs/topics/impala_new_features.xml
@@ -0,0 +1,4015 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept rev="ver" id="new_features">
+
+  <title><ph audience="standalone">New Features in Apache Impala 
(incubating)</ph><ph audience="integrated">What's New in Apache Impala 
(incubating)</ph></title>
+
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Release Notes"/>
+      <data name="Category" value="New Features"/>
+      <data name="Category" value="What's New"/>
+      <data name="Category" value="Getting Started"/>
+      <data name="Category" value="Upgrading"/>
+      <data name="Category" value="Administrators"/>
+      <data name="Category" value="Developers"/>
+      <data name="Category" value="Data Analysts"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      This release of Impala contains the following changes and enhancements 
from previous releases.
+    </p>
+
+    <p outputclass="toc inpage"/>
+
+  </conbody>
+
+<!-- All 2.7.x new features go under here -->
+
+  <concept rev="2.7.0" id="new_features_270">
+
+    <title>New Features in Impala 2.7.x / CDH 5.9.x</title>
+
+    <conbody>
+
+      <ul id="feature_list">
+        <li>
+          <p>
+            Performance improvements:
+          </p>
+          <ul>
+            <li>
+              <p rev="IMPALA-3206 CDH-43744">
+                [<xref href="https://issues.cloudera.org/browse/IMPALA-3206" scope="external" format="html">IMPALA-3206</xref>]
+                Speedup for queries against <codeph>DECIMAL</codeph> columns 
in Avro tables.
+                The code that parses <codeph>DECIMAL</codeph> values from Avro 
now uses
+                native code generation.
+              </p>
+            </li>
+            <li>
+              <p rev="IMPALA-3674">
+                [<xref href="https://issues.cloudera.org/browse/IMPALA-3674" scope="external" format="html">IMPALA-3674</xref>]
+                Improved efficiency in LLVM code generation can reduce codegen 
time, especially
+                for short queries.
+              </p>
+            </li>
+            <!-- Not actually a new feature, it's more a tip about when to 
expect remote reads and how to minimize them. To go somewhere in the 
performance / best practices / Parquet info.
+            <li>
+              <p rev="IMPALA-3885 CDH-43793">
+                [<xref href="https://issues.cloudera.org/browse/IMPALA-3885" scope="external" format="html">IMPALA-3885</xref>]
+                Parquet files with multiple blocks can now be processed
+                without remote reads.
+              </p>
+            </li>
+            -->
+            <li>
+              <p rev="IMPALA-2979 CDH-43739"> [<xref
+                  href="https://issues.cloudera.org/browse/IMPALA-2979" scope="external"
+                  format="html">IMPALA-2979</xref>] Improvements to scheduling on worker nodes,
+                enabled by the <codeph>REPLICA_PREFERENCE</codeph> query 
option.
+                See <xref
+                  href="impala_replica_preference.xml#replica_preference"/> 
for details.
+              </p>
+            </li>
+          </ul>
+        </li>
+        <li audience="Cloudera">
+          <p rev="IMPALA-3210 CDH-43736"><!-- Patch didn't make it into <keyword keyref="impala27_full"/> -->
+            [<xref href="https://issues.cloudera.org/browse/IMPALA-3210" scope="external" format="html">IMPALA-3210</xref>]
+            The analytic functions <codeph>FIRST_VALUE()</codeph> and 
<codeph>LAST_VALUE()</codeph>
+            accept a new clause, <codeph>IGNORE NULLS</codeph>.
+            See <xref href="impala_analytic_functions.xml#first_value"/>
+            and <xref href="impala_analytic_functions.xml#last_value"/>
+            for details.
+          </p>
+        </li>
+        <li>
+          <p rev="IMPALA-1683 CDH-43732">
+            [<xref href="https://issues.cloudera.org/browse/IMPALA-1683" scope="external" format="html">IMPALA-1683</xref>]
+            The <codeph>REFRESH</codeph> statement can be applied to a single 
partition,
+            rather than the entire table. See <xref 
href="impala_refresh.xml#refresh"/>
+            and <xref href="impala_partitioning.xml#partition_refresh"/> for 
details.
+          </p>
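+          <p>
+            As a minimal sketch, assuming a hypothetical table <codeph>sales</codeph>
+            partitioned by a <codeph>year</codeph> column, a single-partition refresh
+            might look like the following:
+          </p>
+<codeblock>-- Hypothetical table and partition key, for illustration only.
+-- Reload file metadata for one partition after data files change outside Impala.
+REFRESH sales PARTITION (year=2016);
+</codeblock>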
+        </li>
+        <li>
+          <p>
+            Improvements to the Impala web user interface:
+          </p>
+          <ul>
+            <li>
+              <p rev="IMPALA-2767 CDH-43748">
+                [<xref href="https://issues.cloudera.org/browse/IMPALA-2767" scope="external" format="html">IMPALA-2767</xref>]
+                You can now force a session to expire by clicking a link in 
the web UI,
+                on the <uicontrol>/sessions</uicontrol> tab.
+              </p>
+            </li>
+            <li>
+              <p rev="IMPALA-3715 CDH-43743">
+                [<xref href="https://issues.cloudera.org/browse/IMPALA-3715" scope="external" format="html">IMPALA-3715</xref>]
+                The <uicontrol>/memz</uicontrol> tab includes more information 
about
+                Impala memory usage.
+              </p>
+            </li>
+            <li>
+              <p rev="IMPALA-3716 CDH-43741">
+                [<xref href="https://issues.cloudera.org/browse/IMPALA-3716" scope="external" format="html">IMPALA-3716</xref>]
+                The <uicontrol>Details</uicontrol> page for a query now 
includes
+                a <uicontrol>Memory</uicontrol> tab.
+              </p>
+            </li>
+          </ul>
+        </li>
+        <li>
+          <p rev="IMPALA-3499 CDH-43740">
+            [<xref href="https://issues.cloudera.org/browse/IMPALA-3499" scope="external" format="html">IMPALA-3499</xref>]
+            Scalability improvements to the catalog server. Impala handles 
internal communication
+            more efficiently for tables with large numbers of columns and 
partitions, where the
+            size of the metadata exceeds 2 GiB.
+          </p>
+        </li>
+        <li>
+          <p rev="IMPALA-3677 CDH-43745">
+            [<xref href="https://issues.cloudera.org/browse/IMPALA-3677" scope="external" format="html">IMPALA-3677</xref>]
+            You can send a <codeph>SIGUSR1</codeph> signal to any 
Impala-related daemon to write a
+            Breakpad minidump. For advanced troubleshooting, you can now 
produce a minidump
+            without triggering a crash. See <xref 
href="impala_breakpad.xml#breakpad"/> for
+            details about the Breakpad minidump feature.
+          </p>
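+          <p>
+            For example, assuming the process ID of the target daemon is already known
+            (<codeph>12345</codeph> here is only a placeholder), the signal could be sent
+            from a shell roughly as follows:
+          </p>
+<codeblock># 12345 is a placeholder PID; substitute the PID of the daemon to inspect.
+kill -s USR1 12345
+</codeblock>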
+        </li>
+        <li>
+          <p rev="IMPALA-3687 CDH-43731">
+            [<xref href="https://issues.cloudera.org/browse/IMPALA-3687" scope="external" format="html">IMPALA-3687</xref>]
+            The schema reconciliation rules for Avro tables have changed 
slightly
+            for <codeph>CHAR</codeph> and <codeph>VARCHAR</codeph> columns. 
Now, if
+            the definition of such a column is changed in the Avro schema file,
+            the column retains its <codeph>CHAR</codeph> or 
<codeph>VARCHAR</codeph>
+            type as specified in the SQL definition, but the column name and 
comment
+            from the Avro schema file take precedence.
+            See <xref href="impala_avro.xml#avro_create_table"/> for details 
about
+            column definitions in Avro tables.
+          </p>
+        </li>
+        <li audience="Cloudera"><!-- Patch didn't make it into <keyword keyref="impala27_full"/> -->
+          <p rev="IMPALA-1654 CDH-43747">
+            [<xref href="https://issues.cloudera.org/browse/IMPALA-1654" scope="external" format="html">IMPALA-1654</xref>]
+            Several kinds of DDL operations
+            can now work on a range of partitions. The partitions can be 
specified
+            using operators such as <codeph>&lt;</codeph>, 
<codeph>&gt;=</codeph>, and
+            <codeph>!=</codeph> rather than just an equality predicate 
applying to a single
+            partition.
+            This new feature extends the syntax of
+            several clauses
+            of the <codeph>ALTER TABLE</codeph> statement
+            (<codeph>DROP PARTITION</codeph>, <codeph>SET [UN]CACHED</codeph>,
+            <codeph>SET FILEFORMAT | SERDEPROPERTIES | TBLPROPERTIES</codeph>),
+            the <codeph>SHOW FILES</codeph> statement, and the
+            <codeph>COMPUTE INCREMENTAL STATS</codeph> statement.
+            It does not apply to statements that are defined to only apply to 
a single
+            partition, such as <codeph>LOAD DATA</codeph>, <codeph>ALTER TABLE 
... ADD PARTITION</codeph>,
+            <codeph>SET LOCATION</codeph>, and <codeph>INSERT</codeph> with a 
static
+            partitioning clause.
+          </p>
+        </li>
+        <li>
+          <p rev="IMPALA-3575 CDH-43742"> [<xref
+              href="https://issues.cloudera.org/browse/IMPALA-3575"
+              scope="external" format="html">IMPALA-3575</xref>] Some network
+            operations now have additional timeout and retry settings. The extra
+            configuration helps avoid failed queries during transient network
+            problems, prevents hangs when a sender or receiver fails in the
+            middle of a network transmission, and makes cancellation requests
+            more reliable despite network issues. </p>
+        </li>
+      </ul>
+
+    </conbody>
+  </concept>
+<!-- All 2.6.x new features go under here -->
+
+  <concept rev="2.6.0" id="new_features_260">
+
+    <title>New Features in Impala 2.6.x / CDH 5.8.x</title>
+
+    <conbody>
+
+      <!-- <note conref="../shared/impala_common.xml#common/only_cdh5_260" /> 
-->
+
+      <ul>
+        <li>
+          <p>
+            Improvements to Impala support for the Amazon S3 filesystem:
+          </p>
+          <ul>
+            <li>
+              <p rev="IMPALA-1878 CDH-33310">
+                Impala can now write to S3 tables through the 
<codeph>INSERT</codeph>
+                or <codeph>LOAD DATA</codeph> statements.
+                See <xref href="impala_s3.xml#s3"/> for general information 
about
+                using Impala with S3.
+              </p>
+            </li>
+            <li>
+              <p rev="IMPALA-3452 CDH-39913">
+                A new query option, <codeph>S3_SKIP_INSERT_STAGING</codeph>, 
lets you
+                trade off between fast <codeph>INSERT</codeph> performance and
+                slower <codeph>INSERT</codeph>s that are more consistent if a
+                problem occurs during the statement. The new behavior is 
enabled by default.
+                See <xref 
href="impala_s3_skip_insert_staging.xml#s3_skip_insert_staging"/> for details
+                about this option.
+              </p>
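+              <p>
+                A sketch of choosing the slower, more consistent behavior for one session,
+                assuming hypothetical tables <codeph>s3_sales</codeph> and
+                <codeph>staging_sales</codeph>:
+              </p>
+<codeblock>-- Table names are hypothetical.
+-- Turn off the default to trade INSERT speed for consistency.
+SET S3_SKIP_INSERT_STAGING=false;
+INSERT INTO s3_sales SELECT * FROM staging_sales;
+</codeblock>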
+            </li>
+          </ul>
+        </li>
+        <li>
+          <p rev="CDH-41184">
+            Performance improvements for the runtime filtering feature:
+          </p>
+          <ul>
+            <li>
+              <p rev="CDH-41184 IMPALA-3333">
+                The default for the <codeph>RUNTIME_FILTER_MODE</codeph> 
+                query option is changed to <codeph>GLOBAL</codeph> (the 
highest setting).
+                See <xref 
href="impala_runtime_filter_mode.xml#runtime_filter_mode"/> for
+                details about this option.
+              </p>
+            </li>
+            <li rev="CDH-41184 IMPALA-3007">
+              <p>
+                The <codeph>RUNTIME_BLOOM_FILTER_SIZE</codeph> setting is now 
only used
+                as a fallback if statistics are not available; otherwise, 
Impala
+                uses the statistics to estimate the appropriate size to use 
for each filter.
+                See <xref 
href="impala_runtime_bloom_filter_size.xml#runtime_bloom_filter_size"/> for
+                details about this option.
+              </p>
+            </li>
+            <li rev="CDH-41184 IMPALA-3480">
+              <p>
+                New query options <codeph>RUNTIME_FILTER_MIN_SIZE</codeph> and
+                <codeph>RUNTIME_FILTER_MAX_SIZE</codeph> let you fine-tune
+                the sizes of the Bloom filter structures used for runtime 
filtering.
+                If the filter size derived from Impala internal estimates or from
+                the <codeph>RUNTIME_BLOOM_FILTER_SIZE</codeph> option falls outside the size
+                range specified by these options, any too-small filter size is adjusted
+                to the minimum, and any too-large filter size is adjusted to the maximum.
+                See <xref 
href="impala_runtime_filter_min_size.xml#runtime_filter_min_size"/>
+                and <xref 
href="impala_runtime_filter_max_size.xml#runtime_filter_max_size"/>
+                for details about these options.
+              </p>
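+              <p>
+                As an illustration with arbitrary values (not tuning recommendations),
+                the following settings would bound every filter to between 1 MiB and 16 MiB,
+                assuming the sizes are given in bytes:
+              </p>
+<codeblock>-- Illustrative values only; sizes are assumed to be in bytes.
+-- Lower bound: 1 MiB.
+SET RUNTIME_FILTER_MIN_SIZE=1048576;
+-- Upper bound: 16 MiB.
+SET RUNTIME_FILTER_MAX_SIZE=16777216;
+-- With these bounds, a filter estimated at 512 KiB would be raised to 1 MiB,
+-- and a filter estimated at 64 MiB would be reduced to 16 MiB.
+</codeblock>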
+            </li>
+            <li rev="CDH-41184 IMPALA-2956">
+              <p>
+                Runtime filter propagation now applies to all the
+                operands of <codeph>UNION</codeph> and <codeph>UNION 
ALL</codeph>
+                operators.
+              </p>
+            </li>
+            <li rev="CDH-41184 IMPALA-3077">
+              <p>
+                Runtime filters can now be produced during join queries even 
+                when the join processing activates the spill-to-disk mechanism.
+              </p>
+            </li>
+          </ul>
+            See <xref href="impala_runtime_filtering.xml#runtime_filtering"/> 
for
+            general information about the runtime filtering feature.
+        </li>
+        <!-- Have to look closer at resource management / admission control to 
see if
+             there are any ripple effects from this default change. -->
+        <li>
+          <p rev="IMPALA-3199">
+            Admission control and dynamic resource pools are enabled by 
default.
+            See <xref href="impala_admission.xml#admission_control"/> for 
details
+            about admission control.
+          </p>
+        </li>
+        <!-- Below here are features that are pretty well taken care of 
already;
+             some of them didn't need much if any doc in the first place. -->
+        <li>
+          <p rev="IMPALA-3369">
+            You can now manually set column statistics,
+            using the <codeph>ALTER TABLE</codeph> statement with a
+            <codeph>SET COLUMN STATS</codeph> clause.
+            See <xref href="impala_perf_stats.xml#perf_column_stats_manual"/> 
for details.
+          </p>
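+          <p>
+            A minimal sketch, assuming a hypothetical table <codeph>sales</codeph> with a
+            <codeph>customer_id</codeph> column and made-up statistics values:
+          </p>
+<codeblock>-- Hypothetical table, column, and values, for illustration only.
+ALTER TABLE sales SET COLUMN STATS customer_id ('numDistinct'='1000','numNulls'='0');
+-- Check the values now recorded for the table.
+SHOW COLUMN STATS sales;
+</codeblock>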
+        </li>
+        <li>
+          <p rev="CDH-40238 CDH-39818 IMPALA-3490 IMPALA-3581 IMPALA-2686">
+            Impala can now write lightweight <q>minidump</q> files, rather
+            than large core files, to save diagnostic information when
+            any of the Impala-related daemons crash. This feature uses the
+            open source <codeph>breakpad</codeph> framework.
+            See <xref href="impala_breakpad.xml#breakpad"/> for details.
+          </p>
+        </li>
+        <li>
+          <p>
+            New query options improve interoperability with Parquet files:
+            <ul>
+              <li>
+                <p rev="IMPALA-2835 CDH-33330">
+                  The <codeph>PARQUET_FALLBACK_SCHEMA_RESOLUTION</codeph> 
query option
+                  lets Impala locate columns within Parquet files based on
+                  column name rather than ordinal position.
+                  This enhancement improves interoperability with applications
+                  that write Parquet files with a different order or subset of
+                  columns than are used in the Impala table.
+                  See <xref 
href="impala_parquet_fallback_schema_resolution.xml#parquet_fallback_schema_resolution"/>
+                  for details.
+                </p>
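+                <p>
+                  For instance, a session could switch to name-based resolution before
+                  querying a table whose Parquet files were written with a different
+                  column order (<codeph>parquet_events</codeph> is a hypothetical table name):
+                </p>
+<codeblock>-- Hypothetical table name; match columns by name instead of position.
+SET PARQUET_FALLBACK_SCHEMA_RESOLUTION=name;
+SELECT * FROM parquet_events LIMIT 10;
+</codeblock>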
+              </li>
+              <li>
+                <p rev="IMPALA-2069">
+                  The <codeph>PARQUET_ANNOTATE_STRINGS_UTF8</codeph> query 
option
+                  makes Impala include the <codeph>UTF-8</codeph> annotation
+                  metadata for <codeph>STRING</codeph>, <codeph>CHAR</codeph>,
+                  and <codeph>VARCHAR</codeph> columns in Parquet files created
+                  by <codeph>INSERT</codeph> or <codeph>CREATE TABLE AS 
SELECT</codeph>
+                  statements.
+                  See <xref 
href="impala_parquet_annotate_strings_utf8.xml#parquet_annotate_strings_utf8"/>
+                  for details.
+                </p>
+              </li>
+            </ul>
+            See <xref href="impala_parquet.xml#parquet"/> for general 
information about working
+            with Parquet files.
+          </p>
+        </li>
+        <li>
+          <p>
+            Improvements to security and reduction in overhead for secure 
clusters:
+          </p>
+          <ul>
+            <li>
+              <p rev="IMPALA-1928">
+                Overall performance improvements for secure clusters.
+                (TPC-H queries on a secure cluster were benchmarked
+                at roughly 3x as fast as the previous release.)
+              </p>
+            </li>
+            <li>
+              <p rev="IMPALA-2660 CDH-40241">
+                Impala now recognizes the <codeph>auth_to_local</codeph> 
setting,
+                specified through the HDFS configuration setting
+                <codeph>hadoop.security.auth_to_local</codeph>.
+                This feature is disabled by default; to enable it,
+                specify <codeph>--load_auth_to_local_rules=true</codeph>
+                in the <cmdname>impalad</cmdname> configuration settings.
+                See <xref href="impala_kerberos.xml#auth_to_local"/> for 
details.
+              </p>
+            </li>
+            <li>
+              <p rev="IMPALA-2599">
+                Timing improvements in the mechanism for the 
<cmdname>impalad</cmdname>
+                daemon to acquire Kerberos tickets. This feature spreads out 
the overhead
+                on the KDC during Impala startup, especially for large 
clusters.
+              </p>
+            </li>
+            <li>
+              <p rev="IMPALA-3554">
+                For Kerberized clusters, the Catalog service now uses
+                the Kerberos principal instead of the operating system user that runs
+                the <cmdname>catalogd</cmdname> daemon. 
+                This eliminates the requirement to configure a 
<codeph>hadoop.user.group.static.mapping.overrides</codeph> 
+                setting to put the OS user into the Sentry administrative 
group, on clusters where the principal
+                and the OS user name for this user are different.
+              </p>
+            </li>
+          </ul>
+        </li>
+        <li>
+          <p rev="IMPALA-3286">
+            Overall performance improvements for join queries, by using a 
prefetching mechanism
+            while building the in-memory hash table to evaluate join 
predicates.
+            See <xref href="impala_prefetch_mode.xml#prefetch_mode"/> for the 
query option
+            to control this optimization.
+          </p>
+        </li>
+        <li>
+          <p rev="IMPALA-3397 CDH-40097">
+            The <cmdname>impala-shell</cmdname> interpreter has a new command,
+            <codeph>SOURCE</codeph>, that lets you run a set of SQL statements
+            or other <cmdname>impala-shell</cmdname> commands stored in a file.
+            You can run additional <codeph>SOURCE</codeph> commands from inside
+            a file, to set up flexible sequences of statements for use cases
+            such as schema setup, ETL, or reporting.
+            See <xref href="impala_shell_commands.xml#shell_commands"/> for 
details
+            and <xref 
href="impala_shell_running_commands.xml#shell_running_commands"/>
+            for examples.
+          </p>
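+          <p>
+            As a rough sketch, with hypothetical file names, a nested sequence of
+            <codeph>SOURCE</codeph> commands inside an <cmdname>impala-shell</cmdname>
+            session could look like this:
+          </p>
+<codeblock>-- File names are hypothetical, for illustration only.
+SOURCE setup_schema.sql;
+-- setup_schema.sql could itself contain a line such as:
+--   SOURCE load_data.sql;
+</codeblock>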
+        </li>
+        <li>
+          <p rev="IMPALA-1772 CDH-38381">
+            The <codeph>millisecond()</codeph> built-in function lets you 
extract
+            the fractional seconds part of a <codeph>TIMESTAMP</codeph> value.
+            See <xref 
href="impala_datetime_functions.xml#datetime_functions"/> for details.
+          </p>
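+          <p>
+            A minimal example, using the current time as the input value:
+          </p>
+<codeblock>-- Show a TIMESTAMP next to its fractional-seconds part in milliseconds.
+SELECT now() AS ts, millisecond(now()) AS ms;
+</codeblock>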
+        </li>
+        <li>
+          <p rev="IMPALA-3092">
+            If an Avro table is created without column definitions in the
+            <codeph>CREATE TABLE</codeph> statement, and columns are later
+            added through <codeph>ALTER TABLE</codeph>, the resulting
+            table is now queryable. Missing values from the newly added
+            columns now default to <codeph>NULL</codeph>.
+            See <xref href="impala_avro.xml#avro"/> for general details about
+            working with Avro files.
+          </p>
+        </li>
+        <li>
+          <p>
+            The mechanism for interpreting <codeph>DECIMAL</codeph> literals is
+            improved, no longer going through an intermediate conversion step
+            to <codeph>DOUBLE</codeph>:
+            <ul>
+              <li>
+                <p rev="IMPALA-3163">
+                  Casting a <codeph>DECIMAL</codeph> value to <codeph>TIMESTAMP</codeph>
+                  produces a more precise <codeph>TIMESTAMP</codeph> value than formerly.
+                </p>
+              </li>
+              <li>
+                <p rev="IMPALA-3439">
+                  Certain function calls involving <codeph>DECIMAL</codeph> 
literals
+                  now succeed, when formerly they failed due to lack of a 
function
+                  signature with a <codeph>DOUBLE</codeph> argument.
+                </p>
+              </li>
+              <li>
+                <p rev="">
+                  Faster runtime performance for <codeph>DECIMAL</codeph> 
constant
+                  values, through improved native code generation for all 
combinations
+                  of precision and scale.
+                </p>
+              </li>
+            </ul>
+            See <xref href="impala_decimal.xml#decimal"/> for details about 
the <codeph>DECIMAL</codeph> type.
+          </p>
+        </li>
+        <li>
+          <p rev="IMPALA-3155">
+            Improved type accuracy for <codeph>CASE</codeph> return values.
+            If all <codeph>WHEN</codeph> clauses of the <codeph>CASE</codeph>
+            expression are of <codeph>CHAR</codeph> type, the final result
+            is also <codeph>CHAR</codeph> instead of being converted to
+            <codeph>STRING</codeph>.
+            See <xref 
href="impala_conditional_functions.xml#conditional_functions"/>
+            for details about the <codeph>CASE</codeph> function.
+          </p>
+        </li>
+        <li>
+          <p rev="IMPALA-3232">
+            Uncorrelated queries using the <codeph>NOT EXISTS</codeph> operator
+            are now supported. Formerly, the <codeph>NOT EXISTS</codeph>
+            operator was only available for correlated subqueries.
+          </p>
+        </li>
+        <li>
+          <p rev="IMPALA-2736">
+            Improved performance for reading Parquet files.
+          </p>
+        </li>
+        <li>
+          <p rev="IMPALA-3375">
+            Improved performance for <term>top-N</term> queries, that is,
+            those including both <codeph>ORDER BY</codeph> and
+            <codeph>LIMIT</codeph> clauses.
+          </p>
+        </li>
+        <!-- JIRA still in open state as of 5.8 / 2.6, commenting out.
+        <li>
+          <p rev="IMPALA-3471">
+            A top-N query can now also activate the spill-to-disk mechanism if
+            a host runs low on memory while evaluating it. For example, using
+            large <codeph>LIMIT</codeph> and/or <codeph>OFFSET</codeph> clauses
+            adds some memory overhead that could cause spilling.
+          </p>
+        </li>
+        -->
+        <li>
+          <p rev="IMPALA-1740">
+            Impala optionally skips an arbitrary number of header lines from 
text input
+            files on HDFS based on the <codeph>skip.header.line.count</codeph> 
value
+            in the <codeph>TBLPROPERTIES</codeph> field of the table metadata.
+            See <xref href="impala_txtfile.xml#text_data_files"/> for details.
+          </p>
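+          <p>
+            For example, assuming a hypothetical text table <codeph>csv_import</codeph>
+            whose data files each begin with one header line:
+          </p>
+<codeblock>-- Hypothetical table name; skip one header line in each text data file.
+ALTER TABLE csv_import SET TBLPROPERTIES ('skip.header.line.count'='1');
+</codeblock>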
+        </li>
+        <li>
+          <p rev="IMPALA-2336">
+            Trailing comments are now allowed in queries processed by
+            the <cmdname>impala-shell</cmdname> options <codeph>-q</codeph>
+            and <codeph>-f</codeph>.
+          </p>
+        </li>
+        <li>
+          <p rev="IMPALA-2844">
+            Impala can run <codeph>COUNT</codeph> queries for RCFile tables
+            that include complex type columns.
+            See <xref href="impala_complex_types.xml#complex_types"/> for
+            general information about working with complex types,
+            and <xref href="impala_array.xml#array"/>,
+            <xref href="impala_map.xml#map"/>, and <xref 
href="impala_struct.xml#struct"/>
+            for syntax details of each type.
+          </p>
+        </li>
+      </ul>
+
+    </conbody>
+  </concept>
+
+<!-- All 2.5.x new features go under here -->
+
+  <concept rev="2.5.0" id="new_features_250">
+
+    <title>New Features in Impala 2.5.x / CDH 5.7.x</title>
+
+    <conbody>
+
+      <note conref="../shared/impala_common.xml#common/only_cdh5_250" />
+
+      <ul>
+        <li><!-- Spec: 
https://docs.google.com/document/d/1ambtYJ1t05iITCVIrN6N1A-e7PZBSetBPgjy8SLzJrA/edit#heading=h.vcftzwlpn845
 -->
+          <p rev="CDH-33292 IMPALA-2552 IMPALA-3054">
+            Dynamic partition pruning. When a query refers to a partition key column in a <codeph>WHERE</codeph>
+            clause, and the exact set of column values is not known until the query is executed,
+            Impala evaluates the predicate and skips the I/O for entire partitions that are not needed.
+            For example, if a table was partitioned by year, Impala would 
apply this technique to a query
+            such as <codeph>SELECT c1 FROM partitioned_table WHERE year = 
(SELECT MAX(year) FROM other_table)</codeph>.
+            <ph audience="standalone">See <xref 
href="impala_partitioning.xml#dynamic_partition_pruning"/> for details.</ph>
+          </p>
+          <p>
+            The dynamic partition pruning optimization technique lets Impala 
avoid reading
+            data files from partitions that are not part of the result set, 
even when
+            that determination cannot be made in advance. This technique is 
especially valuable
+            when performing join queries involving partitioned tables. For 
example, if a join
+            query includes an <codeph>ON</codeph> clause and a 
<codeph>WHERE</codeph> clause
+            that refer to the same columns, the query can find the set of 
column values that
+            match the <codeph>WHERE</codeph> clause, and only scan the 
associated partitions
+            when evaluating the <codeph>ON</codeph> clause.
+          </p>
+          <p>
+            Dynamic partition pruning is controlled by the same settings as 
the runtime filtering feature.
+            By default, this feature is enabled at a medium level, because the 
maximum setting can use
+            slightly more memory for queries than in previous releases.
+            To fully enable this feature, set the query option 
<codeph>RUNTIME_FILTER_MODE=GLOBAL</codeph>.
+          </p>
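+          <p>
+            Putting the setting together with the sample query from above (the table and
+            column names are the same illustrative ones, not real objects):
+          </p>
+<codeblock>-- Enable the highest level of runtime filtering for this session.
+SET RUNTIME_FILTER_MODE=GLOBAL;
+-- The partitions to scan depend on the subquery result, known only at run time.
+SELECT c1 FROM partitioned_table
+  WHERE year = (SELECT MAX(year) FROM other_table);
+</codeblock>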
+        </li>
+        <li><!-- Spec: 
https://docs.google.com/document/d/1ambtYJ1t05iITCVIrN6N1A-e7PZBSetBPgjy8SLzJrA/edit#heading=h.vcftzwlpn845
 -->
+          <p rev="IMPALA-2419 IMPALA-3001 IMPALA-3008 IMPALA-3039 IMPALA-3046 
IMPALA-3054">
+            Runtime filtering. This is a wide-ranging set of optimizations 
that are especially valuable for join queries.
+            Using the same technique as with dynamic partition pruning,
+            Impala uses the predicates from <codeph>WHERE</codeph> and <codeph>ON</codeph> clauses
+            to determine which subset of column values from one of the joined tables could possibly be part of the
+            result set. Impala sends a compact representation of the filter condition to the hosts in the cluster,
+            instead of the full set of values or the entire table.
+            <ph audience="PDF">See <xref 
href="impala_runtime_filtering.xml#runtime_filtering"/> for details.</ph>
+          </p>
+          <p>
+            By default, this feature is enabled at a medium level, because the 
maximum setting can use
+            slightly more memory for queries than in previous releases.
+            To fully enable this feature, set the query option 
<codeph>RUNTIME_FILTER_MODE=GLOBAL</codeph>.
+            <ph audience="PDF">See <xref 
href="impala_runtime_filter_mode.xml#runtime_filter_mode"/> for details.</ph>
+          </p>
+          <p>
+            This feature involves some new query options:
+            <xref audience="standalone" 
href="impala_runtime_filter_mode.xml">RUNTIME_FILTER_MODE</xref><codeph 
audience="integrated">RUNTIME_FILTER_MODE</codeph>,
+            <xref audience="standalone" 
href="impala_max_num_runtime_filters.xml">MAX_NUM_RUNTIME_FILTERS</xref><codeph 
audience="integrated">MAX_NUM_RUNTIME_FILTERS</codeph>,
+            <xref audience="standalone" 
href="impala_runtime_bloom_filter_size.xml">RUNTIME_BLOOM_FILTER_SIZE</xref><codeph
 audience="integrated">RUNTIME_BLOOM_FILTER_SIZE</codeph>,
+            <xref audience="standalone" 
href="impala_runtime_filter_wait_time_ms.xml">RUNTIME_FILTER_WAIT_TIME_MS</xref><codeph
 audience="integrated">RUNTIME_FILTER_WAIT_TIME_MS</codeph>,
+            and <xref audience="standalone" 
href="impala_disable_row_runtime_filtering.xml">DISABLE_ROW_RUNTIME_FILTERING</xref><codeph
 audience="integrated">DISABLE_ROW_RUNTIME_FILTERING</codeph>.
+            <ph audience="PDF">See
+            <xref 
href="impala_runtime_filter_mode.xml#runtime_filter_mode">RUNTIME_FILTER_MODE</xref>,
+            <xref 
href="impala_max_num_runtime_filters.xml#max_num_runtime_filters">MAX_NUM_RUNTIME_FILTERS</xref>,
+            <xref 
href="impala_runtime_bloom_filter_size.xml#runtime_bloom_filter_size">RUNTIME_BLOOM_FILTER_SIZE</xref>,
+            <xref 
href="impala_runtime_filter_wait_time_ms.xml#runtime_filter_wait_time_ms">RUNTIME_FILTER_WAIT_TIME_MS</xref>,
 and
+            <xref 
href="impala_disable_row_runtime_filtering.xml#disable_row_runtime_filtering">DISABLE_ROW_RUNTIME_FILTERING</xref>
+            for details.
+            </ph>
+          </p>
+        </li>
+        <li>
+          <p rev="IMPALA-2696">
+            More efficient use of the HDFS caching feature, to avoid
+            hotspots and bottlenecks that could occur if heavily used
+            cached data blocks were always processed by the same host.
+            By default, Impala now randomizes which host processes each cached
+            HDFS data block, when cached replicas are available on multiple 
hosts.
+            (Remember to use the <codeph>WITH REPLICATION</codeph> clause with 
the
+            <codeph>CREATE TABLE</codeph> or <codeph>ALTER TABLE</codeph> 
statement
+            when enabling HDFS caching for a table or partition, to cache the 
same
+            data blocks across multiple hosts.)
+            The new query option <codeph>SCHEDULE_RANDOM_REPLICA</codeph>
+            <!-- and <codeph>REPLICA_PREFERENCE</codeph> -->
+            lets you fine-tune the interaction with HDFS caching even more.
+            <ph audience="PDF">See <xref 
href="impala_perf_hdfs_caching.xml#hdfs_caching"/> for details.</ph>
+          </p>
+        </li>
+        <li>
+          <p rev="IMPALA-2641">
+            The <codeph>TRUNCATE TABLE</codeph> statement now accepts an 
<codeph>IF EXISTS</codeph>
+            clause, making <codeph>TRUNCATE TABLE</codeph> easier to use in 
setup or ETL scripts where the table might or
+            might not exist.
+            <ph audience="PDF">See <xref 
href="impala_truncate_table.xml#truncate_table"/> for details.</ph>
+          </p>
+        </li>
+        <li>
+          <p rev="IMPALA-2681 IMPALA-2688 IMPALA-2749">
+            Improved performance and reliability for the 
<codeph>DECIMAL</codeph> data type:
+            <ul>
+            <li>
+              <p rev="IMPALA-2681">
+                Using <codeph>DECIMAL</codeph> values in a <codeph>GROUP 
BY</codeph> clause now
+                triggers the native code generation optimization, speeding up 
queries that
+                group by values such as prices.
+              </p>
+            </li>
+            <li>
+              <p rev="IMPALA-2688">
+                Checking for overflow in <codeph>DECIMAL</codeph>
+                multiplication is now substantially faster, making 
<codeph>DECIMAL</codeph>
+                a more practical data type in some use cases where formerly 
<codeph>DECIMAL</codeph>
+                was much slower than <codeph>FLOAT</codeph> or 
<codeph>DOUBLE</codeph>.
+              </p>
+            </li>
+            <li>
+              <p rev="IMPALA-2749">
+                Multiplying a mixture of <codeph>DECIMAL</codeph>
+                and <codeph>FLOAT</codeph> or <codeph>DOUBLE</codeph> values now returns a
+                <codeph>DOUBLE</codeph> value rather than a <codeph>DECIMAL</codeph> value. This change avoids
+                some cases where an intermediate value would underflow or overflow and become
+                <codeph>NULL</codeph> unexpectedly.
+              </p>
+            </li>
+            </ul>
+            <ph audience="PDF">See <xref href="impala_decimal.xml"/> for 
details.</ph>
+          </p>
+        </li>
+        <li>
+          <p rev="IMPALA-2382">
+            For UDFs written in Java, or Hive UDFs reused for Impala,
+            Impala now allows parameters and return values to be primitive types.
+            Formerly, parameters and return values were required to be one of the <q>Writable</q>
+            object types.
+            <ph audience="PDF">See <xref href="impala_udf.xml#udfs_hive"/> for 
details.</ph>
+          </p>
+        </li>
+        <!-- CDH-33298 is for scoping internationalization / UTF-8 / Unicode 
support. That work is pushed out to 5.8.
+        <li>
+          <p rev="CDH-33298">
+            Improvements to internationalization support.
+            Now Impala can process data that uses the UTF-8 character encoding.
+          </p>
+        </li>
+        -->
+        <li>
+          <p rev="IMPALA-1588"><!-- This is from 2015, so perhaps it's really 
in an earlier release. -->
+            Performance improvements for HDFS I/O. Impala now caches HDFS file 
handles to avoid the
+            overhead of repeatedly opening the same file.
+          </p>
+        </li>
+
+        <!-- Kudu didn't make it into 2.5 / 5.7 release, so no DELETE or 
UPDATE statement. -->
+        <li>
+          <p><!-- Is there a JIRA for that one? Alex? -->
+            Performance improvements for queries involving nested complex 
types.
+            Certain basic query types, such as counting the elements of a 
complex column,
+            now use an optimized code path.
+          </p>
+        </li>
+        <li>
+          <p rev="IMPALA-3044 IMPALA-2538 IMPALA-1168 CDH-33289 CDH-34603">
+            Improvements to the memory reservation mechanism for the Impala
+            admission control feature. You can specify more settings, such
+            as the timeout period and maximum aggregate memory used, for each
+            resource pool instead of globally for the Impala instance. The
+            default limit for concurrent queries (the <uicontrol>max 
requests</uicontrol>
+            setting) is now unlimited instead of 200.
+            The Cloudera Manager user interface for admission control has been
+            reworked, with the settings available under the
+            <uicontrol>Dynamic Resource Pools</uicontrol> window.
+          </p>
+        </li>
+        <li>
+          <p rev="IMPALA-1755">
+            Performance improvements related to code generation.
+            Even in queries where code generation is not performed
+            for some phases of execution (such as reading data from
+            Parquet tables), Impala can still use code generation in
+            other parts of the query, such as evaluating
+            functions in the <codeph>WHERE</codeph> clause.
+          </p>
+        </li>
+        <li>
+          <p rev="IMPALA-1305">
+            Performance improvements for queries using aggregation functions
+            on high-cardinality columns.
+            Formerly, Impala could do unnecessary extra work to produce 
intermediate
+            results for operations such as <codeph>DISTINCT</codeph> or 
<codeph>GROUP BY</codeph>
+            on columns that were unique or had few duplicate values.
+            Now, Impala decides at run time whether it is more efficient to
+            do an initial aggregation phase and pass along a smaller set of intermediate data,
+            or to pass raw intermediate data back to the next phase of query processing to be aggregated there.
+            This feature is known as <term>streaming pre-aggregation</term>.
+            In case of performance regression, this feature can be turned off
+            using the <codeph>DISABLE_STREAMING_PREAGGREGATIONS</codeph> query 
option.
+            <ph audience="PDF">See <xref 
href="impala_disable_streaming_preaggregations.xml#disable_streaming_preaggregations"/>
 for details.</ph>
+          </p>
+        </li>
+        <li>
+          <p>
+            The spill-to-disk feature is now always recommended. In earlier releases, the spill-to-disk feature
+            could be turned off using a pair of configuration settings,
+            <codeph>enable_partitioned_aggregation=false</codeph> and
+            <codeph>enable_partitioned_hash_join=false</codeph>.
+            The latest improvements in the spill-to-disk mechanism, and related features that
+            interact with it, make this feature robust enough that disabling it is
+            no longer needed or supported. In particular, some new features in <keyword keyref="impala25_full"/>
+            and higher do not work when the spill-to-disk feature is disabled.
+          </p>
+        </li>
+        <li>
+          <p rev="IMPALA-1067">
+            Improvements to scripting capability for the 
<cmdname>impala-shell</cmdname> command,
+            through user-specified substitution variables that can appear in 
statements processed
+            by <cmdname>impala-shell</cmdname>:
+          </p>
+          <ul>
+            <li rev="IMPALA-2179">
+              <p>
+                The <codeph>--var</codeph> command-line option lets you pass 
key-value pairs to
+                <cmdname>impala-shell</cmdname>. The shell can substitute the 
values
+                into queries before executing them, where the query text 
contains the notation
+                <codeph>${var:<varname>varname</varname>}</codeph>. For 
example, you might prepare a SQL file
+                containing a set of DDL statements and queries containing 
variables for
+                database and table names, and then pass the applicable names 
as part of the
+                <codeph>impala-shell -f <varname>filename</varname></codeph> 
command.
+                <ph audience="PDF">See <xref 
href="impala_shell_running_commands.xml#shell_running_commands"/> for 
details.</ph>
+              </p>
+            </li>
+            <li rev="IMPALA-2180">
+              <p>
+                The <codeph>SET</codeph> and <codeph>UNSET</codeph> commands 
within the
+                <cmdname>impala-shell</cmdname> interpreter now work with 
user-specified
+                substitution variables, as well as the built-in query options.
+                The <codeph>SET</codeph> output lists the two kinds of variables separately.
+                As with variables defined by the <codeph>--var</codeph> 
command-line option,
+                you refer to the user-specified substitution variables in 
queries by using
+                the notation <codeph>${var:<varname>varname</varname>}</codeph>
+                in the query text. Because the substitution variables are 
processed by
+                <cmdname>impala-shell</cmdname> instead of the 
<cmdname>impalad</cmdname>
+                backend, you cannot define your own substitution variables 
through the
+                <codeph>SET</codeph> statement in a JDBC or ODBC application.
+                <ph audience="PDF">See <xref href="impala_set.xml#set"/> for 
details.</ph>
+              </p>
+            </li>
+          </ul>
+        </li>
+        <li>
+          <p rev="IMPALA-1599">
+            Performance improvements for query startup. Impala better 
parallelizes certain work
+            when coordinating plan distribution between 
<cmdname>impalad</cmdname> instances, which improves
+            startup time for queries involving tables with many partitions on 
large clusters,
+            or complicated queries with many plan fragments.
+          </p>
+        </li>
+        <li>
+          <p rev="IMPALA-2560">
+            Performance and scalability improvements for tables with many 
partitions.
+            The memory requirements on the coordinator node are reduced, 
making it substantially
+            faster and less resource-intensive
+            to do joins involving several tables with thousands of partitions 
each.
+          </p>
+        </li>
+        <li>
+          <p rev="IMPALA-3095">
+            Whitelisting for access to internal APIs. For applications that 
need direct access
+            to Impala APIs, without going through the HiveServer2 or Beeswax 
interfaces, you can
+            specify a list of Kerberos users who are allowed to call those 
APIs. By default, the
+            <codeph>impala</codeph> and <codeph>hdfs</codeph> users are the 
only ones authorized
+            for this kind of access.
+            Any users not explicitly authorized through the 
<codeph>internal_principals_whitelist</codeph>
+            configuration setting are blocked from accessing the APIs. This 
setting applies to all the
+            Impala-related daemons, although currently it is primarily used 
for HDFS to control the
+            behavior of the catalog server.
+          </p>
+        </li>
+        <li>
+          <p rev="CDH-37009 CDH-30378">
+            Improvements to Impala integration and usability for Hue. (The 
code changes
+            are actually on the Hue side.)
+          </p>
+          <ul>
+          <li>
+            <p rev="CDH-37009">
+              The list of tables now refreshes dynamically.
+            </p>
+          </li>
+          </ul>
+        </li>
+        <li>
+          <p rev="IMPALA-1787">
+            Usability improvements for case-insensitive queries.
+            You can now use the operators <codeph>ILIKE</codeph> and 
<codeph>IREGEXP</codeph>
+            to perform case-insensitive wildcard matches or regular expression 
matches,
+            rather than explicitly converting column values with 
<codeph>UPPER</codeph>
+            or <codeph>LOWER</codeph>.
+            <ph audience="PDF">See <xref href="impala_operators.xml#ilike"/> 
and <xref href="impala_operators.xml#iregexp"/> for details.</ph>
+          </p>
+        </li>
+        <li>
+          <p rev="IMPALA-1480">
+            Performance and reliability improvements for DDL and insert 
operations on partitioned tables with a large
+            number of partitions. Impala only re-evaluates metadata for 
partitions that are affected by
+            a DDL operation, not all partitions in the table. While a DDL or 
insert statement is in progress,
+            other Impala statements that attempt to modify metadata for the 
same table wait until the first one
+            finishes.
+          </p>
+        </li>
+        <li>
+          <p rev="IMPALA-2867">
+            Reliability improvements for the <codeph>LOAD DATA</codeph> 
statement.
+            Previously, this statement would fail if the source HDFS directory
+            contained any subdirectories at all. Now, the statement ignores
+            any hidden subdirectories, for example 
<filepath>_impala_insert_staging</filepath>.
+          </p>
+        </li>
+        <li>
+          <p rev="IMPALA-2147">
+            A new operator, <codeph>IS [NOT] DISTINCT FROM</codeph>, lets you 
compare values
+            and always get a <codeph>true</codeph> or <codeph>false</codeph> 
result,
+            even if one or both of the values are <codeph>NULL</codeph>.
+            The <codeph>IS NOT DISTINCT FROM</codeph> operator, or its 
equivalent
+            <codeph>&lt;=&gt;</codeph> notation, improves the efficiency of 
join queries that
+            treat key values that are <codeph>NULL</codeph> in both tables as 
equal.
+            <ph audience="PDF">See <xref 
href="impala_operators.xml#is_distinct_from"/> for details.</ph>
+          </p>
+        </li>
+        <li>
+          <p rev="IMPALA-1934">
+            Security enhancements for the <cmdname>impala-shell</cmdname> 
command.
+            A new option, <codeph>--ldap_password_cmd</codeph>, lets you 
specify
+            a command to retrieve the LDAP password. The resulting password is
+            then used to authenticate the <cmdname>impala-shell</cmdname> 
command
+            with the LDAP server.
+            <ph audience="PDF">See <xref href="impala_shell_options.xml"/> for 
details.</ph>
+          </p>
+        </li>
+        <li>
+          <p>
+            The <codeph>CREATE TABLE AS SELECT</codeph> statement now accepts a
+            <codeph>PARTITIONED BY</codeph> clause, which lets you create a
+            partitioned table and insert data into it with a single statement.
+            <ph audience="PDF">See <xref 
href="impala_create_table.xml#create_table"/> for details.</ph>
+          </p>
+        </li>
+        <li>
+          <p rev="IMPALA-1748 CDH-38369">
+            User-defined functions (UDFs and UDAFs) written in C++ now persist 
automatically
+            when the <cmdname>catalogd</cmdname> daemon is restarted. You no 
longer
+            have to run the <codeph>CREATE FUNCTION</codeph> statements again 
after a restart.
+          </p>
+        </li>
+        <li>
+          <p rev="IMPALA-2843 CDH-39148">
+            User-defined functions (UDFs) written in Java can now persist
+            when the <cmdname>catalogd</cmdname> daemon is restarted, and can 
be shared
+            transparently between Impala and Hive. You must do a one-time 
operation to recreate these
+            UDFs using new <codeph>CREATE FUNCTION</codeph> syntax, without a 
signature for arguments
+            or the return value. Afterwards, you no longer have to run the 
<codeph>CREATE FUNCTION</codeph>
+            statements again after a restart.
+            Although Impala does not have visibility into the UDFs that 
implement the 
+            Hive built-in functions, user-created Hive UDFs are now 
automatically available
+            for calling through Impala.
+            <ph audience="PDF">See <xref 
href="impala_create_function.xml#create_function"/> for details.</ph>
+          </p>
+        </li>
+        <li>
+          <!-- Listed as fixed in 2.6.0. Is this item inappropriate or did it 
actually come from a different JIRA? -->
+          <p rev="IMPALA-2728">
+            Reliability enhancements for memory management. Some aggregation and join queries
+            that formerly might have failed with an out-of-memory error due to memory contention
+            can now succeed using the spill-to-disk mechanism.
+          </p>
+        </li>
+        <li>
+          <!-- Same blurb is under Incompatible Changes. Turn into a conref. 
-->
+          <p rev="IMPALA-2070">
+            The <codeph>SHOW DATABASES</codeph> statement now returns two 
columns rather than one.
+            The second column includes the associated comment string, if any, 
for each database.
+            Adjust any application code that examines the list of databases 
and assumes the
+            result set contains only a single column.
+            <ph audience="PDF">See <xref 
href="impala_show.xml#show_databases"/> for details.</ph>
+          </p>
+        </li>
+        <li>
+          <p rev="IMPALA-2499">
+            A new optimization speeds up aggregation operations that involve 
only the partition key
+            columns of partitioned tables. For example, a query such as 
<codeph>SELECT COUNT(DISTINCT k), MIN(k), MAX(k) FROM t1</codeph>
+            can avoid reading any data files if <codeph>T1</codeph> is a 
partitioned table and <codeph>K</codeph>
+            is one of the partition key columns. Because this technique can 
produce different results in cases
+            where HDFS files in a partition are manually deleted or are empty, 
you must enable the optimization
+            by setting the query option 
<codeph>OPTIMIZE_PARTITION_KEY_SCANS</codeph>.
+            <ph audience="PDF">See <xref 
href="impala_optimize_partition_key_scans.xml"/> for details.</ph>
+          </p>
+        </li>
+        <li audience="Cloudera"><!-- All the other undocumented query options 
are not really new features for this release, so hiding this whole bullet. -->
+          <p>
+            Other new query options:
+          </p>
+          <ul>
+            <li audience="Cloudera"><!-- Actually from a long way back, just 
never documented. Not sure if appropriate to keep internal-only or expose. -->
+              <codeph>DISABLE_OUTERMOST_TOPN</codeph>
+            </li>
+            <li audience="Cloudera"><!-- Actually from a long way back, just 
never documented. Not sure if appropriate to keep internal-only or expose. -->
+              <codeph>RM_INITIAL_MEM</codeph>
+            </li>
+            <li audience="Cloudera"><!-- Seems to be related to writing 
sequence files, a capability not externalized at this time. -->
+              <codeph>SEQ_COMPRESSION_MODE</codeph>
+            </li>
+            <li audience="Cloudera"><!-- Actually, was only used for working 
around one JIRA. Being deprecated now in Impala 2.3 via IMPALA-2963. -->
+              <codeph>DISABLE_CACHED_READS</codeph>
+            </li>
+          </ul>
+        </li>
+        <li>
+          <p rev="IMPALA-2196">
+            The <codeph>DESCRIBE</codeph> statement can now display metadata 
about a database, using the
+            syntax <codeph>DESCRIBE DATABASE 
<varname>db_name</varname></codeph>.
+            <ph audience="PDF">See <xref href="impala_describe.xml#describe"/> 
for details.</ph>
+          </p>
+        </li>
+        <li>
+          <p rev="IMPALA-1477">
+            The <codeph>uuid()</codeph> built-in function generates an
+            alphanumeric value that you can use as a guaranteed unique 
identifier.
+            The uniqueness applies even across tables, for cases where an 
ascending
+            numeric sequence is not suitable.
+            <ph audience="PDF">See <xref 
href="impala_misc_functions.xml#misc_functions"/> for details.</ph>
+          </p>
+        </li>
+      </ul>
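+
+      <p>
+        As a rough illustration only, the following statements sketch how several of the SQL features
+        listed above might be used. All database, table, and column names
+        (<codeph>sales_db</codeph>, <codeph>sales_raw</codeph>, <codeph>sales_by_year</codeph>,
+        <codeph>customers</codeph>, and so on) are hypothetical placeholders, not objects that ship with Impala.
+        Adapt the statements to your own schema before running them.
+      </p>
+
+<codeblock>-- All object names below are hypothetical; this is a sketch, not a tested script.
+
+-- TRUNCATE TABLE with IF EXISTS: no error if the table is absent.
+TRUNCATE TABLE IF EXISTS sales_db.staging_events;
+
+-- Case-insensitive wildcard and regular expression matches.
+SELECT name FROM customers WHERE name ILIKE 'smith%';
+SELECT name FROM customers WHERE name IREGEXP '^mc(donald|gregor)';
+
+-- NULL-safe join: rows whose keys are both NULL are treated as equal.
+SELECT a.id, b.id
+  FROM t1 a JOIN t2 b ON a.k IS NOT DISTINCT FROM b.k;
+-- Equivalent shorthand notation for the same comparison.
+SELECT a.id, b.id
+  FROM t1 a JOIN t2 b ON a.k &lt;=&gt; b.k;
+
+-- CREATE TABLE AS SELECT with PARTITIONED BY: the partition key
+-- column comes last in the select list.
+CREATE TABLE sales_by_year PARTITIONED BY (year)
+  AS SELECT id, amount, year FROM sales_raw;
+
+-- Opt in to the partition key scan optimization, then run a query
+-- that refers only to the partition key column.
+SET OPTIMIZE_PARTITION_KEY_SCANS=1;
+SELECT COUNT(DISTINCT year), MIN(year), MAX(year) FROM sales_by_year;
+
+-- Display metadata about a database.
+DESCRIBE DATABASE sales_db;
+
+-- Generate a unique identifier.
+SELECT uuid();
+</codeblock>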
+
+    </conbody>
+  </concept>
+
+<!-- All 2.4.x new features go under here -->
+
+  <concept rev="2.4.0" id="new_features_240">
+
+    <title>New Features in Impala 2.4.x / CDH 5.6.x</title>
+
+    <conbody>
+
+      <note conref="../shared/impala_common.xml#common/only_cdh5_240" />
+
+      <ul>
+        <li>
+          <p>
+            Impala can be used on the DSSD D5 Storage Appliance.
+            From a user perspective, the Impala features are the same as in 
CDH 5.5 / Impala 2.3.
+          </p>
+        </li>
+      </ul>
+
+    </conbody>
+  </concept>
+
+<!-- All 2.3.x subsections go under here -->
+
+<!-- Actually for 2.3 / 5.5, let's get away from doing a separate subhead for 
each maintenance release,
+     because in the normal course of events there will be nothing to add here 
until 5.6. If something new
+     needs to get noted, just add a new bullet with wording to indicate which 
5.5.x release it applies to. -->
+
+  <concept rev="2.3.0" id="new_features_230">
+
+    <title>New Features in Impala 2.3.x / CDH 5.5.x</title>
+
+    <conbody>
+
+      <note conref="../shared/impala_common.xml#common/only_cdh5_23x" />
+
+      <p>
+        The following are the major new features in Impala 2.3.x. This major release, available as part of CDH
+        5.5.x, contains improvements to SQL syntax (particularly new support for complex types), performance,
+        manageability, and security.
+      </p>
+
+      <ul>
+
+        <li>
+          <p>
+            Complex data types: <codeph>STRUCT</codeph>, 
<codeph>ARRAY</codeph>, and <codeph>MAP</codeph>. These
+            types can encode multiple named fields, positional items, or 
key-value pairs within a single column.
+            You can combine these types to produce nested types with 
arbitrarily deep nesting,
+            such as an <codeph>ARRAY</codeph> of <codeph>STRUCT</codeph> 
values,
+            a <codeph>MAP</codeph> where each key-value pair is an 
<codeph>ARRAY</codeph> of other <codeph>MAP</codeph> values,
+            and so on. Currently, complex data types are only supported for 
the Parquet file format.
+            <ph audience="PDF">See <xref 
href="impala_complex_types.xml#complex_types"/> for usage details and <xref 
href="impala_array.xml#array"/>, <xref href="impala_struct.xml#struct"/>, and 
<xref href="impala_map.xml#map"/> for syntax.</ph>
+          </p>
+        </li>
+
+        <li rev="collevelauth">
+          <p>
+            Column-level authorization lets you define access to particular 
columns within a table,
+            rather than the entire table. This feature lets you reduce the 
reliance on creating views to
+            set up authorization schemes for subsets of information.
+            <ph audience="integrated">See <xref 
href="sg_hive_sql.xml#concept_c2q_4qx_p4/col_level_auth_sentry"/> for 
background details, and <xref href="impala_grant.xml#grant"/> and <xref 
href="impala_revoke.xml#revoke"/> for Impala-specific syntax.</ph>
+          </p>
+        </li>
+
+        <li rev="IMPALA-1139">
+          <p>
+            The <codeph>TRUNCATE TABLE</codeph> statement removes all the data 
from a table without removing the table itself.
+            <ph audience="PDF">See <xref 
href="impala_truncate_table.xml#truncate_table"/> for details.</ph>
+          </p>
+        </li>
+
+        <li id="IMPALA-2015">
+          <p>
+            Nested loop join queries. Some join queries that formerly required 
equality comparisons can now use
+            operators such as <codeph>&lt;</codeph> or <codeph>&gt;=</codeph>. 
This same join mechanism is used
+            internally to optimize queries that retrieve values from complex 
type columns.
+            <ph audience="PDF">See <xref href="impala_joins.xml#joins"/> for 
details about Impala join queries.</ph>
+          </p>
+        </li>
+
+        <li id="CDH-28141">
+          <p>
+            Reduced memory usage and improved performance and robustness for the spill-to-disk feature.
+            <ph audience="PDF">See <xref 
href="impala_scalability.xml#spill_to_disk"/> for details about this 
feature.</ph>
+          </p>
+        </li>
+
+        <li rev="IMPALA-1881 CDH-34620">
+          <p>
+            Performance improvements for querying Parquet data files 
containing multiple row groups
+            and multiple data blocks:
+          </p>
+          <ul>
+          <li>
+          <p> For files written by Hive, SparkSQL, and other Parquet MR writers
+                and spanning multiple HDFS blocks, Impala now scans the extra
+                data blocks locally when possible, rather than using remote
+                reads. </p>
+          </li>
+          <li>
+          <p>
+            Impala queries benefit from the improved alignment of row groups 
with HDFS blocks for Parquet
+            files written by Hive, MapReduce, and other components in <ph 
rev="upstream">CDH 5.5</ph> and higher. (Impala itself never writes
+            multiblock Parquet files, so the alignment change does not apply 
to Parquet files produced by Impala.)
+            These Parquet writers now add padding to Parquet files that they 
write to align row groups with HDFS blocks.
+            The <codeph>parquet.writer.max-padding</codeph> setting specifies 
the maximum number of bytes, by default
+            8 megabytes, that can be added to the file between row groups to 
fill the gap at the end of one block
+            so that the next row group starts at the beginning of the next 
block.
+            If the gap is larger than this size, the writer attempts to fit 
another entire row group in the remaining space.
+            Include this setting in the <filepath>hive-site</filepath> 
configuration file to influence Parquet files written by Hive,
+            or the <filepath>hdfs-site</filepath> configuration file to 
influence Parquet files written by all non-Impala components.
+          </p>
+          </li>
+          </ul>
+          <p audience="PDF">
+            See <xref href="impala_parquet.xml#parquet"/> for instructions 
about using Parquet data files
+            with Impala, and
+            <xref audience="integrated" href="cdh_ig_parquet.xml#parquet_format"/><xref audience="standalone" href="http://www.cloudera.com/documentation/enterprise/latest/topics/cdh_ig_parquet.html" scope="external" format="html"/>
+            for instructions for
+            other components that can read and write Parquet files.
+          </p>
+        </li>
+
+        <li id="IMPALA-1660">
+          <p>
+            Many new built-in scalar functions, for convenience and enhanced 
portability of SQL that uses common industry extensions.
+          </p>
+
+          <p rev="IMPALA-1771">
+            Math functions<ph audience="PDF"> (see <xref 
href="impala_math_functions.xml#math_functions"/> for details)</ph>:
+          </p>
+          <ul>
+            <li>
+              <codeph>ATAN2</codeph>
+            </li>
+
+            <li>
+              <codeph>COSH</codeph>
+            </li>
+
+            <li>
+              <codeph>COT</codeph>
+            </li>
+
+            <li>
+              <codeph>DCEIL</codeph>
+            </li>
+
+            <li>
+              <codeph>DEXP</codeph>
+            </li>
+
+            <li>
+              <codeph>DFLOOR</codeph>
+            </li>
+
+            <li>
+              <codeph>DLOG10</codeph>
+            </li>
+
+            <li>
+              <codeph>DPOW</codeph>
+            </li>
+
+            <li>
+              <codeph>DROUND</codeph>
+            </li>
+
+            <li>
+              <codeph>DSQRT</codeph>
+            </li>
+
+            <li>
+              <codeph>DTRUNC</codeph>
+            </li>
+
+            <li>
+              <codeph>FACTORIAL</codeph>, and corresponding <codeph>!</codeph> 
operator
+            </li>
+
+            <li>
+              <codeph>FPOW</codeph>
+            </li>
+
+            <li>
+              <codeph>RADIANS</codeph>
+            </li>
+
+            <li>
+              <codeph>RANDOM</codeph>
+            </li>
+
+            <li>
+              <codeph>SINH</codeph>
+            </li>
+
+            <li>
+              <codeph>TANH</codeph>
+            </li>
+          </ul>
+
+          <p>
+            String functions<ph audience="PDF"> (see <xref 
href="impala_string_functions.xml#string_functions"/> for details)</ph>:
+          </p>
+          <ul>
+            <li>
+              <codeph>BTRIM</codeph>
+            </li>
+            <li>
+              <codeph>CHR</codeph>
+            </li>
+            <li>
+              <codeph>REGEXP_LIKE</codeph>
+            </li>
+            <li>
+              <codeph>SPLIT_PART</codeph>
+            </li>
+          </ul>
+
+          <p>
+            Date and time functions<ph audience="PDF"> (see <xref 
href="impala_datetime_functions.xml#datetime_functions"/> for details)</ph>:
+          </p>
+          <ul>
+              <li>
+                <codeph>INT_MONTHS_BETWEEN</codeph>
+              </li>
+              <li>
+                <codeph>MONTHS_BETWEEN</codeph>
+              </li>
+              <li>
+                <codeph>TIMEOFDAY</codeph>
+              </li>
+              <li>
+                <codeph>TIMESTAMP_CMP</codeph>
+              </li>
+          </ul>
+
+          <p>
+            Bit manipulation functions<ph audience="PDF"> (see <xref 
href="impala_bit_functions.xml#bit_functions"/> for details)</ph>:
+          </p>
+          <ul>
+            <li>
+              <codeph>BITAND</codeph>
+            </li>
+
+            <li>
+              <codeph>BITNOT</codeph>
+            </li>
+
+            <li>
+              <codeph>BITOR</codeph>
+            </li>
+
+            <li>
+              <codeph>BITXOR</codeph>
+            </li>
+
+            <li>
+              <codeph>COUNTSET</codeph>
+            </li>
+
+            <li>
+              <codeph>GETBIT</codeph>
+            </li>
+
+            <li>
+              <codeph>ROTATELEFT</codeph>
+            </li>
+
+            <li>
+              <codeph>ROTATERIGHT</codeph>
+            </li>
+
+            <li>
+              <codeph>SETBIT</codeph>
+            </li>
+
+            <li>
+              <codeph>SHIFTLEFT</codeph>
+            </li>
+
+            <li>
+              <codeph>SHIFTRIGHT</codeph>
+            </li>
+          </ul>
+          <p>
+            Type conversion functions<ph audience="PDF"> (see <xref 
href="impala_conversion_functions.xml#conversion_functions"/> for details)</ph>:
+          </p>
+          <ul>
+            <li>
+              <codeph>TYPEOF</codeph>
+            </li>
+          </ul>
+          <p>
+            The <codeph>effective_user()</codeph> function<ph audience="PDF"> 
(see <xref href="impala_misc_functions.xml#misc_functions"/> for details)</ph>.
+          </p>
+        </li>
+
+        <li id="IMPALA-2081">
+          <p>
+            New built-in analytic functions: <codeph>PERCENT_RANK</codeph>, 
<codeph>NTILE</codeph>,
+            <codeph>CUME_DIST</codeph>.
+            <ph audience="PDF">See <xref 
href="impala_analytic_functions.xml#analytic_functions"/> for details.</ph>
+          </p>
+        </li>
+
+        <li id="IMPALA-595">
+          <p>
+            The <codeph>DROP DATABASE</codeph> statement now works for a 
non-empty database.
+            When you specify the optional <codeph>CASCADE</codeph> clause, any 
tables in the
+            database are dropped before the database itself is removed.
+            <ph audience="PDF">See <xref 
href="impala_drop_database.xml#drop_database"/> for details.</ph>
+          </p>
+        </li>
+
+        <li>
+          <p>
+            The <codeph>DROP TABLE</codeph> and <codeph>ALTER TABLE DROP PARTITION</codeph> statements have a new optional keyword, <codeph>PURGE</codeph>.
+            This keyword causes Impala to immediately remove the relevant HDFS data files rather than sending them to the HDFS trashcan.
+            This feature can help to avoid out-of-space errors on storage devices, and to avoid files being left behind in case of
+            a problem with the HDFS trashcan, such as the trashcan not being configured or being in a different HDFS encryption zone
+            than the data files.
+            <ph audience="PDF">See <xref href="impala_drop_table.xml#drop_table"/> and <xref href="impala_alter_table.xml#alter_table"/> for syntax.</ph>
+          </p>
+        </li>
+
+        <li id="IMPALA-80">
+          <p>
+            The <cmdname>impala-shell</cmdname> command has a new feature for live progress reporting. This feature
+            is enabled through the <codeph>--live_progress</codeph> and <codeph>--live_summary</codeph>
+            command-line options, or during a session through the <codeph>LIVE_SUMMARY</codeph> and
+            <codeph>LIVE_PROGRESS</codeph> query options.
+            <ph audience="PDF">See <xref href="impala_live_progress.xml#live_progress"/> and <xref href="impala_live_summary.xml#live_summary"/> for details.</ph>
+          </p>
+        </li>
+
+        <li>
+          <p>
+            The <cmdname>impala-shell</cmdname> command also now displays a random <q>tip of the day</q> when it starts.
+          </p>
+        </li>
+
+        <li id="IMPALA-1413">
+          <p>
+            The <cmdname>impala-shell</cmdname> option <codeph>-f</codeph> now recognizes a special filename
+            <codeph>-</codeph> to accept input from stdin.
+            <ph audience="PDF">See <xref href="impala_shell_options.xml#shell_options"/> for details about the options for running <cmdname>impala-shell</cmdname> in non-interactive mode.</ph>
+          </p>
+        </li>
+
+        <li id="IMPALA-1963">
+          <p>
+            Format strings for the <codeph>unix_timestamp()</codeph> function can now include numeric timezone offsets.
+            <ph audience="PDF">See <xref href="impala_datetime_functions.xml#datetime_functions"/> for details.</ph>
+          </p>
+        </li>
+
+        <li id="CDH-27547">
+          <p>
+            Impala can now run a specified command to obtain the password to decrypt a private-key PEM file,
+            rather than having the private-key file be unencrypted on disk.
+            <ph audience="PDF">See <xref href="impala_ssl.xml#ssl"/> for details.</ph>
+          </p>
+        </li>
+
+        <li id="IMPALA-859">
+          <p>
+            Impala components can now use SSL for more of their internal communication. SSL is used for
+            communication between all three Impala-related daemons when the configuration option
+            <codeph>ssl_server_certificate</codeph> is enabled. SSL is used for communication with client
+            applications when the configuration option <codeph>ssl_client_ca_certificate</codeph> is enabled.
+            <ph audience="PDF">See <xref href="impala_ssl.xml#ssl"/> for details.</ph>
+          </p>
+          <p>
+            Currently, you can use either server-to-server TLS/SSL encryption or Kerberos authentication, but not both.
+            This limitation is tracked by the issue
+            <xref href="https://issues.cloudera.org/browse/IMPALA-2598" scope="external" format="html">IMPALA-2598</xref>.
+          </p>
+        </li>
+
+        <li id="IMPALA-1829">
+          <p>
+            Improved flexibility for intermediate data types in user-defined aggregate functions (UDAFs).
+            <ph audience="PDF">See <xref href="impala_udf.xml#udafs"/> for details.</ph>
+          </p>
+        </li>
+
+      </ul>
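+
+      <p>
+        As a rough illustration of the <codeph>CASCADE</codeph> and <codeph>PURGE</codeph> clauses described in the list above,
+        the following statements sketch how they might be used. The database, table, and partition key names
+        (<codeph>temp_db</codeph>, <codeph>staging_logs</codeph>, <codeph>year</codeph>) are hypothetical and are not taken from any Impala sample schema.
+      </p>
+
+<codeblock>-- Hypothetical object names; adjust for your own schema.
+-- Remove a database together with any tables it still contains.
+DROP DATABASE IF EXISTS temp_db CASCADE;
+
+-- Remove one partition, then the whole table, bypassing the HDFS trashcan.
+ALTER TABLE staging_logs DROP IF EXISTS PARTITION (year = 2014) PURGE;
+DROP TABLE IF EXISTS staging_logs PURGE;
+</codeblock>
+
+      <p>
+        Because <codeph>PURGE</codeph> skips the trashcan, the removed data files cannot be recovered from it afterward,
+        so it is best reserved for data that you are sure you no longer need.
+      </p>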
+
+      <p>
+        In CDH 5.5.2 / Impala 2.3.2, the bug fix for <xref href="https://issues.cloudera.org/browse/IMPALA-2598" scope="external" format="html">IMPALA-2598</xref>
+        removes the restriction on using both Kerberos and SSL for internal communication between Impala components.
+      </p>
+
+<!-- End of new feature list for 2.3 / 5.5. -->
+
+    </conbody>
+
+  </concept>
+
+<!-- All 2.2.x subsections go under here -->
+
+<!-- Removing all the 5.4.x release subtopics for which there wasn't anything to say.
+     Same convention as used in 5.5.x, 5.6.x, 5.7.x, 5.8.x. Only have one subtopic for
+     the .0.
+  <concept rev="5.4.10" id="new_features_2210">
+
+    <title>New Features in Impala 2.2.10 / CDH 5.4.10</title>
+
+    <conbody>
+
+      <p>
+        No new features. This point release is exclusively a bug fix release.
+      </p>
+
+      <note conref="../shared/impala_common.xml#common/only_cdh5_22x" />
+
+    </conbody>
+
+  </concept>
+
+  <concept rev="5.4.9" id="new_features_229">
+
+    <title>New Features in Impala 2.2.9 / CDH 5.4.9</title>
+
+    <conbody>
+
+      <p>
+        No new features. This point release is exclusively a bug fix release.
+      </p>
+
+      <note conref="../shared/impala_common.xml#common/only_cdh5_22x" />
+
+    </conbody>
+
+  </concept>
+
+  <concept rev="5.4.8" id="new_features_228">
+
+    <title>New Features in Impala 2.2.8 / CDH 5.4.8</title>
+
+    <conbody>
+
+      <p>
+        No new features. This point release is exclusively a bug fix release.
+      </p>
+
+      <note conref="../shared/impala_common.xml#common/only_cdh5_22x" />
+
+    </conbody>
+
+  </concept>
+
+  <concept rev="5.4.7" id="new_features_227">
+
+    <title>New Features in Impala 2.2.7 / CDH 5.4.7</title>
+
+    <conbody>
+
+      <p>
+        No new features. This point release is exclusively a bug fix release.
+      </p>
+
+      <note conref="../shared/impala_common.xml#common/only_cdh5_22x" />
+
+    </conbody>
+
+  </concept>
+
+  <concept audience="Cloudera" rev="5.4.6" id="new_features_226">
+
+    <title>New Features in Impala 2.2.6 / CDH 5.4.6</title>
+
+    <conbody>
+
+      <p>
+        No new features. This point release is exclusively a bug fix release.
+      </p>
+
+      <note conref="../shared/impala_common.xml#common/only_cdh5_22x" />
+
+    </conbody>
+
+  </concept>
+
+  <concept rev="5.4.5" id="new_features_225">
+
+    <title>New Features in Impala 2.2.x for CDH 5.4.5</title>
+
+    <conbody>
+
+      <p>
+        No new features. This point release is exclusively a bug fix release.
+      </p>
+
+      <note conref="../shared/impala_common.xml#common/only_cdh5_22x" />
+
+    </conbody>
+
+  </concept>
+-->
+
+  <concept rev="5.4.3" id="new_features_223">
+
+    <title>New Features in Impala 2.2.x for CDH 5.4.3 and 5.4.4</title>
+
+    <conbody>
+
+      <p>
+        No new features were added to the Impala code. The certification of Impala with EMC Isilon under CDH 5.4.4 means
+        that you can now query data stored on Isilon storage devices through Impala. See
+        <xref audience="integrated" href="cm_mc_isilon_service.xml"/><xref audience="standalone" href="http://www.cloudera.com/documentation/enterprise/latest/topics/cm_mc_isilon_service.html" scope="external" format="html"/>
+        for details. The same level of Impala is included with both CDH
+        5.4.3 and 5.4.4.
+<!-- This point release is exclusively a bug fix release. -->
+      </p>
+
+      <note conref="../shared/impala_common.xml#common/only_cdh5_22x" />
+
+    </conbody>
+
+  </concept>
+
+<!-- I let the 5.4.3/5.4.4 subtopic above remain in existence, but now back to hiding specific 5.4.x subtopics
+     after the .0 one that has the actual new features.
+  <concept audience="Cloudera" rev="5.4.2" id="new_features_222">
+
+    <title>New Features in Impala 2.2.x for CDH 5.4.2</title>
+
+    <conbody>
+
+      <p>
+        No new features. This point release is exclusively a bug fix release.
+      </p>
+
+      <note conref="../shared/impala_common.xml#common/only_cdh5_22x" />
+
+    </conbody>
+
+  </concept>
+
+  <concept rev="5.4.x" id="new_features_54x">
+
+    <title>New Features in Impala for CDH 5.4.x</title>
+
+    <conbody>
+
+      <p>
+        See <xref href="impala_new_features.xml#new_features_220"/> for the most recent set of new Impala features.
+        CDH maintenance releases such as 5.4.1, 5.4.2, and so on are exclusively bug fix releases,
+        therefore there are no new features for the 5.4.x series after 5.4.0.
+      </p>
+
+      <note conref="../shared/impala_common.xml#common/only_cdh5_22x" />
+
+    </conbody>
+
+  </concept>
+-->
+
+  <concept rev="2.2.0" id="new_features_220">
+
+    <title>New Features in Impala 2.2.x / CDH 5.4.x</title>
+
+    <conbody>
+
+      <note conref="../shared/impala_common.xml#common/only_cdh5_220" />
+
+      <p>
+        The following are the major new features in Impala 2.2.x. This release, available as part of CDH
+        5.4.x, contains improvements to performance, manageability, security, and SQL syntax.
+      </p>
+
+      <ul>
+        <li>
+          <p>
+            The <codeph>WITH REPLICATION</codeph> clause for the <codeph>CREATE TABLE</codeph> and
+            <codeph>ALTER TABLE</codeph> statements lets you control the replication factor for
+            HDFS caching for a specific table or partition. By default, each cached block is
+            only present on a single host, which can lead to CPU contention if the same host
+            processes each cached block. Increasing the replication factor lets Impala choose
+            different hosts to process different cached blocks, to better distribute the CPU load.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            Several improvements to date and time features enable higher interoperability with Hive and other
+            database systems, provide more flexibility for handling time zones, and future-proof the handling of
+            <codeph>TIMESTAMP</codeph> values:
+          </p>
+          <ul>
+            <li>
+              <p>
+                Startup flags for the <cmdname>impalad</cmdname> daemon enable a higher level of compatibility with
+                <codeph>TIMESTAMP</codeph> values written by Hive, and more flexibility for working with date and
+                time data using the local time zone instead of UTC. To enable these features, set the
+                <cmdname>impalad</cmdname> startup flags
+                <codeph>-use_local_tz_for_unix_timestamp_conversions=true</codeph> and
+                <codeph>-convert_legacy_hive_parquet_utc_timestamps=true</codeph>.
+              </p>
+
+              <p>
+                The <codeph>-use_local_tz_for_unix_timestamp_conversions</codeph> setting controls how the
+                <codeph>unix_timestamp()</codeph>, <codeph>from_unixtime()</codeph>, and <codeph>now()</codeph>
+                functions handle time zones. By default (when this setting is turned off), Impala considers all
+                <codeph>TIMESTAMP</codeph> values to be in the UTC time zone when converting to or from Unix time
+                values. When this setting is enabled, Impala treats <codeph>TIMESTAMP</codeph> values passed to or
+                returned from these functions as being in the local time zone. When this setting is enabled, take
+                particular care that all hosts in the cluster have the same timezone settings, to avoid
+                inconsistent results depending on which host reads or writes <codeph>TIMESTAMP</codeph> data.
+              </p>
+
+              <p>
+                The <codeph>-convert_legacy_hive_parquet_utc_timestamps</codeph> setting causes Impala to convert
+                <codeph>TIMESTAMP</codeph> values to the local time zone when it reads them from Parquet files
+                written by Hive. This setting only applies to data using the Parquet file format, where Impala can
+                use metadata in the files to reliably determine that the files were written by Hive. If in the
+                future Hive changes the way it writes <codeph>TIMESTAMP</codeph> data in Parquet, Impala will
+                automatically handle that new <codeph>TIMESTAMP</codeph> encoding.
+              </p>
+
+              <p>
+                See <xref href="impala_timestamp.xml#timestamp"/> for details about time zone handling and the
+                configuration options for Impala / Hive compatibility with Parquet format.
+              </p>
+            </li>
+
+            <li>
+              <p conref="../shared/impala_common.xml#common/y2k38" />
+
+              <p>
+                See <xref href="impala_datetime_functions.xml#datetime_functions"/> for the current function
+                signatures.
+              </p>
+            </li>
+          </ul>
+        </li>
+
+        <li>
+          <p>
+            The <codeph>SHOW FILES</codeph> statement lets you view the names and sizes of the files that make up
+            an entire table or a specific partition. See <xref href="impala_show.xml#show_files"/> for details.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            Impala can now run queries against Parquet data containing columns with complex or nested types, as
+            long as the query only refers to columns with scalar types.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            Performance improvements for queries that include <codeph>IN()</codeph> operators and involve
+            partitioned tables.
+          </p>
+        </li>
+
+        <li>
+<!-- Same text for this item in impala_fixed_issues.xml. Could turn into a conref. -->
+          <p>
+            The new <codeph>-max_log_files</codeph> configuration option specifies how many log files to keep at
+            each severity level. The default value is 10, meaning that Impala preserves the latest 10 log files for
+            each severity level (<codeph>INFO</codeph>, <codeph>WARNING</codeph>, and <codeph>ERROR</codeph>) for
+            each Impala-related daemon (<cmdname>impalad</cmdname>, <cmdname>statestored</cmdname>, and
+            <cmdname>catalogd</cmdname>). Impala checks to see if any old logs need to be removed based on the
+            interval specified in the <codeph>logbufsecs</codeph> setting, every 5 seconds by default. See
+            <xref href="impala_logging.xml#logs_rotate"/> for details.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            Redaction of sensitive data from Impala log files. This feature protects details such as credit card
+            numbers or tax IDs from administrators who see the text of SQL statements in the course of monitoring
+            and troubleshooting a Hadoop cluster. See <xref href="impala_logging.xml#redaction"/> for background
+            information for Impala users, and
+            <xref audience="integrated" href="sg_redaction.xml#log_redact"/><xref audience="standalone" href="http://www.cloudera.com/documentation/enterprise/latest/topics/sg_redaction.html" scope="external" format="html">the CDH 5 Security Guide</xref>
+            for usage details.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            Lineage information is available for data created or queried by Impala. This feature lets you track who
+            has accessed data through Impala SQL statements, down to the level of specific columns, and how data
+            has been propagated between tables. See <xref href="impala_lineage.xml#lineage"/> for background
+            information for Impala users,
+            <xref audience="integrated" href="datamgmt_impala_lineage_log.xml"/><xref audience="standalone" href="http://www.cloudera.com/documentation/enterprise/latest/topics/datamgmt_impala_lineage_log.html" scope="external" format="html"/>
+            for usage details, and
+            <xref audience="integrated" href="cn_iu_lineage.xml" /><xref audience="standalone" href="http://www.cloudera.com/documentation/enterprise/latest/topics/cn_iu_lineage.html" scope="external" format="html"/>
+            for how to interpret the lineage
+            information.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            Impala tables and partitions can now be located on the Amazon Simple Storage Service (S3) filesystem,
+            for convenience in cases where data is already located in S3 and you prefer to query it in-place.
+            Queries might have lower performance than when the data files reside on HDFS, because Impala uses some
+            HDFS-specific optimizations. Impala can query data in S3, but cannot write to S3. Therefore, statements
+            such as <codeph>INSERT</codeph> and <codeph>LOAD DATA</codeph> are not available when the destination
+            table or partition is in S3. See <xref href="impala_s3.xml#s3"/> for details.
+          </p>
+
+          <note conref="../shared/impala_common.xml#common/s3_caveat" />
+        </li>
+
+        <li>
+        <!-- Only want the link out of the release notes to appear for HTML
+             (N.B. audience="PDF" means hide from PDF), and only in the HTML for the
+             integrated build where the topic is available for link resolution. -->
+          <p>
+            Improved support for HDFS encryption. The <codeph>LOAD DATA</codeph> statement now works when the
+            source directory and destination table are in different encryption zones. <ph audience="integrated"><ph audience="PDF">See
+            <xref href="cdh_sg_component_kms.xml#impala_encryption"/> for details about using HDFS encryption with
+            Impala.</ph></ph>
+          </p>
+        </li>
+
+        <li>
+          <p>
+            Additional arithmetic function <codeph>mod()</codeph>. See
+            <xref href="impala_math_functions.xml#math_functions"/> for details.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            Flexibility to interpret <codeph>TIMESTAMP</codeph> values using the UTC time zone (the traditional
+            Impala behavior) or using the local time zone (for compatibility with <codeph>TIMESTAMP</codeph> values
+            produced by Hive).
+          </p>
+        </li>
+
+        <li>
+          <p>
+            Enhanced support for ETL using tools such as Flume. Impala ignores temporary files typically produced
+            by these tools (filenames with suffixes <codeph>.copying</codeph> and <codeph>.tmp</codeph>).
+          </p>
+        </li>
+
+        <li>
+          <p>
+            The CPU requirement for Impala, which had become more restrictive in Impala 2.0.x and 2.1.x, has now
+            been relaxed.
+          </p>
+
+          <p conref="../shared/impala_common.xml#common/cpu_prereq" />
+        </li>
+
+        <li>
+          <p>
+            Enhanced support for <codeph>CHAR</codeph> and <codeph>VARCHAR</codeph> types in the <codeph>COMPUTE
+            STATS</codeph> statement.
+          </p>
+        </li>
+
+        <li rev="CDH-26073">
+          <p>
+            The amount of memory required during setup for <q>spill to disk</q> operations is greatly reduced. This
+            enhancement reduces the chance of a memory-intensive join or aggregation query failing with an
+            out-of-memory error.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            Several new conditional functions provide enhanced compatibility when porting code that uses industry
+            extensions. The new functions are: <codeph>isfalse()</codeph>, <codeph>isnotfalse()</codeph>,
+            <codeph>isnottrue()</codeph>, <codeph>istrue()</codeph>, <codeph>nonnullvalue()</codeph>, and
+            <codeph>nullvalue()</codeph>. See <xref href="impala_conditional_functions.xml#conditional_functions"/>
+            for details.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            The Impala debug web UI can now display a visual representation of the query plan. On the
+            <uicontrol>/queries</uicontrol> tab, select <uicontrol>Details</uicontrol> for a particular query. The
+            <uicontrol>Details</uicontrol> page includes a <uicontrol>Plan</uicontrol> tab with a plan diagram that
+            you can zoom in or out of (using scroll gestures through mouse wheel or trackpad).
+          </p>
+        </li>
+      </ul>
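+
+      <p>
+        The following sketch shows how a few of the SQL-level additions in the list above might look in practice.
+        The table <codeph>sales</codeph>, its <codeph>year</codeph> partition key, its <codeph>is_refund</codeph> column,
+        and the HDFS cache pool <codeph>pool1</codeph> are hypothetical names used only for illustration, and the
+        replication factor of 3 is an arbitrary example value.
+      </p>
+
+<codeblock>-- Hypothetical table, column, and cache pool names.
+-- List the data files for a whole table, then for a single partition.
+SHOW FILES IN sales;
+SHOW FILES IN sales PARTITION (year = 2015);
+
+-- Cache a table in HDFS with more than one cached replica of each block.
+ALTER TABLE sales SET CACHED IN 'pool1' WITH REPLICATION = 3;
+
+-- The mod() function and two of the new conditional functions.
+SELECT mod(10, 3);
+SELECT istrue(is_refund), nullvalue(is_refund) FROM sales LIMIT 5;
+</codeblock>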
+
+<!-- End of new feature list for 5.4. -->
+
+    </conbody>
+
+  </concept>
+
+<!-- All 2.1.x subsections go under here -->
+
+  <concept rev="2.1.8" id="new_features_218">
+
+    <title>New Features in Impala 2.1.8 / CDH 5.3.10</title>
+
+    <conbody>
+
+      <p>
+        This point release is exclusively a bug fix release.
+      </p>
+
+      <note conref="../shared/impala_common.xml#common/only_cdh5_21x"/>
+
+    </conbody>
+
+  </concept>
+
+  <concept rev="2.1.7" id="new_features_217">
+
+    <title>New Features in Impala 2.1.7 / CDH 5.3.9</title>
+
+    <conbody>
+
+      <p>
+        This point release is exclusively a bug fix release.
+      </p>
+
+      <note conref="../shared/impala_common.xml#common/only_cdh5_21x"/>
+
+    </conbody>
+
+  </concept>
+
+  <concept rev="2.1.6" id="new_features_216">
+
+    <title>New Features in Impala 2.1.6 / CDH 5.3.8</title>
+
+    <conbody>
+
+      <p>
+        This point release is exclusively a bug fix release.
+      </p>
+
+      <note conref="../shared/impala_common.xml#common/only_cdh5_21x"/>
+
+    </conbody>
+
+  </concept>
+
+  <concept rev="2.1.5" id="new_features_215">
+
+    <title>New Features in Impala 2.1.5 / CDH 5.3.6</title>
+
+    <conbody>
+
+      <p>
+        This point release is exclusively a bug fix release.
+      </p>
+
+      <note conref="../shared/impala_common.xml#common/only_cdh5_21x"/>
+
+    </conbody>
+
+  </concept>
+
+  <concept rev="2.1.4" id="new_features_214">
+
+    <title>New Features in Impala 2.1.4 / CDH 5.3.4</title>
+
+    <conbody>
+
+      <p>
+        No new features. This point release is exclusively a bug fix release.
+        <ph conref="../shared/impala_common.xml#common/impala_214_redux"/>
+      </p>
+
+      <note conref="../shared/impala_common.xml#common/only_cdh5_21x"/>
+
+    </conbody>
+
+  </concept>
+
+  <concept rev="2.1.3" id="new_features_213">
+
+    <title>New Features in Impala 2.1.3 / CDH 5.3.3</title>
+
+    <conbody>
+
+      <p>
+        No new features. This point release is exclusively a bug fix release.
+      </p>
+
+      <note conref="../shared/impala_common.xml#common/only_cdh5_213" />
+
+    </conbody>
+
+  </concept>
+
+  <concept rev="2.1.2" id="new_features_212">
+
+    <title>New Features in Impala 2.1.2 / CDH 5.3.2</title>
+
+    <conbody>
+
+      <p>
+        No new features. This point release is exclusively a bug fix release.
+      </p>
+
+      <note conref="../shared/impala_common.xml#common/only_cdh5_212" />
+
+    </conbody>
+
+  </concept>
+
+  <concept rev="2.1.1" id="new_features_211">
+
+    <title>New Features in Impala 2.1.1 / CDH 5.3.1</title>
+
+    <conbody>
+
+      <p>
+        No new features. This point release is exclusively a bug fix release.
+      </p>
+
+    </conbody>
+
+  </concept>
+
+  <concept rev="2.1.0" id="new_features_210">
+
+    <title>New Features in Impala 2.1.0 / CDH 5.3.0</title>
+
+    <conbody>
+
+      <p>
+        This release contains the following enhancements to query performance and system scalability:
+      </p>
+
+      <ul>
+        <li>
+          <p>
+            Impala can now collect statistics for individual partitions in a partitioned table, rather than
+            processing the entire table for each <codeph>COMPUTE STATS</codeph> statement. This feature is known as
+            incremental statistics, and is controlled by the <codeph>COMPUTE INCREMENTAL STATS</codeph> syntax.
+            (You can still use the original <codeph>COMPUTE STATS</codeph> statement for nonpartitioned tables or
+            partitioned tables that are unchanging or whose contents are entirely replaced all at once.) See
+            <xref href="impala_compute_stats.xml#compute_stats"/> and
+            <xref href="impala_perf_stats.xml#perf_stats"/> for details.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            Optimization for small queries lets Impala run queries that process very few rows without the
+            unnecessary overhead of parallelizing and generating native code. Reducing this overhead lets Impala
+            clear small queries quickly, keeping YARN resources and admission control slots available for
+            data-intensive queries. The maximum number of rows for a query to be considered <q>small</q> is controlled by the
+            <codeph>EXEC_SINGLE_NODE_ROWS_THRESHOLD</codeph> query option. See
+            <xref href="impala_exec_single_node_rows_threshold.xml#exec_single_node_rows_threshold"/> for details.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            An enhancement to the statestore component lets it transmit heartbeat information independently of
+            broadcasting metadata updates. This optimization improves reliability of health checking on large
+            clusters with many tables and partitions.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            The memory requirement for querying gzip-compressed text is reduced. Now Impala decompresses the data
+            as it is read, rather than reading the entire gzipped file and decompressing it in memory.
+          </p>
+        </li>
+      </ul>
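+
+      <p>
+        As a minimal sketch of the incremental statistics and small-query features listed above, the statements
+        below show one way they might be used from <cmdname>impala-shell</cmdname>. The partitioned table
+        <codeph>sales</codeph> and its <codeph>year</codeph> partition key are hypothetical, and the threshold value of
+        500 is an arbitrary example rather than a recommendation.
+      </p>
+
+<codeblock>-- Hypothetical partitioned table named sales.
+-- Gather statistics for a single partition only.
+COMPUTE INCREMENTAL STATS sales PARTITION (year = 2015);
+
+-- Gather statistics for any partitions that do not have incremental stats yet.
+COMPUTE INCREMENTAL STATS sales;
+
+-- Change the row-count threshold below which a query counts as small for this session.
+SET EXEC_SINGLE_NODE_ROWS_THRESHOLD=500;
+</codeblock>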
+
+    </conbody>
+
+  </concept>
+
+<!-- All 2.0.x subsections go under here -->
+
+  <concept rev="2.0.5" id="new_features_205">
+
+    <title>New Features in Impala 2.0.5 / CDH 5.2.6</title>
+
+    <conbody>
+
+      <p>
+        No new features. This point release is exclusively a bug fix release.
+      </p>
+
+      <note conref=


<TRUNCATED>
