http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/bb88fdc0/docs/topics/impala_known_issues.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_known_issues.xml b/docs/topics/impala_known_issues.xml new file mode 100644 index 0000000..7b9ec2b --- /dev/null +++ b/docs/topics/impala_known_issues.xml @@ -0,0 +1,1812 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept rev="ver" id="known_issues"> + + <title><ph audience="standalone">Known Issues and Workarounds in Impala</ph><ph audience="integrated">Apache Impala (incubating) Known Issues</ph></title> + + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Release Notes"/> + <data name="Category" value="Known Issues"/> + <data name="Category" value="Troubleshooting"/> + <data name="Category" value="Upgrading"/> + <data name="Category" value="Administrators"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> + </metadata> + </prolog> + + <conbody> + + <p> + The following sections describe known issues and workarounds in Impala, as of the current production release. This page summarizes the + most serious or frequently encountered issues in the current release, to help you make planning decisions about installing and + upgrading. Any workarounds are listed here. The bug links take you to the Impala issues site, where you can see the diagnosis and + whether a fix is in the pipeline. + </p> + + <note> + The online issue tracking system for Impala contains comprehensive information and is updated in real time. To verify whether an issue + you are experiencing has already been reported, or which release an issue is fixed in, search on the + <xref href="https://issues.cloudera.org/" scope="external" format="html">issues.cloudera.org JIRA tracker</xref>. 
+ </note> + + <p outputclass="toc inpage"/> + + <p> + For issues fixed in various Impala releases, see <xref href="impala_fixed_issues.xml#fixed_issues"/>. + </p> + +<!-- Use as a template for new issues. + <concept id=""> + <title></title> + <conbody> + <p> + </p> + <p><b>Bug:</b> <xref href="https://issues.cloudera.org/browse/" scope="external" format="html"></xref></p> + <p><b>Severity:</b> High</p> + <p><b>Resolution:</b> </p> + <p><b>Workaround:</b> </p> + </conbody> + </concept> + +--> + + </conbody> + +<!-- New known issues for CDH 5.5 / Impala 2.3. + +Title: Server-to-server SSL and Kerberos do not work together +Description: If server<->server SSL is enabled (with ssl_client_ca_certificate), and Kerberos auth is used between servers, the cluster will fail to start. +Upstream & Internal JIRAs: https://issues.cloudera.org/browse/IMPALA-2598 +Severity: Medium. Server-to-server SSL is practically unusable but this is a new feature. +Workaround: No known workaround. + +Title: Queries may hang on server-to-server exchange errors +Description: The DataStreamSender::Channel::CloseInternal() does not close the channel on an error. This will cause the node on the other side of the channel to wait indefinitely causing a hang. +Upstream & Internal JIRAs: https://issues.cloudera.org/browse/IMPALA-2592 +Severity: Low. This does not occur frequently. +Workaround: No known workaround. + +Title: Catalogd may crash when loading metadata for tables with many partitions, many columns and with incremental stats +Description: Incremental stats use up about 400 bytes per partition X column. So for a table with 20K partitions and 100 columns this is about 800 MB. When serialized this goes past the 2 GB Java array size limit and leads to a catalog crash. +Upstream & Internal JIRAs: https://issues.cloudera.org/browse/IMPALA-2648, IMPALA-2647, IMPALA-2649. +Severity: Low. This does not occur frequently. +Workaround: Reduce the number of partitions. 
+ +More from: https://issues.cloudera.org/browse/IMPALA-2093?filter=11278&jql=project%20%3D%20IMPALA%20AND%20priority%20in%20(blocker%2C%20critical)%20AND%20status%20in%20(open%2C%20Reopened)%20AND%20labels%20%3D%20correctness%20ORDER%20BY%20priority%20DESC + +IMPALA-2093 +Wrong plan of NOT IN aggregate subquery when a constant is used in subquery predicate +IMPALA-1652 +Incorrect results with basic predicate on CHAR typed column. +IMPALA-1459 +Incorrect assignment of predicates through an outer join in an inline view. +IMPALA-2665 +Incorrect assignment of On-clause predicate inside inline view with an outer join. +IMPALA-2603 +Crash: impala::Coordinator::ValidateCollectionSlots +IMPALA-2375 +Fix issues with the legacy join and agg nodes using enable_partitioned_hash_join=false and enable_partitioned_aggregation=false +IMPALA-1862 +Invalid bool value not reported as a scanner error +IMPALA-1792 +ImpalaODBC: Can not get the value in the SQLGetData(m-x th column) after the SQLBindCol(m th column) +IMPALA-1578 +Impala incorrectly handles text data when the new line character \n\r is split between different HDFS block +IMPALA-2643 +Duplicated column in inline view causes dropping null slots during scan +IMPALA-2005 +A failed CTAS does not drop the table if the insert fails. +IMPALA-1821 +Casting scenarios with invalid/inconsistent results + +Another list from Alex, of correctness problems with predicates; might overlap with ones I already have: + +https://issues.cloudera.org/browse/IMPALA-2665 - Already have +https://issues.cloudera.org/browse/IMPALA-2643 - Already have +https://issues.cloudera.org/browse/IMPALA-1459 - Already have +https://issues.cloudera.org/browse/IMPALA-2144 - Don't have + +--> + + <concept id="known_issues_crash"> + + <title>Impala Known Issues: Crashes and Hangs</title> + + <conbody> + + <p> + These issues can cause Impala to quit or become unresponsive. 
</p> + + </conbody> + + </concept> + + <concept id="IMPALA-3069" rev="IMPALA-3069"> + + <title>Setting BATCH_SIZE query option too large can cause a crash</title> + + <conbody> + + <p> + Using a value in the millions for the <codeph>BATCH_SIZE</codeph> query option, together with wide rows or large string values in + columns, could cause a memory allocation of more than 2 GB, resulting in a crash. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-3069" scope="external" format="html">IMPALA-3069</xref> + </p> + + <p> + <b>Severity:</b> High + </p> + + <p><b>Resolution:</b> Fixed in CDH 5.9.0 / Impala 2.7.0.</p> + + </conbody> + + </concept> + + <concept id="IMPALA-3441" rev="IMPALA-3441"> + + <title>Queries on malformed Avro data can cause a crash</title> + + <conbody> + + <p> + Malformed Avro data, such as out-of-bounds integers or values in the wrong format, could cause a crash when queried. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-3441" scope="external" format="html">IMPALA-3441</xref> + </p> + + <p> + <b>Severity:</b> High + </p> + + <p><b>Resolution:</b> Fixed in CDH 5.9.0 / Impala 2.7.0 and CDH 5.8.2 / Impala 2.6.2.</p> + + </conbody> + + </concept> + + <concept id="IMPALA-2592" rev="IMPALA-2592"> + + <title>Queries may hang on server-to-server exchange errors</title> + + <conbody> + + <p> + The <codeph>DataStreamSender::Channel::CloseInternal()</codeph> method does not close the channel on an error. This causes the node on + the other side of the channel to wait indefinitely, resulting in a hang. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2592" scope="external" format="html">IMPALA-2592</xref> + </p> + + <p> + <b>Resolution:</b> Fixed in CDH 5.7.0 / Impala 2.5.0.
</p> + + </conbody> + + </concept> + + <concept id="IMPALA-2365" rev="IMPALA-2365"> + + <title>impalad crashes if the UDF JAR is not available in its HDFS location</title> + + <conbody> + + <p> + If the JAR file corresponding to a Java UDF is removed from HDFS after the Impala <codeph>CREATE FUNCTION</codeph> statement is + issued, the <cmdname>impalad</cmdname> daemon crashes. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2365" scope="external" format="html">IMPALA-2365</xref> + </p> + + <p><b>Resolution:</b> Fixed in CDH 5.7.0 / Impala 2.5.0.</p> + + </conbody> + + </concept> + + </concept> + + <concept id="known_issues_performance"> + + <title id="ki_performance">Impala Known Issues: Performance</title> + + <conbody> + + <p> + These issues involve the performance of operations such as queries or DDL statements. + </p> + + </conbody> + + <concept id="IMPALA-1480" rev="IMPALA-1480"> + +<!-- Not part of Alex's spreadsheet. Spreadsheet has IMPALA-1423 which mentions it's similar to this one but not a duplicate. --> + + <title>Slow DDL statements for tables with a large number of partitions</title> + + <conbody> + + <p> + DDL statements for tables with a large number of partitions might be slow. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-1480" scope="external" format="html">IMPALA-1480</xref> + </p> + + <p> + <b>Workaround:</b> Run the DDL statement in Hive if the slowness is an issue. + </p> + + <p><b>Resolution:</b> Fixed in CDH 5.7.0 / Impala 2.5.0.</p> + + </conbody> + + </concept> + + </concept> + + <concept id="known_issues_usability"> + + <title id="ki_usability">Impala Known Issues: Usability</title> + + <conbody> + + <p> + These issues affect the convenience of interacting directly with Impala, typically through the Impala shell or Hue.
</p> + + </conbody> + + <concept id="IMPALA-3133" rev="IMPALA-3133"> + + <title>Unexpected privileges in SHOW output</title> + + <conbody> + + <p> + Due to a timing condition in updating cached policy data from Sentry, the <codeph>SHOW</codeph> statements for Sentry roles could + sometimes display out-of-date role settings. Because Impala rechecks authorization for each SQL statement, this discrepancy does + not represent a security issue for other statements. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-3133" scope="external" format="html">IMPALA-3133</xref> + </p> + + <p> + <b>Severity:</b> High + </p> + + <p><b>Resolution:</b> Fixed in CDH 5.8.0 / Impala 2.6.0 and CDH 5.7.1 / Impala 2.5.1.</p> + + </conbody> + + </concept> + + <concept id="IMPALA-1776" rev="IMPALA-1776"> + + <title>Less than 100% progress on completed simple SELECT queries</title> + + <conbody> + + <p> + Simple <codeph>SELECT</codeph> queries show less than 100% progress even though they are already completed.
+ </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-1776" scope="external" format="html">IMPALA-1776</xref> + </p> + + </conbody> + + </concept> + + <concept id="concept_lmx_dk5_lx"> + + <title>Unexpected column overflow behavior with INT datatypes</title> + + <conbody> + + <p conref="../shared/impala_common.xml#common/int_overflow_behavior" /> + + <p> + <b>Bug:</b> + <xref href="https://issues.cloudera.org/browse/IMPALA-3123" + scope="external" format="html">IMPALA-3123</xref> + </p> + + </conbody> + + </concept> + + </concept> + + <concept id="known_issues_drivers"> + + <title id="ki_drivers">Impala Known Issues: JDBC and ODBC Drivers</title> + + <conbody> + + <p> + These issues affect applications that use the JDBC or ODBC APIs, such as business intelligence tools or custom-written applications + in languages such as Java or C++. + </p> + + </conbody> + + <concept id="IMPALA-1792" rev="IMPALA-1792"> + +<!-- Not part of Alex's spreadsheet --> + + <title>ImpalaODBC: Can not get the value in the SQLGetData(m-x th column) after the SQLBindCol(m th column)</title> + + <conbody> + + <p> + If the ODBC <codeph>SQLGetData</codeph> is called on a series of columns, the function calls must follow the same order as the + columns. For example, if data is fetched from column 2 then column 1, the <codeph>SQLGetData</codeph> call for column 1 returns + <codeph>NULL</codeph>. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-1792" scope="external" format="html">IMPALA-1792</xref> + </p> + + <p> + <b>Workaround:</b> Fetch columns in the same order they are defined in the table. + </p> + + </conbody> + + </concept> + + </concept> + + <concept id="known_issues_security"> + + <title id="ki_security">Impala Known Issues: Security</title> + + <conbody> + + <p> + These issues relate to security features, such as Kerberos authentication, Sentry authorization, encryption, auditing, and + redaction. 
</p> + + </conbody> + +<!-- To do: Hiding for the moment. https://jira.cloudera.com/browse/CDH-38736 reports the issue is fixed. --> + + <concept id="impala-shell_ssl_dependency" audience="Cloudera" rev="impala-shell_ssl_dependency"> + + <title>impala-shell requires Python with ssl module</title> + + <conbody> + + <p> + On CentOS 5.10 and Oracle Linux 5.11 using the built-in Python 2.4, invoking the <cmdname>impala-shell</cmdname> with the + <codeph>--ssl</codeph> option might fail with the following error: + </p> + +<codeblock> +Unable to import the python 'ssl' module. It is required for an SSL-secured connection. +</codeblock> + +<!-- No associated IMPALA-* JIRA... It is the internal JIRA CDH-38736. --> + + <p> + <b>Severity:</b> Low, workaround available + </p> + + <p> + <b>Resolution:</b> Customers are less likely to experience this issue over time, because the <codeph>ssl</codeph> module is included + in newer Python releases packaged with recent Linux releases. + </p> + + <p> + <b>Workaround:</b> To use SSL with <cmdname>impala-shell</cmdname> on these platform versions, install the <codeph>ssl</codeph> + Python module: + </p> + +<codeblock> +yum install python-ssl +</codeblock> + + <p> + Then <cmdname>impala-shell</cmdname> can run when using SSL. For example: + </p> + +<codeblock> +impala-shell -s impala --ssl --ca_cert /path_to_truststore/truststore.pem +</codeblock> + + </conbody> + + </concept> + + <concept id="renewable_kerberos_tickets"> + +<!-- Not part of Alex's spreadsheet. Not associated with a JIRA number AFAIK. --> + + <title>Kerberos tickets must be renewable</title> + + <conbody> + + <p> + In a Kerberos environment, the <cmdname>impalad</cmdname> daemon might not start if Kerberos tickets are not renewable. + </p> + + <p> + <b>Workaround:</b> Configure your KDC to allow tickets to be renewed, and configure <filepath>krb5.conf</filepath> to request + renewable tickets. + </p> + + </conbody> + + </concept> + +<!-- To do: Fixed in 2.5.0, 2.3.2.
Commenting out until I see how it can fix into "known issues now fixed" convention. + That set of fix releases looks incomplete so probably have to do some detective work with the JIRA. + https://issues.cloudera.org/browse/IMPALA-2598 + <concept id="IMPALA-2598" rev="IMPALA-2598"> + + <title>Server-to-server SSL and Kerberos do not work together</title> + + <conbody> + + <p> + If SSL is enabled between internal Impala components (with <codeph>ssl_client_ca_certificate</codeph>), and Kerberos + authentication is used between servers, the cluster fails to start. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2598" scope="external" format="html">IMPALA-2598</xref> + </p> + + <p> + <b>Workaround:</b> Do not use the new <codeph>ssl_client_ca_certificate</codeph> setting on Kerberos-enabled clusters until this + issue is resolved. + </p> + + <p><b>Resolution:</b> Fixed in CDH 5.7.0 / Impala 2.5.0 and CDH 5.5.2 / Impala 2.3.2.</p> + + </conbody> + + </concept> +--> + + </concept> + +<!-- + <concept id="known_issues_supportability"> + + <title id="ki_supportability">Impala Known Issues: Supportability</title> + + <conbody> + + <p> + These issues affect the ability to debug and troubleshoot Impala, such as incorrect output in query profiles or the query state + shown in monitoring applications. + </p> + + </conbody> + + </concept> +--> + + <concept id="known_issues_resources"> + + <title id="ki_resources">Impala Known Issues: Resources</title> + + <conbody> + + <p> + These issues involve memory or disk usage, including out-of-memory conditions, the spill-to-disk feature, and resource management + features. 
</p> + + </conbody> + + <concept id="TSB-168"> + + <title>Impala catalogd heap issues when upgrading to 5.7</title> + + <conbody> + + <p> + The default heap size for Impala <cmdname>catalogd</cmdname> has changed in CDH 5.7 / Impala 2.5 and higher: + </p> + + <ul> + <li> + <p> + Before 5.7, <cmdname>catalogd</cmdname> used the JVM's default heap size by default, which is the smaller of 1/4th of the + physical memory or 32 GB. + </p> + </li> + + <li> + <p> + Starting with CDH 5.7.0, the default <cmdname>catalogd</cmdname> heap size is 4 GB. + </p> + </li> + </ul> + + <p> + For example, on a host with 128 GB of physical memory, this change decreases the <cmdname>catalogd</cmdname> heap from 32 GB to 4 GB, which can result + in out-of-memory errors in <cmdname>catalogd</cmdname> and lead to query failures. + </p> + + <p audience="Cloudera"> + <b>Bug:</b> <xref href="https://jira.cloudera.com/browse/TSB-168" scope="external" format="html">TSB-168</xref> + </p> + + <p> + <b>Severity:</b> High + </p> + + <p> + <b>Workaround:</b> Increase the <cmdname>catalogd</cmdname> memory limit as follows. +<!-- See <xref href="impala_scalability.xml#scalability_catalog"/> for the procedure. --> +<!-- Including full details here via conref, for benefit of PDF readers or anyone else + who might have trouble seeing or following the link. --> + </p> + + <p conref="../shared/impala_common.xml#common/increase_catalogd_heap_size"/> + + </conbody> + + </concept> + + <concept id="IMPALA-3509" rev="IMPALA-3509"> + + <title>Breakpad minidumps can be very large when the thread count is high</title> + + <conbody> + + <p> + The size of the Breakpad minidump files grows linearly with the number of threads. By default, each thread adds 8 KB to the + minidump size. Minidump files could consume significant disk space when the daemons have a high number of threads.
+ </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-3509" scope="external" format="html">IMPALA-3509</xref> + </p> + + <p> + <b>Severity:</b> High + </p> + + <p> + <b>Workaround:</b> Add <codeph>--minidump_size_limit_hint_kb=<varname>size</varname></codeph> to set a soft upper limit on the + size of each minidump file. If the minidump file would exceed that limit, Impala reduces the amount of information for each thread + from 8 KB to 2 KB. (Full thread information is captured for the first 20 threads, then 2 KB per thread after that.) The minidump + file can still grow larger than the <q>hinted</q> size. For example, if you have 10,000 threads, the minidump file can be more + than 20 MB. + </p> + + </conbody> + + </concept> + + <concept id="IMPALA-3662" rev="IMPALA-3662"> + + <title>Parquet scanner memory increase after IMPALA-2736</title> + + <conbody> + + <p> + The initial release of CDH 5.8 / Impala 2.6 sometimes has a higher peak memory usage than in previous releases while reading + Parquet files. + </p> + + <p> + CDH 5.8 / Impala 2.6 addresses the issue IMPALA-2736, which improves the efficiency of Parquet scans by up to 2x. The faster scans + may result in a higher peak memory consumption compared to earlier versions of Impala due to the new column-wise row + materialization strategy. You are likely to experience higher memory consumption in any of the following scenarios: + <ul> + <li> + <p> + Very wide rows due to projecting many columns in a scan. + </p> + </li> + + <li> + <p> + Very large rows due to big column values, for example, long strings or nested collections with many items. + </p> + </li> + + <li> + <p> + Producer/consumer speed imbalances, leading to more rows being buffered between a scan (producer) and downstream (consumer) + plan nodes. 
+ </p> + </li> + </ul> + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-3662" scope="external" format="html">IMPALA-3662</xref> + </p> + + <p> + <b>Severity:</b> High + </p> + + <p> + <b>Workaround:</b> The following query options might help to reduce memory consumption in the Parquet scanner: + <ul> + <li> + Reduce the number of scanner threads, for example: <codeph>set num_scanner_threads=30</codeph> + </li> + + <li> + Reduce the batch size, for example: <codeph>set batch_size=512</codeph> + </li> + + <li> + Increase the memory limit, for example: <codeph>set mem_limit=64g</codeph> + </li> + </ul> + </p> + + </conbody> + + </concept> + + <concept id="IMPALA-691" rev="IMPALA-691"> + + <title>Process mem limit does not account for the JVM's memory usage</title> + +<!-- Supposed to be resolved for Impala 2.3.0. --> + + <conbody> + + <p> + Some memory allocated by the JVM used internally by Impala is not counted against the memory limit for the + <cmdname>impalad</cmdname> daemon. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-691" scope="external" format="html">IMPALA-691</xref> + </p> + + <p> + <b>Workaround:</b> To monitor overall memory usage, use the <cmdname>top</cmdname> command, or add the memory figures in the + Impala web UI <uicontrol>/memz</uicontrol> tab to JVM memory usage shown on the <uicontrol>/metrics</uicontrol> tab. + </p> + + </conbody> + + </concept> + + <concept id="IMPALA-2375" rev="IMPALA-2375"> + +<!-- Not part of Alex's spreadsheet --> + + <title>Fix issues with the legacy join and agg nodes using --enable_partitioned_hash_join=false and --enable_partitioned_aggregation=false</title> + + <conbody> + + <p></p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2375" scope="external" format="html">IMPALA-2375</xref> + </p> + + <p> + <b>Workaround:</b> Transition away from the <q>old-style</q> join and aggregation mechanism if practical. 
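</p> + + <p> + For example (an illustrative command line, not from the original report), ensure that the <cmdname>impalad</cmdname> startup options do not set the legacy flags to <codeph>false</codeph>; the partitioned code paths are the defaults, so omitting both flags has the same effect: + </p> + +<codeblock> +impalad --enable_partitioned_hash_join=true --enable_partitioned_aggregation=true ... +</codeblock> + + <p>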
+ </p> + + <p><b>Resolution:</b> Fixed in CDH 5.7.0 / Impala 2.5.0.</p> + + </conbody> + + </concept> + + </concept> + + <concept id="known_issues_correctness"> + + <title id="ki_correctness">Impala Known Issues: Correctness</title> + + <conbody> + + <p> + These issues can cause incorrect or unexpected results from queries. They typically only arise in very specific circumstances. + </p> + + </conbody> + + <concept id="IMPALA-3084" rev="IMPALA-3084"> + + <title>Incorrect assignment of NULL checking predicate through an outer join of a nested collection.</title> + + <conbody> + + <p> + A query could return wrong results (too many or too few <codeph>NULL</codeph> values) if it referenced an outer-joined nested + collection and also contained a null-checking predicate (<codeph>IS NULL</codeph>, <codeph>IS NOT NULL</codeph>, or the + <codeph><=></codeph> operator) in the <codeph>WHERE</codeph> clause. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-3084" scope="external" format="html">IMPALA-3084</xref> + </p> + + <p> + <b>Severity:</b> High + </p> + + <p><b>Resolution:</b> Fixed in CDH 5.9.0 / Impala 2.7.0.</p> + + </conbody> + + </concept> + + <concept id="IMPALA-3094" rev="IMPALA-3094"> + + <title>Incorrect result due to constant evaluation in query with outer join</title> + + <conbody> + + <p> + An <codeph>OUTER JOIN</codeph> query could omit some expected result rows due to a constant such as <codeph>FALSE</codeph> in + another join clause. 
For example: + </p> + +<codeblock><![CDATA[ +explain SELECT 1 FROM alltypestiny a1 + INNER JOIN alltypesagg a2 ON a1.smallint_col = a2.year AND false + RIGHT JOIN alltypes a3 ON a1.year = a1.bigint_col; ++---------------------------------------------------------+ +| Explain String | ++---------------------------------------------------------+ +| Estimated Per-Host Requirements: Memory=1.00KB VCores=1 | +| | +| 00:EMPTYSET | ++---------------------------------------------------------+ +]]> +</codeblock> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-3094" scope="external" format="html">IMPALA-3094</xref> + </p> + + <p> + <b>Severity:</b> High + </p> + + </conbody> + + </concept> + + <concept id="IMPALA-3126" rev="IMPALA-3126"> + + <title>Incorrect assignment of an inner join On-clause predicate through an outer join.</title> + + <conbody> + + <p> + Impala may return incorrect results for queries that have the following properties: + </p> + + <ul> + <li> + <p> + There is an INNER JOIN following a series of OUTER JOINs. + </p> + </li> + + <li> + <p> + The INNER JOIN has an On-clause with a predicate that references at least two tables that are on the nullable side of the + preceding OUTER JOINs.
+ </p> + </li> + </ul> + + <p> + The following query demonstrates the issue: + </p> + +<codeblock> +select 1 from functional.alltypes a left outer join + functional.alltypes b on a.id = b.id left outer join + functional.alltypes c on b.id = c.id right outer join + functional.alltypes d on c.id = d.id inner join functional.alltypes e +on b.int_col = c.int_col; +</codeblock> + + <p> + The following listing shows the incorrect <codeph>EXPLAIN</codeph> plan: + </p> + +<codeblock><![CDATA[ ++-----------------------------------------------------------+ +| Explain String | ++-----------------------------------------------------------+ +| Estimated Per-Host Requirements: Memory=480.04MB VCores=4 | +| | +| 14:EXCHANGE [UNPARTITIONED] | +| | | +| 08:NESTED LOOP JOIN [CROSS JOIN, BROADCAST] | +| | | +| |--13:EXCHANGE [BROADCAST] | +| | | | +| | 04:SCAN HDFS [functional.alltypes e] | +| | partitions=24/24 files=24 size=478.45KB | +| | | +| 07:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED] | +| | hash predicates: c.id = d.id | +| | runtime filters: RF000 <- d.id | +| | | +| |--12:EXCHANGE [HASH(d.id)] | +| | | | +| | 03:SCAN HDFS [functional.alltypes d] | +| | partitions=24/24 files=24 size=478.45KB | +| | | +| 06:HASH JOIN [LEFT OUTER JOIN, PARTITIONED] | +| | hash predicates: b.id = c.id | +| | other predicates: b.int_col = c.int_col <--- incorrect placement; should be at node 07 or 08 +| | runtime filters: RF001 <- c.int_col | +| | | +| |--11:EXCHANGE [HASH(c.id)] | +| | | | +| | 02:SCAN HDFS [functional.alltypes c] | +| | partitions=24/24 files=24 size=478.45KB | +| | runtime filters: RF000 -> c.id | +| | | +| 05:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED] | +| | hash predicates: b.id = a.id | +| | runtime filters: RF002 <- a.id | +| | | +| |--10:EXCHANGE [HASH(a.id)] | +| | | | +| | 00:SCAN HDFS [functional.alltypes a] | +| | partitions=24/24 files=24 size=478.45KB | +| | | +| 09:EXCHANGE [HASH(b.id)] | +| | | +| 01:SCAN HDFS [functional.alltypes b] | +| partitions=24/24 files=24 
size=478.45KB | +| runtime filters: RF001 -> b.int_col, RF002 -> b.id | ++-----------------------------------------------------------+ +]]> +</codeblock> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-3126" scope="external" format="html">IMPALA-3126</xref> + </p> + + <p> + <b>Severity:</b> High + </p> + + <p> + <b>Workaround:</b> For some queries, this problem can be worked around by placing the problematic <codeph>ON</codeph> clause predicate in the + <codeph>WHERE</codeph> clause instead, or changing the preceding <codeph>OUTER JOIN</codeph>s to <codeph>INNER JOIN</codeph>s (if + the <codeph>ON</codeph> clause predicate would discard <codeph>NULL</codeph>s). For example, to fix the problematic query above: + </p> + +<codeblock><![CDATA[ +select 1 from functional.alltypes a + left outer join functional.alltypes b + on a.id = b.id + left outer join functional.alltypes c + on b.id = c.id + right outer join functional.alltypes d + on c.id = d.id + inner join functional.alltypes e +where b.int_col = c.int_col + ++-----------------------------------------------------------+ +| Explain String | ++-----------------------------------------------------------+ +| Estimated Per-Host Requirements: Memory=480.04MB VCores=4 | +| | +| 14:EXCHANGE [UNPARTITIONED] | +| | | +| 08:NESTED LOOP JOIN [CROSS JOIN, BROADCAST] | +| | | +| |--13:EXCHANGE [BROADCAST] | +| | | | +| | 04:SCAN HDFS [functional.alltypes e] | +| | partitions=24/24 files=24 size=478.45KB | +| | | +| 07:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED] | +| | hash predicates: c.id = d.id | +| | other predicates: b.int_col = c.int_col <-- correct assignment +| | runtime filters: RF000 <- d.id | +| | | +| |--12:EXCHANGE [HASH(d.id)] | +| | | | +| | 03:SCAN HDFS [functional.alltypes d] | +| | partitions=24/24 files=24 size=478.45KB | +| | | +| 06:HASH JOIN [LEFT OUTER JOIN, PARTITIONED] | +| | hash predicates: b.id = c.id | +| | | +| |--11:EXCHANGE [HASH(c.id)] | +| | | | +|
| 02:SCAN HDFS [functional.alltypes c] | +| | partitions=24/24 files=24 size=478.45KB | +| | runtime filters: RF000 -> c.id | +| | | +| 05:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED] | +| | hash predicates: b.id = a.id | +| | runtime filters: RF001 <- a.id | +| | | +| |--10:EXCHANGE [HASH(a.id)] | +| | | | +| | 00:SCAN HDFS [functional.alltypes a] | +| | partitions=24/24 files=24 size=478.45KB | +| | | +| 09:EXCHANGE [HASH(b.id)] | +| | | +| 01:SCAN HDFS [functional.alltypes b] | +| partitions=24/24 files=24 size=478.45KB | +| runtime filters: RF001 -> b.id | ++-----------------------------------------------------------+ +]]> +</codeblock> + + </conbody> + + </concept> + + <concept id="IMPALA-3006" rev="IMPALA-3006"> + + <title>Impala may use incorrect bit order with BIT_PACKED encoding</title> + + <conbody> + + <p> + Parquet <codeph>BIT_PACKED</codeph> encoding as implemented by Impala is LSB first. The parquet standard says it is MSB first. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-3006" scope="external" format="html">IMPALA-3006</xref> + </p> + + <p> + <b>Severity:</b> High, but rare in practice because BIT_PACKED is infrequently used, is not written by Impala, and is deprecated + in Parquet 2.0. + </p> + + </conbody> + + </concept> + + <concept id="IMPALA-3082" rev="IMPALA-3082"> + + <title>BST between 1972 and 1995</title> + + <conbody> + + <p> + The calculation of start and end times for the BST (British Summer Time) time zone could be incorrect between 1972 and 1995. + Between 1972 and 1995, BST began and ended at 02:00 GMT on the third Sunday in March (or second Sunday when Easter fell on the + third) and fourth Sunday in October. 
For example, both function calls should return 13, but actually return 12, in a query such + as: + </p> + +<codeblock> +select + extract(from_utc_timestamp(cast('1970-01-01 12:00:00' as timestamp), 'Europe/London'), "hour") summer70start, + extract(from_utc_timestamp(cast('1970-12-31 12:00:00' as timestamp), 'Europe/London'), "hour") summer70end; +</codeblock> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-3082" scope="external" format="html">IMPALA-3082</xref> + </p> + + <p> + <b>Severity:</b> High + </p> + + </conbody> + + </concept> + + <concept id="IMPALA-1170" rev="IMPALA-1170"> + + <title>parse_url() returns incorrect result if @ character in URL</title> + + <conbody> + + <p> + If a URL contains an <codeph>@</codeph> character, the <codeph>parse_url()</codeph> function could return an incorrect value for + the hostname field. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-1170" scope="external" format="html">IMPALA-1170</xref> + </p> + + <p><b>Resolution:</b> Fixed in CDH 5.7.0 / Impala 2.5.0 and CDH 5.5.4 / Impala 2.3.4.</p> + + </conbody> + + </concept> + + <concept id="IMPALA-2422" rev="IMPALA-2422"> + + <title>% escaping does not work correctly when it occurs at the end of a LIKE clause</title> + + <conbody> + + <p> + If the final character in the RHS argument of a <codeph>LIKE</codeph> operator is an escaped <codeph>\%</codeph> character, it + does not match a <codeph>%</codeph> final character of the LHS argument.
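</p> + + <p> + As a minimal illustration (hypothetical values, not from the original report), the following comparison is affected because the escaped <codeph>\%</codeph> is the final pattern character; it is expected to evaluate to <codeph>true</codeph>, with <codeph>\%</codeph> matching the literal trailing <codeph>%</codeph>: + </p> + +<codeblock> +select 'ab%' like 'ab\%'; +</codeblock> + + <p>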
+ </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2422" scope="external" format="html">IMPALA-2422</xref> + </p> + + </conbody> + + </concept> + + <concept id="IMPALA-397" rev="IMPALA-397"> + + <title>ORDER BY rand() does not work.</title> + + <conbody> + + <p> + Because the value for <codeph>rand()</codeph> is computed early in a query, using an <codeph>ORDER BY</codeph> expression + involving a call to <codeph>rand()</codeph> does not actually randomize the results. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-397" scope="external" format="html">IMPALA-397</xref> + </p> + + </conbody> + + </concept> + + <concept id="IMPALA-2643" rev="IMPALA-2643"> + + <title>Duplicated column in inline view causes dropping null slots during scan</title> + + <conbody> + + <p> + If the same column is queried twice within a view, <codeph>NULL</codeph> values for that column are omitted. For example, the + result of <codeph>COUNT(*)</codeph> on the view could be less than expected. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2643" scope="external" format="html">IMPALA-2643</xref> + </p> + + <p> + <b>Workaround:</b> Avoid selecting the same column twice within an inline view. + </p> + + <p><b>Resolution:</b> Fixed in CDH 5.7.0 / Impala 2.5.0, CDH 5.5.2 / Impala 2.3.2, and CDH 5.4.10 / Impala 2.2.10.</p> + + </conbody> + + </concept> + + <concept id="IMPALA-1459" rev="IMPALA-1459"> + +<!-- Not part of Alex's spreadsheet --> + + <title>Incorrect assignment of predicates through an outer join in an inline view.</title> + + <conbody> + + <p> + A query involving an <codeph>OUTER JOIN</codeph> clause where one of the table references is an inline view might apply predicates + from the <codeph>ON</codeph> clause incorrectly. 
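</p> + + <p> + A query of the following shape (hypothetical tables <codeph>t1</codeph> and <codeph>t2</codeph>, shown only to illustrate the pattern) could be affected, because the <codeph>ON</codeph> clause predicate <codeph>v.x = 10</codeph> refers to a column of the inline view on the nullable side of the outer join: + </p> + +<codeblock> +select t1.id + from t1 left outer join + (select id, x from t2) v + on t1.id = v.id and v.x = 10; +</codeblock> + + <p>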
+ </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-1459" scope="external" format="html">IMPALA-1459</xref> + </p> + + <p><b>Resolution:</b> Fixed in CDH 5.7.0 / Impala 2.5.0, CDH 5.5.2 / Impala 2.3.2, and CDH 5.4.9 / Impala 2.2.9.</p> + + </conbody> + + </concept> + + <concept id="IMPALA-2603" rev="IMPALA-2603"> + + <title>Crash: impala::Coordinator::ValidateCollectionSlots</title> + + <conbody> + + <p> + A query could encounter a serious error if it includes multiple nested levels of <codeph>INNER JOIN</codeph> clauses involving + subqueries. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2603" scope="external" format="html">IMPALA-2603</xref> + </p> + + </conbody> + + </concept> + + <concept id="IMPALA-2665" rev="IMPALA-2665"> + + <title>Incorrect assignment of On-clause predicate inside inline view with an outer join.</title> + + <conbody> + + <p> + A query might return incorrect results due to wrong predicate assignment in the following scenario: + </p> + + <ol> + <li> + There is an inline view that contains an outer join + </li> + + <li> + That inline view is joined with another table in the enclosing query block + </li> + + <li> + That join has an On-clause containing a predicate that only references columns originating from the outer-joined tables inside + the inline view + </li> + </ol> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2665" scope="external" format="html">IMPALA-2665</xref> + </p> + + <p><b>Resolution:</b> Fixed in CDH 5.7.0 / Impala 2.5.0, CDH 5.5.2 / Impala 2.3.2, and CDH 5.4.9 / Impala 2.2.9.</p> + + </conbody> + + </concept> + + <concept id="IMPALA-2144" rev="IMPALA-2144"> + + <title>Wrong assignment of having clause predicate across outer join</title> + + <conbody> + + <p> + In an <codeph>OUTER JOIN</codeph> query with a <codeph>HAVING</codeph> clause, the comparison from the <codeph>HAVING</codeph> + clause might be applied at the wrong stage of 
query processing, leading to incorrect results. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2144" scope="external" format="html">IMPALA-2144</xref> + </p> + + <p><b>Resolution:</b> Fixed in CDH 5.7.0 / Impala 2.5.0.</p> + + </conbody> + + </concept> + + <concept id="IMPALA-2093" rev="IMPALA-2093"> + + <title>Wrong plan of NOT IN aggregate subquery when a constant is used in subquery predicate</title> + + <conbody> + + <p> + A <codeph>NOT IN</codeph> operator with a subquery that calls an aggregate function, such as <codeph>NOT IN (SELECT + SUM(...))</codeph>, could return incorrect results. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2093" scope="external" format="html">IMPALA-2093</xref> + </p> + + <p><b>Resolution:</b> Fixed in CDH 5.7.0 / Impala 2.5.0 and CDH 5.5.4 / Impala 2.3.4.</p> + + </conbody> + + </concept> + + </concept> + + <concept id="known_issues_metadata"> + + <title id="ki_metadata">Impala Known Issues: Metadata</title> + + <conbody> + + <p> + These issues affect how Impala interacts with metadata. They cover areas such as the metastore database, the <codeph>COMPUTE + STATS</codeph> statement, and the Impala <cmdname>catalogd</cmdname> daemon. + </p> + + </conbody> + + <concept id="IMPALA-2648" rev="IMPALA-2648"> + + <title>Catalogd may crash when loading metadata for tables with many partitions, many columns and with incremental stats</title> + + <conbody> + + <p> + Incremental stats use up about 400 bytes per partition for each column. For example, for a table with 20K partitions and 100 + columns, the memory overhead from incremental statistics is about 800 MB. When serialized for transmission across the network, + this metadata exceeds the 2 GB Java array size limit and leads to a <codeph>catalogd</codeph> crash. 
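+ </p>
+
+ <p>
+ For example, computing full statistics rather than incremental statistics for such a
+ table might look like the following (the table name is illustrative):
+ </p>
+
+<codeblock>compute stats wide_partitioned_table;</codeblock>
+
+ <p>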
+ </p> + + <p> + <b>Bugs:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2647" scope="external" format="html">IMPALA-2647</xref>, + <xref href="https://issues.cloudera.org/browse/IMPALA-2648" scope="external" format="html">IMPALA-2648</xref>, + <xref href="https://issues.cloudera.org/browse/IMPALA-2649" scope="external" format="html">IMPALA-2649</xref> + </p> + + <p> + <b>Workaround:</b> If feasible, compute full stats periodically and avoid computing incremental stats for that table. The + scalability of incremental stats computation is a continuing work item. + </p> + + </conbody> + + </concept> + + <concept id="IMPALA-1420" rev="IMPALA-1420 2.0.0"> + +<!-- Not part of Alex's spreadsheet --> + + <title>Can't update stats manually via alter table after upgrading to CDH 5.2</title> + + <conbody> + + <p> + After upgrading to CDH 5.2, an <codeph>ALTER TABLE</codeph> statement that sets only the <codeph>numRows</codeph> + property does not update the table statistics used by Impala. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-1420" scope="external" format="html">IMPALA-1420</xref> + </p> + + <p> + <b>Workaround:</b> On CDH 5.2, when adjusting table statistics manually by setting the <codeph>numRows</codeph> property, you must also + enable the Boolean property <codeph>STATS_GENERATED_VIA_STATS_TASK</codeph>. For example, use a statement like the following to + set both properties with a single <codeph>ALTER TABLE</codeph> statement: + </p> + +<codeblock>ALTER TABLE <varname>table_name</varname> SET TBLPROPERTIES('numRows'='<varname>new_value</varname>', 'STATS_GENERATED_VIA_STATS_TASK' = 'true');</codeblock> + + <p> + <b>Resolution:</b> The underlying cause is the issue + <xref href="https://issues.apache.org/jira/browse/HIVE-8648" scope="external" format="html">HIVE-8648</xref> that affects the + metastore in Hive 0.13. The workaround is only needed until the fix for this issue is incorporated into a CDH release. 
+ </p> + + </conbody> + + </concept> + + </concept> + + <concept id="known_issues_interop"> + + <title id="ki_interop">Impala Known Issues: Interoperability</title> + + <conbody> + + <p> + These issues affect the ability to interchange data between Impala and other database systems. They cover areas such as data types + and file formats. + </p> + + </conbody> + +<!-- Opened based on CDH-41605. Not part of Alex's spreadsheet AFAIK. --> + + <concept id="CDH-41605"> + + <title>DESCRIBE FORMATTED gives error on Avro table</title> + + <conbody> + + <p> + This issue can occur either on old Avro tables (created prior to Hive 1.1 / CDH 5.4) or when changing the Avro schema file by + adding or removing columns. Columns added to the schema file will not show up in the output of the <codeph>DESCRIBE + FORMATTED</codeph> command. Removing columns from the schema file will trigger a <codeph>NullPointerException</codeph>. + </p> + + <p> + As a workaround, you can use the output of <codeph>SHOW CREATE TABLE</codeph> to drop and recreate the table. This will populate + the Hive metastore database with the correct column definitions. + </p> + + <note type="warning"> + Only use this for external tables, or Impala will remove the data files. In case of an internal table, set it to external first: +<codeblock> +ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE'); +</codeblock> + (The part in parentheses is case sensitive.) Make sure to pick the right choice between internal and external when recreating the + table. See <xref href="impala_tables.xml#tables"/> for the differences between internal and external tables. + </note> + + <p audience="Cloudera"> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/CDH-41605" scope="external" format="html">CDH-41605</xref> + </p> + + <p> + <b>Severity:</b> High + </p> + + </conbody> + + </concept> + + <concept id="IMP-469"> + +<!-- Not part of Alex's spreadsheet. 
Perhaps it really is a permanent limitation and nobody is tracking it? --> + + <title>Deviation from Hive behavior: Impala does not implicitly cast between string and numeric or Boolean types.</title> + + <conbody> + + <p audience="Cloudera"> + <b>Cloudera Bug:</b> <xref href="https://jira.cloudera.com/browse/IMP-469" scope="external" format="html">IMP-469</xref>; KI added 0.1 + <i>Cloudera internal only</i> + </p> + + <p> + <b>Anticipated Resolution:</b> None + </p> + + <p> + <b>Workaround:</b> Use explicit casts. + </p> + + </conbody> + + </concept> + + <concept id="IMP-175"> + +<!-- Not part of Alex's spreadsheet. Perhaps it really is a permanent limitation and nobody is tracking it? --> + + <title>Deviation from Hive behavior: Out-of-range float/double values are returned as the maximum allowed value of the type (Hive returns NULL)</title> + + <conbody> + + <p> + Impala behavior differs from Hive with respect to out-of-range float/double values. Out-of-range values are returned as the maximum + allowed value for the type, while Hive returns NULL. + </p> + + <p audience="Cloudera"> + <b>Cloudera Bug:</b> <xref href="https://jira.cloudera.com/browse/IMP-175" scope="external" format="html">IMP-175</xref>; KI + added 0.1 <i>Cloudera internal only</i> + </p> + + <p> + <b>Workaround:</b> None + </p> + + </conbody> + + </concept> + + <concept id="CDH-13199"> + +<!-- Not part of Alex's spreadsheet. The CDH- prefix makes it an oddball. --> + + <title>Configuration needed for Flume to be compatible with Impala</title> + + <conbody> + + <p> + For compatibility with Impala, the value for the Flume HDFS Sink <codeph>hdfs.writeFormat</codeph> must be set to + <codeph>Text</codeph>, rather than its default value of <codeph>Writable</codeph>. The <codeph>hdfs.writeFormat</codeph> setting + must be changed to <codeph>Text</codeph> before creating data files with Flume; otherwise, those files cannot be read by either + Impala or Hive. 
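+ </p>
+
+ <p>
+ For example, in a Flume agent configuration file, the sink property might be set as
+ follows (the agent and sink names are placeholders):
+ </p>
+
+<codeblock>agent1.sinks.hdfs_sink1.hdfs.writeFormat = Text</codeblock>
+
+ <p>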
+ </p> + + <p> + <b>Resolution:</b> A request has been filed to add this information to the upstream Flume documentation. + </p> + + </conbody> + + </concept> + + <concept id="IMPALA-635" rev="IMPALA-635"> + +<!-- Not part of Alex's spreadsheet --> + + <title>Avro Scanner fails to parse some schemas</title> + + <conbody> + + <p> + Querying certain Avro tables could cause a crash or return no rows, even though Impala could <codeph>DESCRIBE</codeph> the table. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-635" scope="external" format="html">IMPALA-635</xref> + </p> + + <p> + <b>Workaround:</b> Swap the order of the fields in the schema specification. For example, <codeph>["null", "string"]</codeph> + instead of <codeph>["string", "null"]</codeph>. + </p> + + <p> + <b>Resolution:</b> Not allowing this syntax agrees with the Avro specification, so it may still cause an error even when the + crashing issue is resolved. + </p> + + </conbody> + + </concept> + + <concept id="IMPALA-1024" rev="IMPALA-1024"> + +<!-- Not part of Alex's spreadsheet --> + + <title>Impala BE cannot parse Avro schema that contains a trailing semi-colon</title> + + <conbody> + + <p> + If an Avro table has a schema definition with a trailing semicolon, Impala encounters an error when the table is queried. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-1024" scope="external" format="html">IMPALA-1024</xref> + </p> + + <p> + <b>Workaround:</b> Remove the trailing semicolon from the Avro schema. + </p> + + </conbody> + + </concept> + + <concept id="IMPALA-2154" rev="IMPALA-2154"> + +<!-- Not part of Alex's spreadsheet --> + + <title>Fix decompressor to allow parsing gzips with multiple streams</title> + + <conbody> + + <p> + Currently, Impala can only read gzipped files containing a single stream. If a gzipped file contains multiple concatenated + streams, the Impala query only processes the data from the first stream. 
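+ </p>
+
+ <p>
+ For example, concatenating separately compressed files produces a multi-stream gzip file,
+ which can be recompressed into a single stream (file names are illustrative):
+ </p>
+
+<codeblock># Appending a second gzip stream produces a multi-stream file;
+# Impala would read only the data from the first stream.
+gzip -c part1.txt > multi.gz
+gzip -c part2.txt >> multi.gz
+# Decompressing and recompressing yields a single stream that Impala reads fully.
+zcat multi.gz | gzip -c > single.gz</codeblock>
+
+ <p>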
+ </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2154" scope="external" format="html">IMPALA-2154</xref> + </p> + + <p> + <b>Workaround:</b> Use a different gzip tool to compress the file into a single-stream file. + </p> + + <p><b>Resolution:</b> Fixed in CDH 5.7.0 / Impala 2.5.0.</p> + + </conbody> + + </concept> + + <concept id="IMPALA-1578" rev="IMPALA-1578"> + +<!-- Not part of Alex's spreadsheet --> + + <title>Impala incorrectly handles text data when the newline sequence \n\r is split between different HDFS blocks</title> + + <conbody> + + <p> + If a carriage return / newline pair of characters in a text table is split between HDFS data blocks, Impala incorrectly processes + the row following the <codeph>\n\r</codeph> pair twice. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-1578" scope="external" format="html">IMPALA-1578</xref> + </p> + + <p> + <b>Workaround:</b> Use the Parquet format for large volumes of data where practical. + </p> + + <p><b>Resolution:</b> Fixed in CDH 5.8.0 / Impala 2.6.0.</p> + + </conbody> + + </concept> + + <concept id="IMPALA-1862" rev="IMPALA-1862"> + +<!-- Not part of Alex's spreadsheet --> + + <title>Invalid bool value not reported as a scanner error</title> + + <conbody> + + <p> + In some cases, an invalid <codeph>BOOLEAN</codeph> value read from a table does not produce a warning message about the bad value. + The result is still <codeph>NULL</codeph> as expected. Therefore, this is not a query correctness issue, but it could lead to + overlooking the presence of invalid data. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-1862" scope="external" format="html">IMPALA-1862</xref> + </p> + + </conbody> + + </concept> + + <concept id="IMPALA-1652" rev="IMPALA-1652"> + +<!-- To do: Isn't this more a correctness issue? 
--> + + <title>Incorrect results with basic predicate on CHAR typed column.</title> + + <conbody> + + <p> + When comparing a <codeph>CHAR</codeph> column value to a string literal, the literal value is not blank-padded and so the + comparison might fail when it should match. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-1652" scope="external" format="html">IMPALA-1652</xref> + </p> + + <p> + <b>Workaround:</b> Use the <codeph>RPAD()</codeph> function to blank-pad literals compared with <codeph>CHAR</codeph> columns to + the expected length. + </p> + + </conbody> + + </concept> + + </concept> + + <concept id="known_issues_limitations"> + + <title>Impala Known Issues: Limitations</title> + + <conbody> + + <p> + These issues are current limitations of Impala that require evaluation as you plan how to integrate Impala into your data management + workflow. + </p> + + </conbody> + + <concept id="IMPALA-77" rev="IMPALA-77"> + +<!-- Not part of Alex's spreadsheet. Perhaps it really is a permanent limitation and nobody is tracking it? --> + + <title>Impala does not support running on clusters with federated namespaces</title> + + <conbody> + + <p> + Impala does not support running on clusters with federated namespaces. The <codeph>impalad</codeph> process will not start on a + node running such a filesystem based on the <codeph>org.apache.hadoop.fs.viewfs.ViewFs</codeph> class. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-77" scope="external" format="html">IMPALA-77</xref> + </p> + + <p> + <b>Anticipated Resolution:</b> Limitation + </p> + + <p> + <b>Workaround:</b> Use standard HDFS on all Impala nodes. + </p> + + </conbody> + + </concept> + + </concept> + + <concept id="known_issues_misc"> + + <title>Impala Known Issues: Miscellaneous / Older Issues</title> + + <conbody> + + <p> + These issues do not fall into one of the above categories or have not been categorized yet. 
+ </p> + + </conbody> + + <concept id="IMPALA-2005" rev="IMPALA-2005"> + +<!-- Not part of Alex's spreadsheet --> + + <title>A failed CTAS does not drop the table if the insert fails.</title> + + <conbody> + + <p> + If a <codeph>CREATE TABLE AS SELECT</codeph> operation successfully creates the target table but an error occurs while querying + the source table or copying the data, the new table is left behind rather than being dropped. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2005" scope="external" format="html">IMPALA-2005</xref> + </p> + + <p> + <b>Workaround:</b> Drop the new table manually after a failed <codeph>CREATE TABLE AS SELECT</codeph>. + </p> + + </conbody> + + </concept> + + <concept id="IMPALA-1821" rev="IMPALA-1821"> + +<!-- Not part of Alex's spreadsheet --> + + <title>Casting scenarios with invalid/inconsistent results</title> + + <conbody> + + <p> + Using a <codeph>CAST()</codeph> function to convert large literal values to smaller types, or to convert special values such as + <codeph>NaN</codeph> or <codeph>Inf</codeph>, produces values not consistent with other database systems. This could lead to + unexpected results from queries. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-1821" scope="external" format="html">IMPALA-1821</xref> + </p> + +<!-- <p><b>Workaround:</b> Doublecheck that <codeph>CAST()</codeph> operations work as expect. The issue applies to expressions involving literals, not values read from table columns.</p> --> + + </conbody> + + </concept> + + <concept id="IMPALA-1619" rev="IMPALA-1619"> + +<!-- Not part of Alex's spreadsheet --> + + <title>Support individual memory allocations larger than 1 GB</title> + + <conbody> + + <p> + The largest single block of memory that Impala can allocate during a query is 1 GiB. 
Therefore, a query could fail or Impala could + crash if a compressed text file resulted in more than 1 GiB of data in uncompressed form, or if a string function such as + <codeph>group_concat()</codeph> returned a value greater than 1 GiB. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-1619" scope="external" format="html">IMPALA-1619</xref> + </p> + + <p><b>Resolution:</b> Fixed in CDH 5.9.0 / Impala 2.7.0 and CDH 5.8.3 / Impala 2.6.3.</p> + + </conbody> + + </concept> + + <concept id="IMPALA-941" rev="IMPALA-941"> + +<!-- Not part of Alex's spreadsheet. Maybe this is interop? --> + + <title>Impala Parser issue when using fully qualified table names that start with a number.</title> + + <conbody> + + <p> + A fully qualified table name starting with a number could cause a parsing error. In a name such as <codeph>db.571_market</codeph>, + the decimal point followed by digits is interpreted as a floating-point number. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-941" scope="external" format="html">IMPALA-941</xref> + </p> + + <p> + <b>Workaround:</b> Surround each part of the fully qualified name with backticks (<codeph>``</codeph>). + </p> + + </conbody> + + </concept> + + <concept id="IMPALA-532" rev="IMPALA-532"> + +<!-- Not part of Alex's spreadsheet. Perhaps it really is a permanent limitation and nobody is tracking it? --> + + <title>Impala should tolerate bad locale settings</title> + + <conbody> + + <p> + If the <codeph>LC_*</codeph> environment variables specify an unsupported locale, Impala does not start. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-532" scope="external" format="html">IMPALA-532</xref> + </p> + + <p> + <b>Workaround:</b> Add <codeph>LC_ALL="C"</codeph> to the environment settings for both the Impala daemon and the Statestore + daemon. 
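+ </p>
+
+ <p>
+ For example, the startup environment for those daemons might include a setting such as:
+ </p>
+
+<codeblock>export LC_ALL="C"</codeblock>
+
+ <p>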
See <xref href="impala_config_options.xml#config_options"/> for details about modifying these environment settings. + </p> + + <p> + <b>Resolution:</b> Fixing this issue would require an upgrade to Boost 1.47 in the Impala distribution. + </p> + + </conbody> + + </concept> + + <concept id="IMP-1203"> + +<!-- Not part of Alex's spreadsheet. Perhaps it really is a permanent limitation and nobody is tracking it? --> + + <title>Log Level 3 Not Recommended for Impala</title> + + <conbody> + + <p> + The extensive logging produced by log level 3 can cause serious performance overhead and capacity issues. + </p> + + <p> + <b>Workaround:</b> Reduce the log level to its default value of 1, that is, <codeph>GLOG_v=1</codeph>. See + <xref href="impala_logging.xml#log_levels"/> for details about the effects of setting different logging levels. + </p> + + </conbody> + + </concept> + + </concept> + +</concept>
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/bb88fdc0/docs/topics/impala_max_block_mgr_memory.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_max_block_mgr_memory.xml b/docs/topics/impala_max_block_mgr_memory.xml new file mode 100644 index 0000000..3bf8ac8 --- /dev/null +++ b/docs/topics/impala_max_block_mgr_memory.xml @@ -0,0 +1,30 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept rev="2.1.0" id="max_block_mgr_memory"> + + <title>MAX_BLOCK_MGR_MEMORY</title> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Impala Query Options"/> + <data name="Category" value="Memory"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> + </metadata> + </prolog> + + <conbody> + + <p rev="2.1.0"> + <indexterm audience="Cloudera">MAX_BLOCK_MGR_MEMORY query option</indexterm> + </p> + + <p></p> + + <p> + <b>Default:</b> + </p> + + <p conref="../shared/impala_common.xml#common/added_in_20"/> + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/bb88fdc0/docs/topics/impala_max_num_runtime_filters.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_max_num_runtime_filters.xml b/docs/topics/impala_max_num_runtime_filters.xml new file mode 100644 index 0000000..90e91dc --- /dev/null +++ b/docs/topics/impala_max_num_runtime_filters.xml @@ -0,0 +1,61 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="max_num_runtime_filters" rev="2.5.0"> + + <title>MAX_NUM_RUNTIME_FILTERS Query Option (CDH 5.7 or higher only)</title> + <titlealts audience="PDF"><navtitle>MAX_NUM_RUNTIME_FILTERS</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" 
value="Impala Query Options"/> + <data name="Category" value="Performance"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> + </metadata> + </prolog> + + <conbody> + + <p rev="2.5.0"> + <indexterm audience="Cloudera">MAX_NUM_RUNTIME_FILTERS query option</indexterm> + The <codeph>MAX_NUM_RUNTIME_FILTERS</codeph> query option + sets an upper limit on the number of runtime filters that can be produced for each query. + </p> + + <p conref="../shared/impala_common.xml#common/type_integer"/> + + <p> + <b>Default:</b> 10 + </p> + + <p conref="../shared/impala_common.xml#common/added_in_250"/> + + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + + <p> + Each runtime filter imposes some memory overhead on the query. + Depending on the setting of the <codeph>RUNTIME_BLOOM_FILTER_SIZE</codeph> + query option, each filter might consume between 1 and 16 megabytes + per plan fragment. There are typically 5 or fewer filters per plan fragment. + </p> + + <p> + Impala evaluates the effectiveness of each filter, and keeps the + ones that eliminate the largest number of partitions or rows. + Therefore, this setting can protect against + potential problems due to excessive memory overhead for filter production, + while still allowing a high level of optimization for suitable queries. 
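+ </p>
+
+ <p>
+ For example, to lower the limit in a session where filter memory overhead is a concern:
+ </p>
+
+<codeblock>set MAX_NUM_RUNTIME_FILTERS=5;</codeblock>
+
+ <p>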
+ </p> + + <p conref="../shared/impala_common.xml#common/runtime_filtering_option_caveat"/> + + <p conref="../shared/impala_common.xml#common/related_info"/> + <p> + <xref href="impala_runtime_filtering.xml"/>, + <!-- <xref href="impala_partitioning.xml#dynamic_partition_pruning"/>, --> + <xref href="impala_runtime_bloom_filter_size.xml#runtime_bloom_filter_size"/>, + <xref href="impala_runtime_filter_mode.xml#runtime_filter_mode"/> + </p> + + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/bb88fdc0/docs/topics/impala_optimize_partition_key_scans.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_optimize_partition_key_scans.xml b/docs/topics/impala_optimize_partition_key_scans.xml new file mode 100644 index 0000000..60635ff --- /dev/null +++ b/docs/topics/impala_optimize_partition_key_scans.xml @@ -0,0 +1,180 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept rev="2.5.0 IMPALA-2499" id="optimize_partition_key_scans"> + + <title>OPTIMIZE_PARTITION_KEY_SCANS Query Option (CDH 5.7 or higher only)</title> + <titlealts audience="PDF"><navtitle>OPTIMIZE_PARTITION_KEY_SCANS</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Impala Query Options"/> + <data name="Category" value="Querying"/> + <data name="Category" value="Performance"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> + </metadata> + </prolog> + + <conbody> + + <p rev="2.5.0 IMPALA-2499"> + <indexterm audience="Cloudera">OPTIMIZE_PARTITION_KEY_SCANS query option</indexterm> + Enables a fast code path for queries that apply simple aggregate functions to partition key + columns: <codeph>MIN(<varname>key_column</varname>)</codeph>, <codeph>MAX(<varname>key_column</varname>)</codeph>, + or <codeph>COUNT(DISTINCT 
<varname>key_column</varname>)</codeph>. + </p> + + <p conref="../shared/impala_common.xml#common/type_boolean"/> + <p conref="../shared/impala_common.xml#common/default_false_0"/> + + <note conref="../shared/impala_common.xml#common/one_but_not_true"/> + + <p conref="../shared/impala_common.xml#common/added_in_250"/> + + <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> + + <p> + This optimization speeds up common <q>introspection</q> operations when using queries + to calculate the cardinality and range for partition key columns. + </p> + + <p> + This optimization does not apply if the queries contain any <codeph>WHERE</codeph>, + <codeph>GROUP BY</codeph>, or <codeph>HAVING</codeph> clause. The relevant queries + should only compute the minimum, maximum, or number of distinct values for the + partition key columns across the whole table. + </p> + + <p> + This optimization is enabled by a query option because it skips some consistency checks + and therefore can return slightly different partition values if partitions are in the + process of being added, dropped, or loaded outside of Impala. Queries might exhibit different + behavior depending on the setting of this option in the following cases: + </p> + + <ul> + <li> + <p> + If files are removed from a partition using HDFS or other non-Impala operations, + there is a period until the next <codeph>REFRESH</codeph> of the table where regular + queries fail at run time because they detect the missing files. With this optimization + enabled, queries that evaluate only the partition key column values (not the contents of + the partition itself) succeed, and treat the partition as if it still exists. + </p> + </li> + <li> + <p> + If a partition contains any data files, but the data files do not contain any rows, + a regular query considers that the partition does not exist. With this optimization + enabled, the partition is treated as if it exists. 
+ </p> + <p> + If the partition includes no files at all, this optimization does not change the query + behavior: the partition is considered to not exist whether or not this optimization is enabled. + </p> + </li> + </ul> + + <p conref="../shared/impala_common.xml#common/example_blurb"/> + + <p> + The following example shows initial schema setup and the default behavior of queries that + return just the partition key column for a table: + </p> + +<codeblock> +-- Make a partitioned table with 3 partitions. +create table t1 (s string) partitioned by (year int); +insert into t1 partition (year=2015) values ('last year'); +insert into t1 partition (year=2016) values ('this year'); +insert into t1 partition (year=2017) values ('next year'); + +-- Regardless of the option setting, this query must read the +-- data files to know how many rows to return for each year value. +explain select year from t1; ++-----------------------------------------------------+ +| Explain String | ++-----------------------------------------------------+ +| Estimated Per-Host Requirements: Memory=0B VCores=0 | +| | +| F00:PLAN FRAGMENT [UNPARTITIONED] | +| 00:SCAN HDFS [key_cols.t1] | +| partitions=3/3 files=4 size=40B | +| table stats: 3 rows total | +| column stats: all | +| hosts=3 per-host-mem=unavailable | +| tuple-ids=0 row-size=4B cardinality=3 | ++-----------------------------------------------------+ + +-- The aggregation operation means the query does not need to read +-- the data within each partition: the result set contains exactly 1 row +-- per partition, derived from the partition key column value. +-- By default, Impala still includes a 'scan' operation in the query. 
+explain select distinct year from t1; ++------------------------------------------------------------------------------------+ +| Explain String | ++------------------------------------------------------------------------------------+ +| Estimated Per-Host Requirements: Memory=0B VCores=0 | +| | +| 01:AGGREGATE [FINALIZE] | +| | group by: year | +| | | +| 00:SCAN HDFS [key_cols.t1] | +| partitions=0/0 files=0 size=0B | ++------------------------------------------------------------------------------------+ +</codeblock> + + <p> + The following examples show how the plan is made more efficient when the + <codeph>OPTIMIZE_PARTITION_KEY_SCANS</codeph> option is enabled: + </p> + +<codeblock> +set optimize_partition_key_scans=1; +OPTIMIZE_PARTITION_KEY_SCANS set to 1 + +-- The aggregation operation is turned into a UNION internally, +-- with constant values known in advance based on the metadata +-- for the partitioned table. +explain select distinct year from t1; ++-----------------------------------------------------+ +| Explain String | ++-----------------------------------------------------+ +| Estimated Per-Host Requirements: Memory=0B VCores=0 | +| | +| F00:PLAN FRAGMENT [UNPARTITIONED] | +| 01:AGGREGATE [FINALIZE] | +| | group by: year | +| | hosts=1 per-host-mem=unavailable | +| | tuple-ids=1 row-size=4B cardinality=3 | +| | | +| 00:UNION | +| constant-operands=3 | +| hosts=1 per-host-mem=unavailable | +| tuple-ids=0 row-size=4B cardinality=3 | ++-----------------------------------------------------+ + +-- The same optimization applies to other aggregation queries +-- that only return values based on partition key columns: +-- MIN, MAX, COUNT(DISTINCT), and so on. 
+explain select min(year) from t1; ++-----------------------------------------------------+ +| Explain String | ++-----------------------------------------------------+ +| Estimated Per-Host Requirements: Memory=0B VCores=0 | +| | +| F00:PLAN FRAGMENT [UNPARTITIONED] | +| 01:AGGREGATE [FINALIZE] | +| | output: min(year) | +| | hosts=1 per-host-mem=unavailable | +| | tuple-ids=1 row-size=4B cardinality=1 | +| | | +| 00:UNION | +| constant-operands=3 | +| hosts=1 per-host-mem=unavailable | +| tuple-ids=0 row-size=4B cardinality=3 | ++-----------------------------------------------------+ +</codeblock> + + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/bb88fdc0/docs/topics/impala_parquet_annotate_strings_utf8.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_parquet_annotate_strings_utf8.xml b/docs/topics/impala_parquet_annotate_strings_utf8.xml new file mode 100644 index 0000000..cd5b578 --- /dev/null +++ b/docs/topics/impala_parquet_annotate_strings_utf8.xml @@ -0,0 +1,50 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="parquet_annotate_strings_utf8" rev="2.6.0 IMPALA-2069"> + + <title>PARQUET_ANNOTATE_STRINGS_UTF8 Query Option (CDH 5.8 or higher only)</title> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Impala Query Options"/> + <data name="Category" value="Parquet"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> + </metadata> + </prolog> + + <conbody> + + <p rev="2.6.0 IMPALA-2069"> + <indexterm audience="Cloudera">PARQUET_ANNOTATE_STRINGS_UTF8 query option</indexterm> + Causes Impala <codeph>INSERT</codeph> and <codeph>CREATE TABLE AS SELECT</codeph> statements + to write Parquet files that use the UTF-8 annotation for <codeph>STRING</codeph> columns. 
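+ </p>
+
+ <p>
+ For example, a session might enable the option before writing a Parquet table
+ (table names are illustrative):
+ </p>
+
+<codeblock>set PARQUET_ANNOTATE_STRINGS_UTF8=1;
+create table annotated_parquet stored as parquet as select * from text_table;</codeblock>
+
+ <p>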
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
+    <p>
+      By default, Impala represents a <codeph>STRING</codeph> column in Parquet as an unannotated binary field.
+    </p>
+    <p>
+      Impala always uses the UTF-8 annotation when writing <codeph>CHAR</codeph> and <codeph>VARCHAR</codeph>
+      columns to Parquet files. An alternative to using the query option is to cast <codeph>STRING</codeph>
+      values to <codeph>VARCHAR</codeph>.
+    </p>
+    <p>
+      This option helps make data written by Impala more interoperable with other data processing engines.
+      Impala itself currently does not support all operations on UTF-8 data.
+      Although data processed by Impala is typically represented in ASCII, it is valid to designate the
+      data as UTF-8 when storing it on disk, because ASCII is a subset of UTF-8.
+    </p>
+    <p conref="../shared/impala_common.xml#common/type_boolean"/>
+    <p conref="../shared/impala_common.xml#common/default_false_0"/>
+
+    <p conref="../shared/impala_common.xml#common/added_in_260"/>
+
+    <p conref="../shared/impala_common.xml#common/related_info"/>
+    <p>
+      <xref href="impala_parquet.xml#parquet"/>
+    </p>
+
+  </conbody>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/bb88fdc0/docs/topics/impala_parquet_fallback_schema_resolution.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_parquet_fallback_schema_resolution.xml b/docs/topics/impala_parquet_fallback_schema_resolution.xml
new file mode 100644
index 0000000..06b1a28
--- /dev/null
+++ b/docs/topics/impala_parquet_fallback_schema_resolution.xml
@@ -0,0 +1,49 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="parquet_fallback_schema_resolution" rev="2.6.0 IMPALA-2835 CDH-33330">
+
+  <title>PARQUET_FALLBACK_SCHEMA_RESOLUTION Query Option (CDH 5.8 or higher only)</title>
+  <titlealts
audience="PDF"><navtitle>PARQUET_FALLBACK_SCHEMA_RESOLUTION</navtitle></titlealts>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Impala Query Options"/>
+      <data name="Category" value="Parquet"/>
+      <data name="Category" value="Schemas"/>
+      <data name="Category" value="Developers"/>
+      <data name="Category" value="Data Analysts"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p rev="2.6.0 IMPALA-2835 CDH-33330">
+      <indexterm audience="Cloudera">PARQUET_FALLBACK_SCHEMA_RESOLUTION query option</indexterm>
+      Allows Impala to look up columns within Parquet files by column name, rather than column order,
+      when necessary.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
+    <p>
+      By default, Impala looks up columns within a Parquet file based on
+      the order of columns in the table.
+      The <codeph>name</codeph> setting for this option enables behavior for Impala queries
+      similar to the Hive setting <codeph>parquet.column.index.access=false</codeph>.
+      It also allows Impala to query Parquet files created by Hive with the
+      <codeph>parquet.column.index.access=false</codeph> setting in effect.
+    </p>
+
+    <p>
+      <b>Type:</b> integer or string.
+      Allowed values are 0 or <codeph>position</codeph> (default), 1 or <codeph>name</codeph>.
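+      For example, either the numeric or the string form switches a session to
+      name-based column resolution (a minimal sketch of an
+      <cmdname>impala-shell</cmdname> session):
+<codeblock>
+set parquet_fallback_schema_resolution=name;
+-- Subsequent queries match Parquet columns by name, not position.
+</codeblock>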
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/added_in_260"/>
+
+    <p conref="../shared/impala_common.xml#common/related_info"/>
+    <p>
+      <xref href="impala_parquet.xml#parquet_schema_evolution"/>
+    </p>
+
+  </conbody>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/bb88fdc0/docs/topics/impala_perf_ddl.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_perf_ddl.xml b/docs/topics/impala_perf_ddl.xml
new file mode 100644
index 0000000..d075cd2
--- /dev/null
+++ b/docs/topics/impala_perf_ddl.xml
@@ -0,0 +1,42 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="perf_ddl">
+
+  <title>Performance Considerations for DDL Statements</title>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Performance"/>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="DDL"/>
+      <data name="Category" value="SQL"/>
+      <data name="Category" value="Developers"/>
+      <data name="Category" value="Data Analysts"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      These tips and guidelines apply to the Impala DDL statements, which are listed in
+      <xref href="impala_ddl.xml#ddl"/>.
+    </p>
+
+    <p>
+      Because Impala DDL statements operate on the metastore database, the performance considerations for those
+      statements are completely different from those for distributed queries that operate on HDFS
+      <ph rev="2.2.0">or S3</ph> data files, or on HBase tables.
+    </p>
+
+    <p>
+      Each DDL statement makes a relatively small update to the metastore database. The overhead for each statement
+      is proportional to the overall number of Impala and Hive tables, and (for a partitioned table) to the overall
+      number of partitions in that table. Issuing large numbers of DDL statements (such as one for each table or
+      one for each partition) also has the potential to encounter a bottleneck with access to the metastore
+      database.
Therefore, for efficient DDL, try to design your application logic and ETL pipeline to avoid a huge
+      number of tables and a huge number of partitions within each table. In this context, <q>huge</q> is in the
+      range of tens of thousands or hundreds of thousands.
+    </p>
+
+    <note conref="../shared/impala_common.xml#common/add_partition_set_location"/>
+  </conbody>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/bb88fdc0/docs/topics/impala_prefetch_mode.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_prefetch_mode.xml b/docs/topics/impala_prefetch_mode.xml
new file mode 100644
index 0000000..30dd116
--- /dev/null
+++ b/docs/topics/impala_prefetch_mode.xml
@@ -0,0 +1,49 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="prefetch_mode" rev="2.6.0 IMPALA-3286">
+
+  <title>PREFETCH_MODE Query Option (CDH 5.8 or higher only)</title>
+  <titlealts audience="PDF"><navtitle>PREFETCH_MODE</navtitle></titlealts>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Impala Query Options"/>
+      <data name="Category" value="Performance"/>
+      <data name="Category" value="Developers"/>
+      <data name="Category" value="Data Analysts"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p rev="2.6.0 IMPALA-3286">
+      <indexterm audience="Cloudera">PREFETCH_MODE query option</indexterm>
+      Determines whether the prefetching optimization is applied during
+      join query processing.
+    </p>
+
+    <p>
+      <b>Type:</b> numeric (0, 1)
+      or corresponding mnemonic strings (<codeph>NONE</codeph>, <codeph>HT_BUCKET</codeph>).
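+      For example, either form could turn off the prefetching optimization for a
+      session, to compare join performance with and without it (a sketch):
+<codeblock>
+set prefetch_mode=NONE;
+-- or, equivalently, using the numeric form:
+set prefetch_mode=0;
+</codeblock>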
+    </p>
+
+    <p>
+      <b>Default:</b> 1 (equivalent to <codeph>HT_BUCKET</codeph>)
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/added_in_260"/>
+
+    <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
+    <p>
+      The default mode is 1, which means that hash table buckets are
+      prefetched during join query processing.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/related_info"/>
+    <p>
+      <xref href="impala_joins.xml#joins"/>,
+      <xref href="impala_perf_joins.xml#perf_joins"/>.
+    </p>
+
+  </conbody>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/bb88fdc0/docs/topics/impala_query_lifetime.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_query_lifetime.xml b/docs/topics/impala_query_lifetime.xml
new file mode 100644
index 0000000..2f46d21
--- /dev/null
+++ b/docs/topics/impala_query_lifetime.xml
@@ -0,0 +1,31 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="query_lifetime">
+
+  <title>Impala Query Lifetime</title>
+
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Concepts"/>
+      <data name="Category" value="Querying"/>
+      <data name="Category" value="Developers"/>
+      <data name="Category" value="Data Analysts"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      Impala queries progress through a series of stages from the time they are initiated to the time
+      they are completed. A query can also be cancelled before it is entirely finished, either
+      because of an explicit cancellation, or because of a timeout, an out-of-memory condition, or
+      some other error. Understanding the query lifecycle can help you manage the throughput and
+      resource usage of Impala queries, especially in a high-concurrency or multi-workload environment.
+    </p>
+
+    <p outputclass="toc"/>
+  </conbody>
+
+
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/bb88fdc0/docs/topics/impala_relnotes.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_relnotes.xml b/docs/topics/impala_relnotes.xml
new file mode 100644
index 0000000..5c53a21
--- /dev/null
+++ b/docs/topics/impala_relnotes.xml
@@ -0,0 +1,34 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="relnotes" audience="standalone">
+
+  <title>Impala Release Notes</title>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Release Notes"/>
+      <data name="Category" value="Administrators"/>
+      <data name="Category" value="Developers"/>
+      <data name="Category" value="Data Analysts"/>
+    </metadata>
+  </prolog>
+
+  <conbody id="relnotes_intro">
+
+    <p>
+      These release notes provide information on the <xref href="impala_new_features.xml#new_features">new
+      features</xref> and <xref href="impala_known_issues.xml#known_issues">known issues and limitations</xref> for
+      Impala versions up to <ph conref="../shared/ImpalaVariables.xml#impala_vars/ReleaseVersion"/>. For users
+      upgrading from earlier Impala releases, or using Impala in combination with specific versions of other
+      Cloudera software, <xref href="impala_incompatible_changes.xml#incompatible_changes"/> lists any changes to
+      file formats, SQL syntax, or software dependencies to take into account.
+    </p>
+
+    <p>
+      After you review these release notes, see
+      <xref audience="integrated" href="impala.xml"/><xref audience="standalone" href="http://www.cloudera.com/documentation/enterprise/latest/topics/impala.html" scope="external" format="html"/>
+      for more information about using Impala.
+    </p>
+
+    <p outputclass="toc"/>
+  </conbody>
+</concept>
