http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_known_issues.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_known_issues.xml b/docs/topics/impala_known_issues.xml new file mode 100644 index 0000000..e57ec62 --- /dev/null +++ b/docs/topics/impala_known_issues.xml @@ -0,0 +1,1812 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept rev="ver" id="known_issues"> + + <title><ph audience="standalone">Known Issues and Workarounds in Impala</ph><ph audience="integrated">Apache Impala (incubating) Known Issues</ph></title> + + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Release Notes"/> + <data name="Category" value="Known Issues"/> + <data name="Category" value="Troubleshooting"/> + <data name="Category" value="Upgrading"/> + <data name="Category" value="Administrators"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> + </metadata> + </prolog> + + <conbody> + + <p> + The following sections describe known issues and workarounds in Impala, as of the current production release. This page summarizes the + most serious or frequently encountered issues in the current release, to help you make planning decisions about installing and + upgrading. Any workarounds are listed here. The bug links take you to the Impala issues site, where you can see the diagnosis and + whether a fix is in the pipeline. + </p> + + <note> + The online issue tracking system for Impala contains comprehensive information and is updated in real time. To verify whether an issue + you are experiencing has already been reported, or which release an issue is fixed in, search on the + <xref href="https://issues.cloudera.org/" scope="external" format="html">issues.cloudera.org JIRA tracker</xref>. 
+ </note> + + <p outputclass="toc inpage"/> + + <p> + For issues fixed in various Impala releases, see <xref href="impala_fixed_issues.xml#fixed_issues"/>. + </p> + +<!-- Use as a template for new issues. + <concept id=""> + <title></title> + <conbody> + <p> + </p> + <p><b>Bug:</b> <xref href="https://issues.cloudera.org/browse/" scope="external" format="html"></xref></p> + <p><b>Severity:</b> High</p> + <p><b>Resolution:</b> </p> + <p><b>Workaround:</b> </p> + </conbody> + </concept> + +--> + + </conbody> + +<!-- New known issues for CDH 5.5 / Impala 2.3. + +Title: Server-to-server SSL and Kerberos do not work together +Description: If server<->server SSL is enabled (with ssl_client_ca_certificate), and Kerberos auth is used between servers, the cluster will fail to start. +Upstream & Internal JIRAs: https://issues.cloudera.org/browse/IMPALA-2598 +Severity: Medium. Server-to-server SSL is practically unusable but this is a new feature. +Workaround: No known workaround. + +Title: Queries may hang on server-to-server exchange errors +Description: The DataStreamSender::Channel::CloseInternal() does not close the channel on an error. This will cause the node on the other side of the channel to wait indefinitely causing a hang. +Upstream & Internal JIRAs: https://issues.cloudera.org/browse/IMPALA-2592 +Severity: Low. This does not occur frequently. +Workaround: No known workaround. + +Title: Catalogd may crash when loading metadata for tables with many partitions, many columns and with incremental stats +Description: Incremental stats use up about 400 bytes per partition X column. So for a table with 20K partitions and 100 columns this is about 800 MB. When serialized this goes past the 2 GB Java array size limit and leads to a catalog crash. +Upstream & Internal JIRAs: https://issues.cloudera.org/browse/IMPALA-2648, IMPALA-2647, IMPALA-2649. +Severity: Low. This does not occur frequently. +Workaround: Reduce the number of partitions. 
+ +More from: https://issues.cloudera.org/browse/IMPALA-2093?filter=11278&jql=project%20%3D%20IMPALA%20AND%20priority%20in%20(blocker%2C%20critical)%20AND%20status%20in%20(open%2C%20Reopened)%20AND%20labels%20%3D%20correctness%20ORDER%20BY%20priority%20DESC + +IMPALA-2093 +Wrong plan of NOT IN aggregate subquery when a constant is used in subquery predicate +IMPALA-1652 +Incorrect results with basic predicate on CHAR typed column. +IMPALA-1459 +Incorrect assignment of predicates through an outer join in an inline view. +IMPALA-2665 +Incorrect assignment of On-clause predicate inside inline view with an outer join. +IMPALA-2603 +Crash: impala::Coordinator::ValidateCollectionSlots +IMPALA-2375 +Fix issues with the legacy join and agg nodes using enable_partitioned_hash_join=false and enable_partitioned_aggregation=false +IMPALA-1862 +Invalid bool value not reported as a scanner error +IMPALA-1792 +ImpalaODBC: Can not get the value in the SQLGetData(m-x th column) after the SQLBindCol(m th column) +IMPALA-1578 +Impala incorrectly handles text data when the new line character \n\r is split between different HDFS block +IMPALA-2643 +Duplicated column in inline view causes dropping null slots during scan +IMPALA-2005 +A failed CTAS does not drop the table if the insert fails. +IMPALA-1821 +Casting scenarios with invalid/inconsistent results + +Another list from Alex, of correctness problems with predicates; might overlap with ones I already have: + +https://issues.cloudera.org/browse/IMPALA-2665 - Already have +https://issues.cloudera.org/browse/IMPALA-2643 - Already have +https://issues.cloudera.org/browse/IMPALA-1459 - Already have +https://issues.cloudera.org/browse/IMPALA-2144 - Don't have + +--> + + <concept id="known_issues_crash"> + + <title>Impala Known Issues: Crashes and Hangs</title> + + <conbody> + + <p> + These issues can cause Impala to quit or become unresponsive. 
+ </p> + + </conbody> + + <concept id="IMPALA-3069" rev="IMPALA-3069"> + + <title>Setting BATCH_SIZE query option too large can cause a crash</title> + + <conbody> + + <p> + Using a value in the millions for the <codeph>BATCH_SIZE</codeph> query option, together with wide rows or large string values in + columns, could cause a memory allocation of more than 2 GB, resulting in a crash. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-3069" scope="external" format="html">IMPALA-3069</xref> + </p> + + <p> + <b>Severity:</b> High + </p> + + <p><b>Resolution:</b> Fixed in CDH 5.9.0 / Impala 2.7.0.</p> + + </conbody> + + </concept> + + <concept id="IMPALA-3441" rev="IMPALA-3441"> + + <title>Queries against malformed Avro data can cause a crash</title> + + <conbody> + + <p> + Malformed Avro data, such as out-of-bounds integers or values in the wrong format, could cause a crash when queried. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-3441" scope="external" format="html">IMPALA-3441</xref> + </p> + + <p> + <b>Severity:</b> High + </p> + + <p><b>Resolution:</b> Fixed in CDH 5.9.0 / Impala 2.7.0 and CDH 5.8.2 / Impala 2.6.2.</p> + + </conbody> + + </concept> + + <concept id="IMPALA-2592" rev="IMPALA-2592"> + + <title>Queries may hang on server-to-server exchange errors</title> + + <conbody> + + <p> + The <codeph>DataStreamSender::Channel::CloseInternal()</codeph> function does not close the channel on an error. This causes the node on + the other side of the channel to wait indefinitely, causing a hang. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2592" scope="external" format="html">IMPALA-2592</xref> + </p> + + <p> + <b>Resolution:</b> Fixed in CDH 5.7.0 / Impala 2.5.0.
+ </p> + + </conbody> + + </concept> + + <concept id="IMPALA-2365" rev="IMPALA-2365"> + + <title>Impalad is crashing if udf jar is not available in hdfs location for first time</title> + + <conbody> + + <p> + If the JAR file corresponding to a Java UDF is removed from HDFS after the Impala <codeph>CREATE FUNCTION</codeph> statement is + issued, the <cmdname>impalad</cmdname> daemon crashes. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2365" scope="external" format="html">IMPALA-2365</xref> + </p> + + <p><b>Resolution:</b> Fixed in CDH 5.7.0 / Impala 2.5.0.</p> + + </conbody> + + </concept> + + </concept> + + <concept id="known_issues_performance"> + + <title id="ki_performance">Impala Known Issues: Performance</title> + + <conbody> + + <p> + These issues involve the performance of operations such as queries or DDL statements. + </p> + + </conbody> + + <concept id="IMPALA-1480" rev="IMPALA-1480"> + +<!-- Not part of Alex's spreadsheet. Spreadsheet has IMPALA-1423 which mentions it's similar to this one but not a duplicate. --> + + <title>Slow DDL statements for tables with large number of partitions</title> + + <conbody> + + <p> + DDL statements for tables with a large number of partitions might be slow. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-1480" scope="external" format="html">IMPALA-1480</xref> + </p> + + <p> + <b>Workaround:</b> Run the DDL statement in Hive if the slowness is an issue. + </p> + + <p><b>Resolution:</b> Fixed in CDH 5.7.0 / Impala 2.5.0.</p> + + </conbody> + + </concept> + + </concept> + + <concept id="known_issues_usability"> + + <title id="ki_usability">Impala Known Issues: Usability</title> + + <conbody> + + <p> + These issues affect the convenience of interacting directly with Impala, typically through the Impala shell or Hue.
+ </p> + + </conbody> + + <concept id="IMPALA-3133" rev="IMPALA-3133"> + + <title>Unexpected privileges in show output</title> + + <conbody> + + <p> + Due to a timing condition in updating cached policy data from Sentry, the <codeph>SHOW</codeph> statements for Sentry roles could + sometimes display out-of-date role settings. Because Impala rechecks authorization for each SQL statement, this discrepancy does + not represent a security issue for other statements. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-3133" scope="external" format="html">IMPALA-3133</xref> + </p> + + <p> + <b>Severity:</b> High + </p> + + <p><b>Resolution:</b> Fixed in CDH 5.8.0 / Impala 2.6.0 and CDH 5.7.1 / Impala 2.5.1.</p> + + </conbody> + + </concept> + + <concept id="IMPALA-1776" rev="IMPALA-1776"> + + <title>Less than 100% progress on completed simple SELECT queries</title> + + <conbody> + + <p> + Simple <codeph>SELECT</codeph> queries show less than 100% progress even though they are already completed.
+ </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-1776" scope="external" format="html">IMPALA-1776</xref> + </p> + + </conbody> + + </concept> + + <concept id="concept_lmx_dk5_lx"> + + <title>Unexpected column overflow behavior with INT datatypes</title> + + <conbody> + + <p conref="../shared/impala_common.xml#common/int_overflow_behavior" /> + + <p> + <b>Bug:</b> + <xref href="https://issues.cloudera.org/browse/IMPALA-3123" + scope="external" format="html">IMPALA-3123</xref> + </p> + + </conbody> + + </concept> + + </concept> + + <concept id="known_issues_drivers"> + + <title id="ki_drivers">Impala Known Issues: JDBC and ODBC Drivers</title> + + <conbody> + + <p> + These issues affect applications that use the JDBC or ODBC APIs, such as business intelligence tools or custom-written applications + in languages such as Java or C++. + </p> + + </conbody> + + <concept id="IMPALA-1792" rev="IMPALA-1792"> + +<!-- Not part of Alex's spreadsheet --> + + <title>ImpalaODBC: Can not get the value in the SQLGetData(m-x th column) after the SQLBindCol(m th column)</title> + + <conbody> + + <p> + If the ODBC <codeph>SQLGetData</codeph> is called on a series of columns, the function calls must follow the same order as the + columns. For example, if data is fetched from column 2 then column 1, the <codeph>SQLGetData</codeph> call for column 1 returns + <codeph>NULL</codeph>. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-1792" scope="external" format="html">IMPALA-1792</xref> + </p> + + <p> + <b>Workaround:</b> Fetch columns in the same order they are defined in the table. + </p> + + </conbody> + + </concept> + + </concept> + + <concept id="known_issues_security"> + + <title id="ki_security">Impala Known Issues: Security</title> + + <conbody> + + <p> + These issues relate to security features, such as Kerberos authentication, Sentry authorization, encryption, auditing, and + redaction. 
+ </p> + + </conbody> + +<!-- To do: Hiding for the moment. https://jira.cloudera.com/browse/CDH-38736 reports the issue is fixed. --> + + <concept id="impala-shell_ssl_dependency" audience="Cloudera" rev="impala-shell_ssl_dependency"> + + <title>impala-shell requires Python with ssl module</title> + + <conbody> + + <p> + On CentOS 5.10 and Oracle Linux 5.11 using the built-in Python 2.4, invoking the <cmdname>impala-shell</cmdname> with the + <codeph>--ssl</codeph> option might fail with the following error: + </p> + +<codeblock> +Unable to import the python 'ssl' module. It is required for an SSL-secured connection. +</codeblock> + +<!-- No associated IMPALA-* JIRA... It is the internal JIRA CDH-38736. --> + + <p> + <b>Severity:</b> Low, workaround available + </p> + + <p> + <b>Resolution:</b> Customers are less likely to experience this issue over time, because the <codeph>ssl</codeph> module is included + in newer Python releases packaged with recent Linux releases. + </p> + + <p> + <b>Workaround:</b> To use SSL with <cmdname>impala-shell</cmdname> on these platform versions, install the <codeph>ssl</codeph> + Python module: + </p> + +<codeblock> +yum install python-ssl +</codeblock> + + <p> + Then <cmdname>impala-shell</cmdname> can run when using SSL. For example: + </p> + +<codeblock> +impala-shell -s impala --ssl --ca_cert /path_to_truststore/truststore.pem +</codeblock> + + </conbody> + + </concept> + + <concept id="renewable_kerberos_tickets"> + +<!-- Not part of Alex's spreadsheet. Not associated with a JIRA number AFAIK. --> + + <title>Kerberos tickets must be renewable</title> + + <conbody> + + <p> + In a Kerberos environment, the <cmdname>impalad</cmdname> daemon might not start if Kerberos tickets are not renewable. + </p> + + <p> + <b>Workaround:</b> Configure your KDC to allow tickets to be renewed, and configure <filepath>krb5.conf</filepath> to request + renewable tickets. + </p> + + </conbody> + + </concept> + +<!-- To do: Fixed in 2.5.0, 2.3.2.
Commenting out until I see how it can fix into "known issues now fixed" convention. + That set of fix releases looks incomplete so probably have to do some detective work with the JIRA. + https://issues.cloudera.org/browse/IMPALA-2598 + <concept id="IMPALA-2598" rev="IMPALA-2598"> + + <title>Server-to-server SSL and Kerberos do not work together</title> + + <conbody> + + <p> + If SSL is enabled between internal Impala components (with <codeph>ssl_client_ca_certificate</codeph>), and Kerberos + authentication is used between servers, the cluster fails to start. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2598" scope="external" format="html">IMPALA-2598</xref> + </p> + + <p> + <b>Workaround:</b> Do not use the new <codeph>ssl_client_ca_certificate</codeph> setting on Kerberos-enabled clusters until this + issue is resolved. + </p> + + <p><b>Resolution:</b> Fixed in CDH 5.7.0 / Impala 2.5.0 and CDH 5.5.2 / Impala 2.3.2.</p> + + </conbody> + + </concept> +--> + + </concept> + +<!-- + <concept id="known_issues_supportability"> + + <title id="ki_supportability">Impala Known Issues: Supportability</title> + + <conbody> + + <p> + These issues affect the ability to debug and troubleshoot Impala, such as incorrect output in query profiles or the query state + shown in monitoring applications. + </p> + + </conbody> + + </concept> +--> + + <concept id="known_issues_resources"> + + <title id="ki_resources">Impala Known Issues: Resources</title> + + <conbody> + + <p> + These issues involve memory or disk usage, including out-of-memory conditions, the spill-to-disk feature, and resource management + features. 
+ </p> + + </conbody> + + <concept id="TSB-168"> + + <title>Impala catalogd heap issues when upgrading to 5.7</title> + + <conbody> + + <p> + The default heap size for Impala <cmdname>catalogd</cmdname> has changed in <keyword keyref="impala25_full"/> and higher: + </p> + + <ul> + <li> + <p> + Before CDH 5.7, <cmdname>catalogd</cmdname> by default used the JVM's default heap size, which is the smaller of 1/4th of the + physical memory or 32 GB. + </p> + </li> + + <li> + <p> + Starting with CDH 5.7.0, the default <cmdname>catalogd</cmdname> heap size is 4 GB. + </p> + </li> + </ul> + + <p> + For example, on a host with 128 GB of physical memory, this change decreases the default <cmdname>catalogd</cmdname> heap from + 32 GB to 4 GB, which can result in out-of-memory errors in <cmdname>catalogd</cmdname> and lead to query failures. + </p> + + <p audience="Cloudera"> + <b>Bug:</b> <xref href="https://jira.cloudera.com/browse/TSB-168" scope="external" format="html">TSB-168</xref> + </p> + + <p> + <b>Severity:</b> High + </p> + + <p> + <b>Workaround:</b> Increase the <cmdname>catalogd</cmdname> memory limit as follows. +<!-- See <xref href="impala_scalability.xml#scalability_catalog"/> for the procedure. --> +<!-- Including full details here via conref, for benefit of PDF readers or anyone else + who might have trouble seeing or following the link. --> + </p> + + <p conref="../shared/impala_common.xml#common/increase_catalogd_heap_size"/> + + </conbody> + + </concept> + + <concept id="IMPALA-3509" rev="IMPALA-3509"> + + <title>Breakpad minidumps can be very large when the thread count is high</title> + + <conbody> + + <p> + The size of the breakpad minidump files grows linearly with the number of threads. By default, each thread adds 8 KB to the + minidump size. Minidump files could consume significant disk space when the daemons have a high number of threads.
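+ </p> + + <p> + As an illustrative sketch only (the flag value and command form below are hypothetical examples, not recommendations), the + <codeph>--minidump_size_limit_hint_kb</codeph> startup flag described in the workaround for this issue could be added to the + <cmdname>impalad</cmdname> startup options: + </p> + +<codeblock> +# Illustrative only: hint that each minidump should stay under roughly 20 MB. +impalad --minidump_size_limit_hint_kb=20480 ...other startup options... +</codeblock> + + <p> + Because the flag is only a hint, individual minidump files can still exceed the hinted size.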
+ </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-3509" scope="external" format="html">IMPALA-3509</xref> + </p> + + <p> + <b>Severity:</b> High + </p> + + <p> + <b>Workaround:</b> Add <codeph>--minidump_size_limit_hint_kb=<varname>size</varname></codeph> to set a soft upper limit on the + size of each minidump file. If the minidump file would exceed that limit, Impala reduces the amount of information for each thread + from 8 KB to 2 KB. (Full thread information is captured for the first 20 threads, then 2 KB per thread after that.) The minidump + file can still grow larger than the <q>hinted</q> size. For example, if you have 10,000 threads, the minidump file can be more + than 20 MB. + </p> + + </conbody> + + </concept> + + <concept id="IMPALA-3662" rev="IMPALA-3662"> + + <title>Parquet scanner memory increase after IMPALA-2736</title> + + <conbody> + + <p> + The initial release of <keyword keyref="impala26_full"/> sometimes has a higher peak memory usage than in previous releases while reading + Parquet files. + </p> + + <p> + <keyword keyref="impala26_full"/> addresses the issue IMPALA-2736, which improves the efficiency of Parquet scans by up to 2x. The faster scans + may result in a higher peak memory consumption compared to earlier versions of Impala due to the new column-wise row + materialization strategy. You are likely to experience higher memory consumption in any of the following scenarios: + <ul> + <li> + <p> + Very wide rows due to projecting many columns in a scan. + </p> + </li> + + <li> + <p> + Very large rows due to big column values, for example, long strings or nested collections with many items. + </p> + </li> + + <li> + <p> + Producer/consumer speed imbalances, leading to more rows being buffered between a scan (producer) and downstream (consumer) + plan nodes. 
+ </p> + </li> + </ul> + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-3662" scope="external" format="html">IMPALA-3662</xref> + </p> + + <p> + <b>Severity:</b> High + </p> + + <p> + <b>Workaround:</b> The following query options might help to reduce memory consumption in the Parquet scanner: + <ul> + <li> + Reduce the number of scanner threads, for example: <codeph>set num_scanner_threads=30</codeph> + </li> + + <li> + Reduce the batch size, for example: <codeph>set batch_size=512</codeph> + </li> + + <li> + Increase the memory limit, for example: <codeph>set mem_limit=64g</codeph> + </li> + </ul> + </p> + + </conbody> + + </concept> + + <concept id="IMPALA-691" rev="IMPALA-691"> + + <title>Process mem limit does not account for the JVM's memory usage</title> + +<!-- Supposed to be resolved for Impala 2.3.0. --> + + <conbody> + + <p> + Some memory allocated by the JVM used internally by Impala is not counted against the memory limit for the + <cmdname>impalad</cmdname> daemon. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-691" scope="external" format="html">IMPALA-691</xref> + </p> + + <p> + <b>Workaround:</b> To monitor overall memory usage, use the <cmdname>top</cmdname> command, or add the memory figures in the + Impala web UI <uicontrol>/memz</uicontrol> tab to JVM memory usage shown on the <uicontrol>/metrics</uicontrol> tab. + </p> + + </conbody> + + </concept> + + <concept id="IMPALA-2375" rev="IMPALA-2375"> + +<!-- Not part of Alex's spreadsheet --> + + <title>Fix issues with the legacy join and agg nodes using --enable_partitioned_hash_join=false and --enable_partitioned_aggregation=false</title> + + <conbody> + + <p> + Various issues could occur in queries run with the legacy join and aggregation nodes, which are enabled through the + <codeph>--enable_partitioned_hash_join=false</codeph> and <codeph>--enable_partitioned_aggregation=false</codeph> startup options. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2375" scope="external" format="html">IMPALA-2375</xref> + </p> + + <p> + <b>Workaround:</b> Transition away from the <q>old-style</q> join and aggregation mechanism if practical.
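+ </p> + + <p> + As a hypothetical sketch (how the <cmdname>impalad</cmdname> startup options are set depends on your deployment), the transition + amounts to removing both legacy options so that the default partitioned implementations are used: + </p> + +<codeblock> +# Affected configuration: legacy join and aggregation code paths enabled. +impalad --enable_partitioned_hash_join=false --enable_partitioned_aggregation=false ...other startup options... + +# Workaround: omit both options to use the default partitioned join and aggregation. +impalad ...other startup options... +</codeblock> + + <p> + Omitting the options has the same effect as setting them to <codeph>true</codeph>, because the partitioned code paths are the default.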
+ </p> + + <p><b>Resolution:</b> Fixed in CDH 5.7.0 / Impala 2.5.0.</p> + + </conbody> + + </concept> + + </concept> + + <concept id="known_issues_correctness"> + + <title id="ki_correctness">Impala Known Issues: Correctness</title> + + <conbody> + + <p> + These issues can cause incorrect or unexpected results from queries. They typically only arise in very specific circumstances. + </p> + + </conbody> + + <concept id="IMPALA-3084" rev="IMPALA-3084"> + + <title>Incorrect assignment of NULL checking predicate through an outer join of a nested collection.</title> + + <conbody> + + <p> + A query could return wrong results (too many or too few <codeph>NULL</codeph> values) if it referenced an outer-joined nested + collection and also contained a null-checking predicate (<codeph>IS NULL</codeph>, <codeph>IS NOT NULL</codeph>, or the + <codeph><=></codeph> operator) in the <codeph>WHERE</codeph> clause. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-3084" scope="external" format="html">IMPALA-3084</xref> + </p> + + <p> + <b>Severity:</b> High + </p> + + <p><b>Resolution:</b> Fixed in CDH 5.9.0 / Impala 2.7.0.</p> + + </conbody> + + </concept> + + <concept id="IMPALA-3094" rev="IMPALA-3094"> + + <title>Incorrect result due to constant evaluation in query with outer join</title> + + <conbody> + + <p> + An <codeph>OUTER JOIN</codeph> query could omit some expected result rows due to a constant such as <codeph>FALSE</codeph> in + another join clause. 
For example: + </p> + +<codeblock><![CDATA[ +explain SELECT 1 FROM alltypestiny a1 + INNER JOIN alltypesagg a2 ON a1.smallint_col = a2.year AND false + RIGHT JOIN alltypes a3 ON a1.year = a1.bigint_col; ++---------------------------------------------------------+ +| Explain String | ++---------------------------------------------------------+ +| Estimated Per-Host Requirements: Memory=1.00KB VCores=1 | +| | +| 00:EMPTYSET | ++---------------------------------------------------------+ +]]> +</codeblock> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-3094" scope="external" format="html">IMPALA-3094</xref> + </p> + + <p> + <b>Severity:</b> High + </p> + + </conbody> + + </concept> + + <concept id="IMPALA-3126" rev="IMPALA-3126"> + + <title>Incorrect assignment of an inner join On-clause predicate through an outer join.</title> + + <conbody> + + <p> + Impala may return incorrect results for queries that have the following properties: + </p> + + <ul> + <li> + <p> + There is an INNER JOIN following a series of OUTER JOINs. + </p> + </li> + + <li> + <p> + The INNER JOIN has an On-clause with a predicate that references at least two tables that are on the nullable side of the + preceding OUTER JOINs.
+ </p> + </li> + </ul> + + <p> + The following query demonstrates the issue: + </p> + +<codeblock> +select 1 from functional.alltypes a left outer join + functional.alltypes b on a.id = b.id left outer join + functional.alltypes c on b.id = c.id right outer join + functional.alltypes d on c.id = d.id inner join functional.alltypes e +on b.int_col = c.int_col; +</codeblock> + + <p> + The following listing shows the incorrect <codeph>EXPLAIN</codeph> plan: + </p> + +<codeblock><![CDATA[ ++-----------------------------------------------------------+ +| Explain String | ++-----------------------------------------------------------+ +| Estimated Per-Host Requirements: Memory=480.04MB VCores=4 | +| | +| 14:EXCHANGE [UNPARTITIONED] | +| | | +| 08:NESTED LOOP JOIN [CROSS JOIN, BROADCAST] | +| | | +| |--13:EXCHANGE [BROADCAST] | +| | | | +| | 04:SCAN HDFS [functional.alltypes e] | +| | partitions=24/24 files=24 size=478.45KB | +| | | +| 07:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED] | +| | hash predicates: c.id = d.id | +| | runtime filters: RF000 <- d.id | +| | | +| |--12:EXCHANGE [HASH(d.id)] | +| | | | +| | 03:SCAN HDFS [functional.alltypes d] | +| | partitions=24/24 files=24 size=478.45KB | +| | | +| 06:HASH JOIN [LEFT OUTER JOIN, PARTITIONED] | +| | hash predicates: b.id = c.id | +| | other predicates: b.int_col = c.int_col <--- incorrect placement; should be at node 07 or 08 +| | runtime filters: RF001 <- c.int_col | +| | | +| |--11:EXCHANGE [HASH(c.id)] | +| | | | +| | 02:SCAN HDFS [functional.alltypes c] | +| | partitions=24/24 files=24 size=478.45KB | +| | runtime filters: RF000 -> c.id | +| | | +| 05:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED] | +| | hash predicates: b.id = a.id | +| | runtime filters: RF002 <- a.id | +| | | +| |--10:EXCHANGE [HASH(a.id)] | +| | | | +| | 00:SCAN HDFS [functional.alltypes a] | +| | partitions=24/24 files=24 size=478.45KB | +| | | +| 09:EXCHANGE [HASH(b.id)] | +| | | +| 01:SCAN HDFS [functional.alltypes b] | +| partitions=24/24 files=24 
size=478.45KB | +| runtime filters: RF001 -> b.int_col, RF002 -> b.id | ++-----------------------------------------------------------+ +]]> +</codeblock> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-3126" scope="external" format="html">IMPALA-3126</xref> + </p> + + <p> + <b>Severity:</b> High + </p> + + <p> + <b>Workaround:</b> For some queries, this problem can be worked around by placing the problematic <codeph>ON</codeph> clause predicate in the + <codeph>WHERE</codeph> clause instead, or changing the preceding <codeph>OUTER JOIN</codeph>s to <codeph>INNER JOIN</codeph>s (if + the <codeph>ON</codeph> clause predicate would discard <codeph>NULL</codeph>s). For example, to fix the problematic query above: + </p> + +<codeblock><![CDATA[ +select 1 from functional.alltypes a + left outer join functional.alltypes b + on a.id = b.id + left outer join functional.alltypes c + on b.id = c.id + right outer join functional.alltypes d + on c.id = d.id + inner join functional.alltypes e +where b.int_col = c.int_col + ++-----------------------------------------------------------+ +| Explain String | ++-----------------------------------------------------------+ +| Estimated Per-Host Requirements: Memory=480.04MB VCores=4 | +| | +| 14:EXCHANGE [UNPARTITIONED] | +| | | +| 08:NESTED LOOP JOIN [CROSS JOIN, BROADCAST] | +| | | +| |--13:EXCHANGE [BROADCAST] | +| | | | +| | 04:SCAN HDFS [functional.alltypes e] | +| | partitions=24/24 files=24 size=478.45KB | +| | | +| 07:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED] | +| | hash predicates: c.id = d.id | +| | other predicates: b.int_col = c.int_col <-- correct assignment +| | runtime filters: RF000 <- d.id | +| | | +| |--12:EXCHANGE [HASH(d.id)] | +| | | | +| | 03:SCAN HDFS [functional.alltypes d] | +| | partitions=24/24 files=24 size=478.45KB | +| | | +| 06:HASH JOIN [LEFT OUTER JOIN, PARTITIONED] | +| | hash predicates: b.id = c.id | +| | | +| |--11:EXCHANGE [HASH(c.id)] | +| | | | +| 
| 02:SCAN HDFS [functional.alltypes c] | +| | partitions=24/24 files=24 size=478.45KB | +| | runtime filters: RF000 -> c.id | +| | | +| 05:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED] | +| | hash predicates: b.id = a.id | +| | runtime filters: RF001 <- a.id | +| | | +| |--10:EXCHANGE [HASH(a.id)] | +| | | | +| | 00:SCAN HDFS [functional.alltypes a] | +| | partitions=24/24 files=24 size=478.45KB | +| | | +| 09:EXCHANGE [HASH(b.id)] | +| | | +| 01:SCAN HDFS [functional.alltypes b] | +| partitions=24/24 files=24 size=478.45KB | +| runtime filters: RF001 -> b.id | ++-----------------------------------------------------------+ +]]> +</codeblock> + + </conbody> + + </concept> + + <concept id="IMPALA-3006" rev="IMPALA-3006"> + + <title>Impala may use incorrect bit order with BIT_PACKED encoding</title> + + <conbody> + + <p> + Parquet <codeph>BIT_PACKED</codeph> encoding as implemented by Impala is LSB first. The Parquet standard says it is MSB first. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-3006" scope="external" format="html">IMPALA-3006</xref> + </p> + + <p> + <b>Severity:</b> High, but rare in practice because BIT_PACKED is infrequently used, is not written by Impala, and is deprecated + in Parquet 2.0. + </p> + + </conbody> + + </concept> + + <concept id="IMPALA-3082" rev="IMPALA-3082"> + + <title>BST between 1972 and 1995</title> + + <conbody> + + <p> + The calculation of start and end times for the BST (British Summer Time) time zone could be incorrect between 1972 and 1995. + Between 1972 and 1995, BST began and ended at 02:00 GMT on the third Sunday in March (or second Sunday when Easter fell on the + third) and fourth Sunday in October.
For example, both function calls should return 13, but actually return 12, in a query such + as: + </p> + +<codeblock> +select + extract(from_utc_timestamp(cast('1970-01-01 12:00:00' as timestamp), 'Europe/London'), "hour") summer70start, + extract(from_utc_timestamp(cast('1970-12-31 12:00:00' as timestamp), 'Europe/London'), "hour") summer70end; +</codeblock> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-3082" scope="external" format="html">IMPALA-3082</xref> + </p> + + <p> + <b>Severity:</b> High + </p> + + </conbody> + + </concept> + + <concept id="IMPALA-1170" rev="IMPALA-1170"> + + <title>parse_url() returns incorrect result if @ character in URL</title> + + <conbody> + + <p> + If a URL contains an <codeph>@</codeph> character, the <codeph>parse_url()</codeph> function could return an incorrect value for + the hostname field. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-1170" scope="external" format="html">IMPALA-1170</xref> + </p> + + <p><b>Resolution:</b> Fixed in CDH 5.7.0 / Impala 2.5.0 and CDH 5.5.4 / Impala 2.3.4.</p> + + </conbody> + + </concept> + + <concept id="IMPALA-2422" rev="IMPALA-2422"> + + <title>% escaping does not work correctly when it occurs at the end in a LIKE clause</title> + + <conbody> + + <p> + If the final character in the RHS argument of a <codeph>LIKE</codeph> operator is an escaped <codeph>\%</codeph> character, it + does not match a <codeph>%</codeph> final character of the LHS argument.
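+ </p> + + <p> + A hypothetical illustration of the pattern (the literal values are invented for this example; the backslash is doubled because + backslash is also the escape character in Impala string literals): + </p> + +<codeblock> +-- The escaped % is the final character of the pattern; due to this issue, +-- the comparison fails to match a trailing literal % on the left-hand side: +select 'ab%' like 'ab\\%'; + +-- The same escape earlier in the pattern behaves as expected: +select 'a%b' like 'a\\%b'; +</codeblock> + + <p> + Both comparisons should evaluate to <codeph>true</codeph>; because of this issue, the first one does not.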
+ </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2422" scope="external" format="html">IMPALA-2422</xref> + </p> + + </conbody> + + </concept> + + <concept id="IMPALA-397" rev="IMPALA-397"> + + <title>ORDER BY rand() does not work.</title> + + <conbody> + + <p> + Because the value for <codeph>rand()</codeph> is computed early in a query, using an <codeph>ORDER BY</codeph> expression + involving a call to <codeph>rand()</codeph> does not actually randomize the results. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-397" scope="external" format="html">IMPALA-397</xref> + </p> + + </conbody> + + </concept> + + <concept id="IMPALA-2643" rev="IMPALA-2643"> + + <title>Duplicated column in inline view causes dropping null slots during scan</title> + + <conbody> + + <p> + If the same column is queried twice within a view, <codeph>NULL</codeph> values for that column are omitted. For example, the + result of <codeph>COUNT(*)</codeph> on the view could be less than expected. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2643" scope="external" format="html">IMPALA-2643</xref> + </p> + + <p> + <b>Workaround:</b> Avoid selecting the same column twice within an inline view. + </p> + + <p><b>Resolution:</b> Fixed in CDH 5.7.0 / Impala 2.5.0, CDH 5.5.2 / Impala 2.3.2, and CDH 5.4.10 / Impala 2.2.10.</p> + + </conbody> + + </concept> + + <concept id="IMPALA-1459" rev="IMPALA-1459"> + +<!-- Not part of Alex's spreadsheet --> + + <title>Incorrect assignment of predicates through an outer join in an inline view.</title> + + <conbody> + + <p> + A query involving an <codeph>OUTER JOIN</codeph> clause where one of the table references is an inline view might apply predicates + from the <codeph>ON</codeph> clause incorrectly. 
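+ </p> + + <p> + A hypothetical sketch of the query shape involved (the table and column names are invented for illustration): + </p> + +<codeblock> +-- An inline view on one side of an outer join, with an ON clause that +-- references columns from the view; queries of this general shape could +-- have those predicates assigned to the wrong plan node: +select t1.id +from t1 left outer join (select id, x from t2) v + on t1.id = v.id and v.x > 10; +</codeblock> + + <p> + Inspecting the <codeph>EXPLAIN</codeph> plan shows where each predicate is actually applied.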
+      </p>
+
+      <p>
+        <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-1459" scope="external" format="html">IMPALA-1459</xref>
+      </p>
+
+      <p><b>Resolution:</b> Fixed in CDH 5.7.0 / Impala 2.5.0, CDH 5.5.2 / Impala 2.3.2, and CDH 5.4.9 / Impala 2.2.9.</p>
+
+    </conbody>
+
+  </concept>
+
+  <concept id="IMPALA-2603" rev="IMPALA-2603">
+
+    <title>Crash: impala::Coordinator::ValidateCollectionSlots</title>
+
+    <conbody>
+
+      <p>
+        A query could encounter a serious error if it includes multiple nested levels of <codeph>INNER JOIN</codeph> clauses involving
+        subqueries.
+      </p>
+
+      <p>
+        <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2603" scope="external" format="html">IMPALA-2603</xref>
+      </p>
+
+    </conbody>
+
+  </concept>
+
+  <concept id="IMPALA-2665" rev="IMPALA-2665">
+
+    <title>Incorrect assignment of On-clause predicate inside inline view with an outer join.</title>
+
+    <conbody>
+
+      <p>
+        A query might return incorrect results due to incorrect predicate assignment in the following scenario:
+      </p>
+
+      <ol>
+        <li>
+          There is an inline view that contains an outer join.
+        </li>
+
+        <li>
+          That inline view is joined with another table in the enclosing query block.
+        </li>
+
+        <li>
+          That join has an On-clause containing a predicate that only references columns originating from the outer-joined tables inside
+          the inline view.
+        </li>
+      </ol>
+
+      <p>
+        <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2665" scope="external" format="html">IMPALA-2665</xref>
+      </p>
+
+      <p><b>Resolution:</b> Fixed in CDH 5.7.0 / Impala 2.5.0, CDH 5.5.2 / Impala 2.3.2, and CDH 5.4.9 / Impala 2.2.9.</p>
+
+    </conbody>
+
+  </concept>
+
+  <concept id="IMPALA-2144" rev="IMPALA-2144">
+
+    <title>Wrong assignment of having clause predicate across outer join</title>
+
+    <conbody>
+
+      <p>
+        In an <codeph>OUTER JOIN</codeph> query with a <codeph>HAVING</codeph> clause, the comparison from the <codeph>HAVING</codeph>
+        clause might be applied at the wrong stage of
query processing, leading to incorrect results. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2144" scope="external" format="html">IMPALA-2144</xref> + </p> + + <p><b>Resolution:</b> Fixed in CDH 5.7.0 / Impala 2.5.0.</p> + + </conbody> + + </concept> + + <concept id="IMPALA-2093" rev="IMPALA-2093"> + + <title>Wrong plan of NOT IN aggregate subquery when a constant is used in subquery predicate</title> + + <conbody> + + <p> + A <codeph>NOT IN</codeph> operator with a subquery that calls an aggregate function, such as <codeph>NOT IN (SELECT + SUM(...))</codeph>, could return incorrect results. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2093" scope="external" format="html">IMPALA-2093</xref> + </p> + + <p><b>Resolution:</b> Fixed in CDH 5.7.0 / Impala 2.5.0 and CDH 5.5.4 / Impala 2.3.4.</p> + + </conbody> + + </concept> + + </concept> + + <concept id="known_issues_metadata"> + + <title id="ki_metadata">Impala Known Issues: Metadata</title> + + <conbody> + + <p> + These issues affect how Impala interacts with metadata. They cover areas such as the metastore database, the <codeph>COMPUTE + STATS</codeph> statement, and the Impala <cmdname>catalogd</cmdname> daemon. + </p> + + </conbody> + + <concept id="IMPALA-2648" rev="IMPALA-2648"> + + <title>Catalogd may crash when loading metadata for tables with many partitions, many columns and with incremental stats</title> + + <conbody> + + <p> + Incremental stats use up about 400 bytes per partition for each column. For example, for a table with 20K partitions and 100 + columns, the memory overhead from incremental statistics is about 800 MB. When serialized for transmission across the network, + this metadata exceeds the 2 GB Java array size limit and leads to a <codeph>catalogd</codeph> crash. 
+ </p> + + <p> + <b>Bugs:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2647" scope="external" format="html">IMPALA-2647</xref>, + <xref href="https://issues.cloudera.org/browse/IMPALA-2648" scope="external" format="html">IMPALA-2648</xref>, + <xref href="https://issues.cloudera.org/browse/IMPALA-2649" scope="external" format="html">IMPALA-2649</xref> + </p> + + <p> + <b>Workaround:</b> If feasible, compute full stats periodically and avoid computing incremental stats for that table. The + scalability of incremental stats computation is a continuing work item. + </p> + + </conbody> + + </concept> + + <concept id="IMPALA-1420" rev="IMPALA-1420 2.0.0"> + +<!-- Not part of Alex's spreadsheet --> + + <title>Can't update stats manually via alter table after upgrading to CDH 5.2</title> + + <conbody> + + <p></p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-1420" scope="external" format="html">IMPALA-1420</xref> + </p> + + <p> + <b>Workaround:</b> On CDH 5.2, when adjusting table statistics manually by setting the <codeph>numRows</codeph>, you must also + enable the Boolean property <codeph>STATS_GENERATED_VIA_STATS_TASK</codeph>. For example, use a statement like the following to + set both properties with a single <codeph>ALTER TABLE</codeph> statement: + </p> + +<codeblock>ALTER TABLE <varname>table_name</varname> SET TBLPROPERTIES('numRows'='<varname>new_value</varname>', 'STATS_GENERATED_VIA_STATS_TASK' = 'true');</codeblock> + + <p> + <b>Resolution:</b> The underlying cause is the issue + <xref href="https://issues.apache.org/jira/browse/HIVE-8648" scope="external" format="html">HIVE-8648</xref> that affects the + metastore in Hive 0.13. The workaround is only needed until the fix for this issue is incorporated into a CDH release. 
+ </p> + + </conbody> + + </concept> + + </concept> + + <concept id="known_issues_interop"> + + <title id="ki_interop">Impala Known Issues: Interoperability</title> + + <conbody> + + <p> + These issues affect the ability to interchange data between Impala and other database systems. They cover areas such as data types + and file formats. + </p> + + </conbody> + +<!-- Opened based on CDH-41605. Not part of Alex's spreadsheet AFAIK. --> + + <concept id="CDH-41605"> + + <title>DESCRIBE FORMATTED gives error on Avro table</title> + + <conbody> + + <p> + This issue can occur either on old Avro tables (created prior to Hive 1.1 / CDH 5.4) or when changing the Avro schema file by + adding or removing columns. Columns added to the schema file will not show up in the output of the <codeph>DESCRIBE + FORMATTED</codeph> command. Removing columns from the schema file will trigger a <codeph>NullPointerException</codeph>. + </p> + + <p> + As a workaround, you can use the output of <codeph>SHOW CREATE TABLE</codeph> to drop and recreate the table. This will populate + the Hive metastore database with the correct column definitions. + </p> + + <note type="warning"> + Only use this for external tables, or Impala will remove the data files. In case of an internal table, set it to external first: +<codeblock> +ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE'); +</codeblock> + (The part in parentheses is case sensitive.) Make sure to pick the right choice between internal and external when recreating the + table. See <xref href="impala_tables.xml#tables"/> for the differences between internal and external tables. + </note> + + <p audience="Cloudera"> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/CDH-41605" scope="external" format="html">CDH-41605</xref> + </p> + + <p> + <b>Severity:</b> High + </p> + + </conbody> + + </concept> + + <concept id="IMP-469"> + +<!-- Not part of Alex's spreadsheet. 
Perhaps it really is a permanent limitation and nobody is tracking it? -->
+
+    <title>Deviation from Hive behavior: Impala does not do implicit casts among string, numeric, and boolean types.</title>
+
+    <conbody>
+
+      <p audience="Cloudera">
+        <b>Cloudera Bug:</b> <xref href="https://jira.cloudera.com/browse/IMP-469" scope="external" format="html"/>; KI added 0.1
+        <i>Cloudera internal only</i>
+      </p>
+
+      <p>
+        <b>Anticipated Resolution:</b> None
+      </p>
+
+      <p>
+        <b>Workaround:</b> Use explicit casts.
+      </p>
+
+    </conbody>
+
+  </concept>
+
+  <concept id="IMP-175">
+
+<!-- Not part of Alex's spreadsheet. Perhaps it really is a permanent limitation and nobody is tracking it? -->
+
+    <title>Deviation from Hive behavior: Out-of-range float/double values are returned as the maximum allowed value of the type (Hive returns NULL)</title>
+
+    <conbody>
+
+      <p>
+        Impala behavior differs from Hive with respect to out-of-range float/double values: out-of-range values are returned as the
+        maximum allowed value of the type, whereas Hive returns <codeph>NULL</codeph>.
+      </p>
+
+      <p audience="Cloudera">
+        <b>Cloudera Bug:</b> <xref href="https://jira.cloudera.com/browse/IMP-175" scope="external" format="html">IMP-175</xref>; KI
+        added 0.1 <i>Cloudera internal only</i>
+      </p>
+
+      <p>
+        <b>Workaround:</b> None
+      </p>
+
+    </conbody>
+
+  </concept>
+
+  <concept id="CDH-13199">
+
+<!-- Not part of Alex's spreadsheet. The CDH- prefix makes it an oddball. -->
+
+    <title>Configuration needed for Flume to be compatible with Impala</title>
+
+    <conbody>
+
+      <p>
+        For compatibility with Impala, the value for the Flume HDFS Sink <codeph>hdfs.writeFormat</codeph> must be set to
+        <codeph>Text</codeph>, rather than its default value of <codeph>Writable</codeph>. The <codeph>hdfs.writeFormat</codeph> setting
+        must be changed to <codeph>Text</codeph> before creating data files with Flume; otherwise, those files cannot be read by either
+        Impala or Hive.
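+      </p>
+
+      <p>
+        A minimal sketch of the relevant sink settings (the agent and sink names are placeholders):
+      </p>
+
+<codeblock>agent1.sinks.hdfs-sink1.type = hdfs
+agent1.sinks.hdfs-sink1.hdfs.fileType = DataStream
+# Must be Text rather than the default Writable for Impala and Hive compatibility:
+agent1.sinks.hdfs-sink1.hdfs.writeFormat = Text</codeblock>
+
+      <p>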
+      </p>
+
+      <p>
+        <b>Resolution:</b> A request has been made to add this information to the upstream Flume documentation.
+      </p>
+
+    </conbody>
+
+  </concept>
+
+  <concept id="IMPALA-635" rev="IMPALA-635">
+
+<!-- Not part of Alex's spreadsheet -->
+
+    <title>Avro Scanner fails to parse some schemas</title>
+
+    <conbody>
+
+      <p>
+        Querying certain Avro tables could cause a crash or return no rows, even though Impala could <codeph>DESCRIBE</codeph> the table.
+      </p>
+
+      <p>
+        <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-635" scope="external" format="html">IMPALA-635</xref>
+      </p>
+
+      <p>
+        <b>Workaround:</b> Swap the order of the fields in the schema specification. For example, use <codeph>["null", "string"]</codeph>
+        instead of <codeph>["string", "null"]</codeph>.
+      </p>
+
+      <p>
+        <b>Resolution:</b> Not allowing this syntax agrees with the Avro specification, so it may still cause an error even when the
+        crashing issue is resolved.
+      </p>
+
+    </conbody>
+
+  </concept>
+
+  <concept id="IMPALA-1024" rev="IMPALA-1024">
+
+<!-- Not part of Alex's spreadsheet -->
+
+    <title>Impala BE cannot parse Avro schema that contains a trailing semi-colon</title>
+
+    <conbody>
+
+      <p>
+        If an Avro table has a schema definition with a trailing semicolon, Impala encounters an error when the table is queried.
+      </p>
+
+      <p>
+        <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-1024" scope="external" format="html">IMPALA-1024</xref>
+      </p>
+
+      <p>
+        <b>Workaround:</b> Remove the trailing semicolon from the Avro schema.
+      </p>
+
+    </conbody>
+
+  </concept>
+
+  <concept id="IMPALA-2154" rev="IMPALA-2154">
+
+<!-- Not part of Alex's spreadsheet -->
+
+    <title>Fix decompressor to allow parsing gzips with multiple streams</title>
+
+    <conbody>
+
+      <p>
+        Currently, Impala can only read gzipped files containing a single stream. If a gzipped file contains multiple concatenated
+        streams, the Impala query only processes the data from the first stream.
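+      </p>
+
+      <p>
+        For example, concatenating the output of two separate <cmdname>gzip</cmdname> runs produces a
+        multi-stream file (file names here are illustrative); recompressing yields a single stream that
+        Impala reads in full:
+      </p>
+
+<codeblock>gzip -c part1.txt > data.gz
+gzip -c part2.txt >> data.gz      # appends a second gzip stream
+# Workaround: recompress into a single-stream file:
+gzip -dc data.gz | gzip -c > single_stream.gz</codeblock>
+
+      <p>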
+      </p>
+
+      <p>
+        <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2154" scope="external" format="html">IMPALA-2154</xref>
+      </p>
+
+      <p>
+        <b>Workaround:</b> Use a different gzip tool to compress the file into a single stream.
+      </p>
+
+      <p><b>Resolution:</b> Fixed in CDH 5.7.0 / Impala 2.5.0.</p>
+
+    </conbody>
+
+  </concept>
+
+  <concept id="IMPALA-1578" rev="IMPALA-1578">
+
+<!-- Not part of Alex's spreadsheet -->
+
+    <title>Impala incorrectly handles text data when the newline sequence \n\r is split between different HDFS blocks</title>
+
+    <conbody>
+
+      <p>
+        If a carriage return / newline pair of characters in a text table is split between HDFS data blocks, Impala incorrectly processes
+        the row following the <codeph>\n\r</codeph> pair twice.
+      </p>
+
+      <p>
+        <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-1578" scope="external" format="html">IMPALA-1578</xref>
+      </p>
+
+      <p>
+        <b>Workaround:</b> Use the Parquet format for large volumes of data where practical.
+      </p>
+
+      <p><b>Resolution:</b> Fixed in CDH 5.8.0 / Impala 2.6.0.</p>
+
+    </conbody>
+
+  </concept>
+
+  <concept id="IMPALA-1862" rev="IMPALA-1862">
+
+<!-- Not part of Alex's spreadsheet -->
+
+    <title>Invalid bool value not reported as a scanner error</title>
+
+    <conbody>
+
+      <p>
+        In some cases, an invalid <codeph>BOOLEAN</codeph> value read from a table does not produce a warning message about the bad value.
+        The result is still <codeph>NULL</codeph> as expected. Therefore, this is not a query correctness issue, but it could lead to
+        overlooking the presence of invalid data.
+      </p>
+
+      <p>
+        <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-1862" scope="external" format="html">IMPALA-1862</xref>
+      </p>
+
+    </conbody>
+
+  </concept>
+
+  <concept id="IMPALA-1652" rev="IMPALA-1652">
+
+<!-- To do: Isn't this more a correctness issue?
--> + + <title>Incorrect results with basic predicate on CHAR typed column.</title> + + <conbody> + + <p> + When comparing a <codeph>CHAR</codeph> column value to a string literal, the literal value is not blank-padded and so the + comparison might fail when it should match. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-1652" scope="external" format="html">IMPALA-1652</xref> + </p> + + <p> + <b>Workaround:</b> Use the <codeph>RPAD()</codeph> function to blank-pad literals compared with <codeph>CHAR</codeph> columns to + the expected length. + </p> + + </conbody> + + </concept> + + </concept> + + <concept id="known_issues_limitations"> + + <title>Impala Known Issues: Limitations</title> + + <conbody> + + <p> + These issues are current limitations of Impala that require evaluation as you plan how to integrate Impala into your data management + workflow. + </p> + + </conbody> + + <concept id="IMPALA-77" rev="IMPALA-77"> + +<!-- Not part of Alex's spreadsheet. Perhaps it really is a permanent limitation and nobody is tracking it? --> + + <title>Impala does not support running on clusters with federated namespaces</title> + + <conbody> + + <p> + Impala does not support running on clusters with federated namespaces. The <codeph>impalad</codeph> process will not start on a + node running such a filesystem based on the <codeph>org.apache.hadoop.fs.viewfs.ViewFs</codeph> class. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-77" scope="external" format="html">IMPALA-77</xref> + </p> + + <p> + <b>Anticipated Resolution:</b> Limitation + </p> + + <p> + <b>Workaround:</b> Use standard HDFS on all Impala nodes. + </p> + + </conbody> + + </concept> + + </concept> + + <concept id="known_issues_misc"> + + <title>Impala Known Issues: Miscellaneous / Older Issues</title> + + <conbody> + + <p> + These issues do not fall into one of the above categories or have not been categorized yet. 
+ </p> + + </conbody> + + <concept id="IMPALA-2005" rev="IMPALA-2005"> + +<!-- Not part of Alex's spreadsheet --> + + <title>A failed CTAS does not drop the table if the insert fails.</title> + + <conbody> + + <p> + If a <codeph>CREATE TABLE AS SELECT</codeph> operation successfully creates the target table but an error occurs while querying + the source table or copying the data, the new table is left behind rather than being dropped. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2005" scope="external" format="html">IMPALA-2005</xref> + </p> + + <p> + <b>Workaround:</b> Drop the new table manually after a failed <codeph>CREATE TABLE AS SELECT</codeph>. + </p> + + </conbody> + + </concept> + + <concept id="IMPALA-1821" rev="IMPALA-1821"> + +<!-- Not part of Alex's spreadsheet --> + + <title>Casting scenarios with invalid/inconsistent results</title> + + <conbody> + + <p> + Using a <codeph>CAST()</codeph> function to convert large literal values to smaller types, or to convert special values such as + <codeph>NaN</codeph> or <codeph>Inf</codeph>, produces values not consistent with other database systems. This could lead to + unexpected results from queries. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-1821" scope="external" format="html">IMPALA-1821</xref> + </p> + +<!-- <p><b>Workaround:</b> Doublecheck that <codeph>CAST()</codeph> operations work as expect. The issue applies to expressions involving literals, not values read from table columns.</p> --> + + </conbody> + + </concept> + + <concept id="IMPALA-1619" rev="IMPALA-1619"> + +<!-- Not part of Alex's spreadsheet --> + + <title>Support individual memory allocations larger than 1 GB</title> + + <conbody> + + <p> + The largest single block of memory that Impala can allocate during a query is 1 GiB. 
Therefore, a query could fail or Impala could + crash if a compressed text file resulted in more than 1 GiB of data in uncompressed form, or if a string function such as + <codeph>group_concat()</codeph> returned a value greater than 1 GiB. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-1619" scope="external" format="html">IMPALA-1619</xref> + </p> + + <p><b>Resolution:</b> Fixed in CDH 5.9.0 / Impala 2.7.0 and CDH 5.8.3 / Impala 2.6.3.</p> + + </conbody> + + </concept> + + <concept id="IMPALA-941" rev="IMPALA-941"> + +<!-- Not part of Alex's spreadsheet. Maybe this is interop? --> + + <title>Impala Parser issue when using fully qualified table names that start with a number.</title> + + <conbody> + + <p> + A fully qualified table name starting with a number could cause a parsing error. In a name such as <codeph>db.571_market</codeph>, + the decimal point followed by digits is interpreted as a floating-point number. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-941" scope="external" format="html">IMPALA-941</xref> + </p> + + <p> + <b>Workaround:</b> Surround each part of the fully qualified name with backticks (<codeph>``</codeph>). + </p> + + </conbody> + + </concept> + + <concept id="IMPALA-532" rev="IMPALA-532"> + +<!-- Not part of Alex's spreadsheet. Perhaps it really is a permanent limitation and nobody is tracking it? --> + + <title>Impala should tolerate bad locale settings</title> + + <conbody> + + <p> + If the <codeph>LC_*</codeph> environment variables specify an unsupported locale, Impala does not start. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-532" scope="external" format="html">IMPALA-532</xref> + </p> + + <p> + <b>Workaround:</b> Add <codeph>LC_ALL="C"</codeph> to the environment settings for both the Impala daemon and the Statestore + daemon. 
See <xref href="impala_config_options.xml#config_options"/> for details about modifying these environment settings. + </p> + + <p> + <b>Resolution:</b> Fixing this issue would require an upgrade to Boost 1.47 in the Impala distribution. + </p> + + </conbody> + + </concept> + + <concept id="IMP-1203"> + +<!-- Not part of Alex's spreadsheet. Perhaps it really is a permanent limitation and nobody is tracking it? --> + + <title>Log Level 3 Not Recommended for Impala</title> + + <conbody> + + <p> + The extensive logging produced by log level 3 can cause serious performance overhead and capacity issues. + </p> + + <p> + <b>Workaround:</b> Reduce the log level to its default value of 1, that is, <codeph>GLOG_v=1</codeph>. See + <xref href="impala_logging.xml#log_levels"/> for details about the effects of setting different logging levels. + </p> + + </conbody> + + </concept> + + </concept> + +</concept>
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_kudu.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_kudu.xml b/docs/topics/impala_kudu.xml new file mode 100644 index 0000000..c530cc1 --- /dev/null +++ b/docs/topics/impala_kudu.xml @@ -0,0 +1,167 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="impala_kudu" rev="kudu"> + + <title>Using Impala to Query Kudu Tables</title> + + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Kudu"/> + <data name="Category" value="Querying"/> + <data name="Category" value="Data Analysts"/> + <data name="Category" value="Developers"/> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">Kudu</indexterm> + You can use Impala to query Kudu tables. This capability allows convenient access to a storage system that is + tuned for different kinds of workloads than the default with Impala. The default Impala tables use data files + stored on HDFS, which are ideal for bulk loads and queries using full-table scans. In contrast, Kudu can do + efficient queries for data organized either in data warehouse style (with full table scans) or for OLTP-style + workloads (with key-based lookups for single rows or small ranges of values). + </p> + + <p> + Certain Impala SQL statements, such as <codeph>UPDATE</codeph> and <codeph>DELETE</codeph>, only work with + Kudu tables. These operations were impractical from a performance perspective to perform at large scale on + HDFS data, or on HBase tables. 
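+    </p>
+
+    <p>
+      For example (the table and column names here are illustrative), typical row-level operations
+      look like:
+    </p>
+
+<codeblock>-- These statements work on Kudu tables, but not on HDFS-backed or HBase tables:
+UPDATE kudu_example SET state = 'inactive' WHERE id = 42;
+DELETE FROM kudu_example WHERE state = 'inactive';</codeblock>
+
+    <p>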
+    </p>
+
+  </conbody>
+
+  <concept id="kudu_benefits">
+
+    <title>Benefits of Using Kudu Tables with Impala</title>
+
+    <conbody>
+
+      <p>
+        The combination of Kudu and Impala works best for tables where scan performance is important, but data
+        arrives continuously, in small batches, or needs to be updated without being completely replaced. In these
+        scenarios (such as for streaming data), it might be impractical to use Parquet tables because Parquet works
+        best with multi-megabyte data files, requiring substantial overhead to replace or reorganize data files to
+        accommodate frequent additions or changes to data. Impala can query Kudu tables with scan performance close
+        to that of Parquet, and Impala can also perform update or delete operations without replacing the entire
+        table contents. You can also use the Kudu API to do ingestion or transformation operations outside of
+        Impala, and Impala can query the current data at any time.
+      </p>
+
+    </conbody>
+
+  </concept>
+
+  <concept id="kudu_primary_key">
+
+    <title>Primary Key Columns for Kudu Tables</title>
+
+    <conbody>
+
+      <p>
+        Kudu tables introduce the notion of primary keys to Impala for the first time. The primary key is made up
+        of one or more columns, whose values are combined and used as a lookup key during queries. These columns
+        cannot contain any <codeph>NULL</codeph> values or any duplicate values, and can never be updated. For a
+        partitioned Kudu table, all the partition key columns must come from the set of primary key columns.
+      </p>
+
+      <p>
+        Impala itself still does not have the notion of unique or non-<codeph>NULL</codeph> constraints. These
+        restrictions on the primary key columns are enforced on the Kudu side.
+      </p>
+
+      <p>
+        The primary key columns must be the first ones specified in the <codeph>CREATE TABLE</codeph> statement.
+        You specify which column or columns make up the primary key in the table properties, rather than through
+        attributes in the column list.
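+      </p>
+
+      <p>
+        A sketch of this style of table definition (the table name, master address, and the exact set
+        of required properties are illustrative and can vary by release):
+      </p>
+
+<codeblock>CREATE TABLE kudu_example
+(
+  id BIGINT,
+  name STRING
+)
+DISTRIBUTE BY HASH (id) INTO 4 BUCKETS
+TBLPROPERTIES(
+  'storage_handler' = 'com.cloudera.kudu.hive.KuduStorageHandler',
+  'kudu.table_name' = 'kudu_example',
+  'kudu.master_addresses' = 'kudu-master.example.com:7051',
+  'kudu.key_columns' = 'id'      -- id serves as the primary key
+);</codeblock>
+
+      <p>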
+ </p> + + <p> + Kudu can do extra optimizations for queries that refer to the primary key columns in the + <codeph>WHERE</codeph> clause. It is not crucial though to include the primary key columns in the + <codeph>WHERE</codeph> clause of every query. The benefit is mainly for partitioned tables, + which divide the data among various tablet servers based on the distribution of + data values in some or all of the primary key columns. + </p> + + </conbody> + + </concept> + + <concept id="kudu_dml"> + + <title>Impala DML Support for Kudu Tables</title> + + <conbody> + + <p> + Impala supports certain DML statements for Kudu tables only. The <codeph>UPDATE</codeph> and + <codeph>DELETE</codeph> statements let you modify data within Kudu tables without rewriting substantial + amounts of table data. + </p> + + <p> + The <codeph>INSERT</codeph> statement for Kudu tables honors the unique and non-<codeph>NULL</codeph> + requirements for the primary key columns. + </p> + + <p> + Because Impala and Kudu do not support transactions, the effects of any <codeph>INSERT</codeph>, + <codeph>UPDATE</codeph>, or <codeph>DELETE</codeph> statement are immediately visible. For example, you + cannot do a sequence of <codeph>UPDATE</codeph> statements and only make the change visible after all the + statements are finished. Also, if a DML statement fails partway through, any rows that were already + inserted, deleted, or changed remain in the table; there is no rollback mechanism to undo the changes. + </p> + + </conbody> + + </concept> + + <concept id="kudu_partitioning"> + + <title>Partitioning for Kudu Tables</title> + + <conbody> + + <p> + Kudu tables use special mechanisms to evenly distribute data among the underlying tablet servers. Although + we refer to such tables as partitioned tables, they are distinguished from traditional Impala partitioned + tables by use of different clauses on the <codeph>CREATE TABLE</codeph> statement. 
Partitioned Kudu tables + use <codeph>DISTRIBUTE BY</codeph>, <codeph>HASH</codeph>, <codeph>RANGE</codeph>, and <codeph>SPLIT + ROWS</codeph> clauses rather than the traditional <codeph>PARTITIONED BY</codeph> clause. All of the + columns involved in these clauses must be primary key columns. These clauses let you specify different ways + to divide the data for each column, or even for different value ranges within a column. This flexibility + lets you avoid problems with uneven distribution of data, where the partitioning scheme for HDFS tables + might result in some partitions being much larger than others. By setting up an effective partitioning + scheme for a Kudu table, you can ensure that the work for a query can be parallelized evenly across the + hosts in a cluster. + </p> + + </conbody> + + </concept> + + <concept id="kudu_performance"> + + <title>Impala Query Performance for Kudu Tables</title> + + <conbody> + + <p> + For queries involving Kudu tables, Impala can delegate much of the work of filtering the result set to + Kudu, avoiding some of the I/O involved in full table scans of tables containing HDFS data files. This type + of optimization is especially effective for partitioned Kudu tables, where the Impala query + <codeph>WHERE</codeph> clause refers to one or more primary key columns that are also used as partition key + columns. For example, if a partitioned Kudu table uses a <codeph>HASH</codeph> clause for + <codeph>col1</codeph> and a <codeph>RANGE</codeph> clause for <codeph>col2</codeph>, a query using a clause + such as <codeph>WHERE col1 IN (1,2,3) AND col2 > 100</codeph> can determine exactly which tablet servers + contain relevant data, and therefore parallelize the query very efficiently. 
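+      </p>
+
+      <p>
+        Continuing that hypothetical example, such a query might look like:
+      </p>
+
+<codeblock>-- col1 is a HASH partition column and col2 a RANGE partition column,
+-- so Kudu can prune the set of tablets to scan before any data is read:
+SELECT count(*) FROM kudu_example
+WHERE col1 IN (1, 2, 3) AND col2 > 100;</codeblock>
+
+      <p>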
+ </p> + + </conbody> + + </concept> + +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_langref.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_langref.xml b/docs/topics/impala_langref.xml new file mode 100644 index 0000000..f81b76f --- /dev/null +++ b/docs/topics/impala_langref.xml @@ -0,0 +1,74 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="langref"> + + <title>Impala SQL Language Reference</title> + <titlealts audience="PDF"><navtitle>SQL Reference</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Data Analysts"/> + <data name="Category" value="Developers"/> + <data name="Category" value="impala-shell"/> + </metadata> + </prolog> + + <conbody> + + <p> + Impala uses SQL as its query language. To protect user investment in skills development and query + design, Impala provides a high degree of compatibility with the Hive Query Language (HiveQL): + </p> + + <ul> + <li> + Because Impala uses the same metadata store as Hive to record information about table structure and + properties, Impala can access tables defined through the native Impala <codeph>CREATE TABLE</codeph> + command, or tables created using the Hive data definition language (DDL). + </li> + + <li> + Impala supports data manipulation (DML) statements similar to the DML component of HiveQL. + </li> + + <li> + Impala provides many <xref href="impala_functions.xml#builtins">built-in functions</xref> with the same + names and parameter types as their HiveQL equivalents. 
+      </li>
+    </ul>
+
+    <p>
+      Impala supports most of the same <xref href="impala_langref_sql.xml#langref_sql">statements and
+      clauses</xref> as HiveQL, including, but not limited to, <codeph>JOIN</codeph>, <codeph>AGGREGATE</codeph>,
+      <codeph>DISTINCT</codeph>, <codeph>UNION ALL</codeph>, <codeph>ORDER BY</codeph>, <codeph>LIMIT</codeph>, and
+      (uncorrelated) subqueries in the <codeph>FROM</codeph> clause. Impala also supports <codeph>INSERT
+      INTO</codeph> and <codeph>INSERT OVERWRITE</codeph>.
+    </p>
+
+    <p>
+      Impala supports data types with the same names and semantics as the equivalent Hive data types:
+      <codeph>STRING</codeph>, <codeph>TINYINT</codeph>, <codeph>SMALLINT</codeph>, <codeph>INT</codeph>,
+      <codeph>BIGINT</codeph>, <codeph>FLOAT</codeph>, <codeph>DOUBLE</codeph>, <codeph>BOOLEAN</codeph>, and
+      <codeph>TIMESTAMP</codeph>.
+    </p>
+
+    <p>
+      For full details about Impala SQL syntax and semantics, see
+      <xref href="impala_langref_sql.xml#langref_sql"/>.
+    </p>
+
+    <p>
+      Most HiveQL <codeph>SELECT</codeph> and <codeph>INSERT</codeph> statements run unmodified with Impala. For
+      information about Hive syntax not available in Impala, see
+      <xref href="impala_langref_unsupported.xml#langref_hiveql_delta"/>.
+    </p>
+
+    <p>
+      For a list of the built-in functions available in Impala queries, see
+      <xref href="impala_functions.xml#builtins"/>.
+ </p> + + <p outputclass="toc"/> + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_langref_sql.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_langref_sql.xml b/docs/topics/impala_langref_sql.xml new file mode 100644 index 0000000..18b6726 --- /dev/null +++ b/docs/topics/impala_langref_sql.xml @@ -0,0 +1,35 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="langref_sql"> + + <title>Impala SQL Statements</title> + <titlealts audience="PDF"><navtitle>SQL Statements</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> + </metadata> + </prolog> + + <conbody> + + <p> + The Impala SQL dialect supports a range of standard elements, plus some extensions for Big Data use cases + related to data loading and data warehousing. + </p> + + <note> + <p> + In the <cmdname>impala-shell</cmdname> interpreter, a semicolon at the end of each statement is required. + Since the semicolon is not actually part of the SQL syntax, we do not include it in the syntax definition + of each statement, but we do show it in examples intended to be run in <cmdname>impala-shell</cmdname>. 
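+      </p>
+
+      <p>
+        For example, in <cmdname>impala-shell</cmdname> (the prompt is shown for illustration):
+      </p>
+
+<codeblock>[localhost:21000] > select version();</codeblock>
+
+      <p>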
+ </p> + </note> + + <p audience="PDF" outputclass="toc all"> + The following sections show the major SQL statements that you work with in Impala: + </p> + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_langref_unsupported.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_langref_unsupported.xml b/docs/topics/impala_langref_unsupported.xml new file mode 100644 index 0000000..82910d6 --- /dev/null +++ b/docs/topics/impala_langref_unsupported.xml @@ -0,0 +1,312 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="langref_hiveql_delta"> + + <title>SQL Differences Between Impala and Hive</title> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Hive"/> + <data name="Category" value="Porting"/> + <data name="Category" value="Data Analysts"/> + <data name="Category" value="Developers"/> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">Hive</indexterm> + <indexterm audience="Cloudera">HiveQL</indexterm> + Impala's SQL syntax follows the SQL-92 standard, and includes many industry extensions in areas such as + built-in functions. See <xref href="impala_porting.xml#porting"/> for a general discussion of adapting SQL + code from a variety of database systems to Impala. + </p> + + <p> + Because Impala and Hive share the same metastore database and their tables are often used interchangeably, + the following section covers differences between Impala and Hive in detail. 
+ </p> + + <p outputclass="toc inpage"/> + </conbody> + + <concept id="langref_hiveql_unsupported"> + + <title>HiveQL Features not Available in Impala</title> + + <conbody> + + <p> + The current release of Impala does not support the following SQL features that you might be familiar with + from HiveQL: + </p> + + <!-- To do: + Yeesh, too many separate lists of unsupported Hive syntax. + Here, the FAQ, and in some of the intro topics. + Some discussion in IMP-1061 about how best to reorg. + Lots of opportunities for conrefs. + --> + + <ul> +<!-- Now supported in <keyword keyref="impala23_full"/> and higher. Find places on this page (like already done under lateral views) to note the new data type support. + <li> + Non-scalar data types such as maps, arrays, structs. + </li> +--> + + <li rev="1.2"> + Extensibility mechanisms such as <codeph>TRANSFORM</codeph>, custom file formats, or custom SerDes. + </li> + + <li rev="CDH-41376"> + The <codeph>DATE</codeph> data type. + </li> + + <li> + XML and JSON functions. + </li> + + <li> + Certain aggregate functions from HiveQL: <codeph>covar_pop</codeph>, <codeph>covar_samp</codeph>, + <codeph>corr</codeph>, <codeph>percentile</codeph>, <codeph>percentile_approx</codeph>, + <codeph>histogram_numeric</codeph>, <codeph>collect_set</codeph>; Impala supports the set of aggregate + functions listed in <xref href="impala_aggregate_functions.xml#aggregate_functions"/> and analytic + functions listed in <xref href="impala_analytic_functions.xml#analytic_functions"/>. + </li> + + <li> + Sampling. + </li> + + <li> + Lateral views. In <keyword keyref="impala23_full"/> and higher, Impala supports queries on complex types + (<codeph>STRUCT</codeph>, <codeph>ARRAY</codeph>, or <codeph>MAP</codeph>), using join notation + rather than the <codeph>EXPLODE()</codeph> keyword. + See <xref href="impala_complex_types.xml#complex_types"/> for details about Impala support for complex types. 
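+ For example, a query of the following general form (the table and column names are hypothetical)
+ flattens an <codeph>ARRAY</codeph> column through join notation rather than <codeph>EXPLODE()</codeph>:
+ <codeblock>select t.id, a.item from array_demo t, t.arr_col a;</codeblock>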
+ </li> + + <li> + Multiple <codeph>DISTINCT</codeph> clauses per query, although Impala includes some workarounds for this + limitation. + <note conref="../shared/impala_common.xml#common/multiple_count_distinct"/> + </li> + </ul> + + <p> + User-defined functions (UDFs) are supported starting in Impala 1.2. See <xref href="impala_udf.xml#udfs"/> + for full details on Impala UDFs. + <ul> + <li> + <p> + Impala supports high-performance UDFs written in C++, as well as reusing some Java-based Hive UDFs. + </p> + </li> + + <li> + <p> + Impala supports scalar UDFs and user-defined aggregate functions (UDAFs). Impala does not currently + support user-defined table generating functions (UDTFs). + </p> + </li> + + <li> + <p> + Only Impala-supported column types are supported in Java-based UDFs. + </p> + </li> + + <li> + <p conref="../shared/impala_common.xml#common/current_user_caveat"/> + </li> + </ul> + </p> + + <p> + Impala does not currently support these HiveQL statements: + </p> + + <ul> + <li> + <codeph>ANALYZE TABLE</codeph> (the Impala equivalent is <codeph>COMPUTE STATS</codeph>) + </li> + + <li> + <codeph>DESCRIBE COLUMN</codeph> + </li> + + <li> + <codeph>DESCRIBE DATABASE</codeph> + </li> + + <li> + <codeph>EXPORT TABLE</codeph> + </li> + + <li> + <codeph>IMPORT TABLE</codeph> + </li> + + <li> + <codeph>SHOW TABLE EXTENDED</codeph> + </li> + + <li> + <codeph>SHOW INDEXES</codeph> + </li> + + <li> + <codeph>SHOW COLUMNS</codeph> + </li> + + <li rev="DOCS-656"> + <codeph>INSERT OVERWRITE DIRECTORY</codeph>; use <codeph>INSERT OVERWRITE <varname>table_name</varname></codeph> + or <codeph>CREATE TABLE AS SELECT</codeph> to materialize query results into the HDFS directory associated + with an Impala table. 
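+ For example, a statement of the following general form (the table and column names are hypothetical)
+ materializes query results in the HDFS directory of a new Impala table:
+ <codeblock>create table query_results stored as parquet as select c1, c2 from t1 where c3 is not null;</codeblock>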
+ </li> + </ul> + </conbody> + </concept> + + <concept id="langref_hiveql_semantics"> + + <title>Semantic Differences Between Impala and HiveQL Features</title> + + <conbody> + + <p> + This section covers instances where Impala and Hive have similar functionality, sometimes including the + same syntax, but there are differences in the runtime semantics of those features. + </p> + + <p> + <b>Security:</b> + </p> + + <p> + Impala utilizes the <xref href="http://sentry.incubator.apache.org/" scope="external" format="html">Apache + Sentry </xref> authorization framework, which provides fine-grained role-based access control + to protect data against unauthorized access or tampering. + </p> + + <p> + The Hive component included in <ph rev="upstream">CDH 5.1</ph> and higher now includes Sentry-enabled <codeph>GRANT</codeph>, + <codeph>REVOKE</codeph>, and <codeph>CREATE/DROP ROLE</codeph> statements. Earlier Hive releases had a + privilege system with <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements that were primarily + intended to prevent accidental deletion of data, rather than a security mechanism to protect against + malicious users. + </p> + + <p> + Impala can make use of privileges set up through Hive <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements. + Impala has its own <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements in Impala 2.0 and higher. + See <xref href="impala_authorization.xml#authorization"/> for the details of authorization in Impala, including + how to switch from the original policy file-based privilege model to the Sentry service using privileges + stored in the metastore database. 
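+ For example (the database, table, and role names are hypothetical):
+ <codeblock>grant select on table db1.t1 to role analyst_role;
+ revoke select on table db1.t1 from role analyst_role;</codeblock>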
+ </p> + + <p> + <b>SQL statements and clauses:</b> + </p> + + <p> + The semantics of Impala SQL statements vary from HiveQL in some cases where the two dialects use similar SQL + statement and clause names: + </p> + + <ul> + <li> + Impala uses different syntax and names for query hints, <codeph>[SHUFFLE]</codeph> and + <codeph>[NOSHUFFLE]</codeph> rather than <codeph>MapJoin</codeph> or <codeph>StreamJoin</codeph>. See + <xref href="impala_joins.xml#joins"/> for the Impala details. + </li> + + <li> + Impala does not expose MapReduce-specific features of <codeph>SORT BY</codeph>, <codeph>DISTRIBUTE + BY</codeph>, or <codeph>CLUSTER BY</codeph>. + </li> + + <li> + Impala does not require queries to include a <codeph>FROM</codeph> clause. + </li> + </ul> + + <p> + <b>Data types:</b> + </p> + + <ul> + <li> + Impala supports a limited set of implicit casts. This can help avoid undesired results from unexpected + casting behavior. + <ul> + <li> + Impala does not implicitly cast between string and numeric or Boolean types. Always use + <codeph>CAST()</codeph> for these conversions. + </li> + + <li> + Impala does perform implicit casts among the numeric types when going from a smaller or less precise + type to a larger or more precise one. For example, Impala will implicitly convert a + <codeph>SMALLINT</codeph> to a <codeph>BIGINT</codeph> or <codeph>FLOAT</codeph>, but to convert from + <codeph>DOUBLE</codeph> to <codeph>FLOAT</codeph> or <codeph>INT</codeph> to <codeph>TINYINT</codeph> + requires a call to <codeph>CAST()</codeph> in the query. + </li> + + <li> + Impala does perform implicit casts from string to timestamp. Impala has a restricted set of literal + formats for the <codeph>TIMESTAMP</codeph> data type and the <codeph>from_unixtime()</codeph> format + string; see <xref href="impala_timestamp.xml#timestamp"/> for details.
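+ The following sketch (with hypothetical table and column names) shows which conversions happen
+ implicitly and which require <codeph>CAST()</codeph>:
+ <codeblock>-- Allowed implicitly: SMALLINT widens to BIGINT.
+ select bigint_col + smallint_col from t1;
+ -- Requires an explicit cast: DOUBLE narrows to FLOAT.
+ select cast(double_col as float) from t1;
+ -- A string literal in a recognized format is implicitly cast to TIMESTAMP.
+ select c1 from t1 where ts_col = '2016-01-01 00:00:00';</codeblock>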
+ </li> + </ul> + <p> + See <xref href="impala_datatypes.xml#datatypes"/> for full details on implicit and explicit casting for + all types, and <xref href="impala_conversion_functions.xml#conversion_functions"/> for details about + the <codeph>CAST()</codeph> function. + </p> + </li> + + <li> + Impala does not store or interpret timestamps using the local timezone, to avoid undesired results from + unexpected time zone issues. Timestamps are stored and interpreted relative to UTC. This difference can + produce different results for some calls to similarly named date/time functions between Impala and Hive. + See <xref href="impala_datetime_functions.xml#datetime_functions"/> for details about the Impala + functions. See <xref href="impala_timestamp.xml#timestamp"/> for a discussion of how Impala handles + time zones, and configuration options you can use to make Impala match the Hive behavior more closely + when dealing with Parquet-encoded <codeph>TIMESTAMP</codeph> data or when converting between + the local time zone and UTC. + </li> + + <li> + The Impala <codeph>TIMESTAMP</codeph> type can represent dates ranging from 1400-01-01 to 9999-12-31. + This is different from the Hive date range, which is 0000-01-01 to 9999-12-31. + </li> + + <li> + <p conref="../shared/impala_common.xml#common/int_overflow_behavior"/> + </li> + + </ul> + + <p> + <b>Miscellaneous features:</b> + </p> + + <ul> + <li> + Impala does not provide virtual columns. + </li> + + <li> + Impala does not expose locking. + </li> + + <li> + Impala does not expose some configuration properties. + </li> + </ul> + </conbody> + </concept> +</concept>
