http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_known_issues.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_known_issues.xml b/docs/topics/impala_known_issues.xml new file mode 100644 index 0000000..e57ec62 --- /dev/null +++ b/docs/topics/impala_known_issues.xml @@ -0,0 +1,1812 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept rev="ver" id="known_issues"> + + <title><ph audience="standalone">Known Issues and Workarounds in Impala</ph><ph audience="integrated">Apache Impala (incubating) Known Issues</ph></title> + + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Release Notes"/> + <data name="Category" value="Known Issues"/> + <data name="Category" value="Troubleshooting"/> + <data name="Category" value="Upgrading"/> + <data name="Category" value="Administrators"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> + </metadata> + </prolog> + + <conbody> + + <p> + The following sections describe known issues and workarounds in Impala, as of the current production release. This page summarizes the + most serious or frequently encountered issues in the current release, to help you make planning decisions about installing and + upgrading. Any workarounds are listed here. The bug links take you to the Impala issues site, where you can see the diagnosis and + whether a fix is in the pipeline. + </p> + + <note> + The online issue tracking system for Impala contains comprehensive information and is updated in real time. To verify whether an issue + you are experiencing has already been reported, or which release an issue is fixed in, search on the + <xref href="https://issues.cloudera.org/" scope="external" format="html">issues.cloudera.org JIRA tracker</xref>. 
+ </note> + + <p outputclass="toc inpage"/> + + <p> + For issues fixed in various Impala releases, see <xref href="impala_fixed_issues.xml#fixed_issues"/>. + </p> + +<!-- Use as a template for new issues. + <concept id=""> + <title></title> + <conbody> + <p> + </p> + <p><b>Bug:</b> <xref href="https://issues.cloudera.org/browse/" scope="external" format="html"></xref></p> + <p><b>Severity:</b> High</p> + <p><b>Resolution:</b> </p> + <p><b>Workaround:</b> </p> + </conbody> + </concept> + +--> + + </conbody> + +<!-- New known issues for CDH 5.5 / Impala 2.3. + +Title: Server-to-server SSL and Kerberos do not work together +Description: If server<->server SSL is enabled (with ssl_client_ca_certificate), and Kerberos auth is used between servers, the cluster will fail to start. +Upstream & Internal JIRAs: https://issues.cloudera.org/browse/IMPALA-2598 +Severity: Medium. Server-to-server SSL is practically unusable but this is a new feature. +Workaround: No known workaround. + +Title: Queries may hang on server-to-server exchange errors +Description: The DataStreamSender::Channel::CloseInternal() does not close the channel on an error. This will cause the node on the other side of the channel to wait indefinitely causing a hang. +Upstream & Internal JIRAs: https://issues.cloudera.org/browse/IMPALA-2592 +Severity: Low. This does not occur frequently. +Workaround: No known workaround. + +Title: Catalogd may crash when loading metadata for tables with many partitions, many columns and with incremental stats +Description: Incremental stats use up about 400 bytes per partition X column. So for a table with 20K partitions and 100 columns this is about 800 MB. When serialized this goes past the 2 GB Java array size limit and leads to a catalog crash. +Upstream & Internal JIRAs: https://issues.cloudera.org/browse/IMPALA-2648, IMPALA-2647, IMPALA-2649. +Severity: Low. This does not occur frequently. +Workaround: Reduce the number of partitions. 
+ +More from: https://issues.cloudera.org/browse/IMPALA-2093?filter=11278&jql=project%20%3D%20IMPALA%20AND%20priority%20in%20(blocker%2C%20critical)%20AND%20status%20in%20(open%2C%20Reopened)%20AND%20labels%20%3D%20correctness%20ORDER%20BY%20priority%20DESC + +IMPALA-2093 +Wrong plan of NOT IN aggregate subquery when a constant is used in subquery predicate +IMPALA-1652 +Incorrect results with basic predicate on CHAR typed column. +IMPALA-1459 +Incorrect assignment of predicates through an outer join in an inline view. +IMPALA-2665 +Incorrect assignment of On-clause predicate inside inline view with an outer join. +IMPALA-2603 +Crash: impala::Coordinator::ValidateCollectionSlots +IMPALA-2375 +Fix issues with the legacy join and agg nodes using enable_partitioned_hash_join=false and enable_partitioned_aggregation=false +IMPALA-1862 +Invalid bool value not reported as a scanner error +IMPALA-1792 +ImpalaODBC: Can not get the value in the SQLGetData(m-x th column) after the SQLBindCol(m th column) +IMPALA-1578 +Impala incorrectly handles text data when the new line character \n\r is split between different HDFS block +IMPALA-2643 +Duplicated column in inline view causes dropping null slots during scan +IMPALA-2005 +A failed CTAS does not drop the table if the insert fails. +IMPALA-1821 +Casting scenarios with invalid/inconsistent results + +Another list from Alex, of correctness problems with predicates; might overlap with ones I already have: + +https://issues.cloudera.org/browse/IMPALA-2665 - Already have +https://issues.cloudera.org/browse/IMPALA-2643 - Already have +https://issues.cloudera.org/browse/IMPALA-1459 - Already have +https://issues.cloudera.org/browse/IMPALA-2144 - Don't have + +--> + + <concept id="known_issues_crash"> + + <title>Impala Known Issues: Crashes and Hangs</title> + + <conbody> + + <p> + These issues can cause Impala to quit or become unresponsive. 
+ </p> + + </conbody> + + <concept id="IMPALA-3069" rev="IMPALA-3069"> + + <title>Setting BATCH_SIZE query option too large can cause a crash</title> + + <conbody> + + <p> + Using a value in the millions for the <codeph>BATCH_SIZE</codeph> query option, together with wide rows or large string values in + columns, could cause a memory allocation of more than 2 GB, resulting in a crash. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-3069" scope="external" format="html">IMPALA-3069</xref> + </p> + + <p> + <b>Severity:</b> High + </p> + + <p><b>Resolution:</b> Fixed in CDH 5.9.0 / Impala 2.7.0.</p> + + </conbody> + + </concept> + + <concept id="IMPALA-3441" rev="IMPALA-3441"> + + <title>Queries against malformed Avro data can cause a crash</title> + + <conbody> + + <p> + Malformed Avro data, such as out-of-bounds integers or values in the wrong format, could cause a crash when queried. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-3441" scope="external" format="html">IMPALA-3441</xref> + </p> + + <p> + <b>Severity:</b> High + </p> + + <p><b>Resolution:</b> Fixed in CDH 5.9.0 / Impala 2.7.0 and CDH 5.8.2 / Impala 2.6.2.</p> + + </conbody> + + </concept> + + <concept id="IMPALA-2592" rev="IMPALA-2592"> + + <title>Queries may hang on server-to-server exchange errors</title> + + <conbody> + + <p> + The <codeph>DataStreamSender::Channel::CloseInternal()</codeph> function does not close the channel on an error. This causes the node on + the other side of the channel to wait indefinitely, causing a hang. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2592" scope="external" format="html">IMPALA-2592</xref> + </p> + + <p> + <b>Resolution:</b> Fixed in CDH 5.7.0 / Impala 2.5.0.
+ </p> + + </conbody> + + </concept> + + <concept id="IMPALA-2365" rev="IMPALA-2365"> + + <title>Impalad is crashing if udf jar is not available in hdfs location for first time</title> + + <conbody> + + <p> + If the JAR file corresponding to a Java UDF is removed from HDFS after the Impala <codeph>CREATE FUNCTION</codeph> statement is + issued, the <cmdname>impalad</cmdname> daemon crashes. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2365" scope="external" format="html">IMPALA-2365</xref> + </p> + + <p><b>Resolution:</b> Fixed in CDH 5.7.0 / Impala 2.5.0.</p> + + </conbody> + + </concept> + + </concept> + + <concept id="known_issues_performance"> + + <title id="ki_performance">Impala Known Issues: Performance</title> + + <conbody> + + <p> + These issues involve the performance of operations such as queries or DDL statements. + </p> + + </conbody> + + <concept id="IMPALA-1480" rev="IMPALA-1480"> + +<!-- Not part of Alex's spreadsheet. Spreadsheet has IMPALA-1423 which mentions it's similar to this one but not a duplicate. --> + + <title>Slow DDL statements for tables with large number of partitions</title> + + <conbody> + + <p> + DDL statements for tables with a large number of partitions might be slow. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-1480" scope="external" format="html">IMPALA-1480</xref> + </p> + + <p> + <b>Workaround:</b> Run the DDL statement in Hive if the slowness is an issue. + </p> + + <p><b>Resolution:</b> Fixed in CDH 5.7.0 / Impala 2.5.0.</p> + + </conbody> + + </concept> + + </concept> + + <concept id="known_issues_usability"> + + <title id="ki_usability">Impala Known Issues: Usability</title> + + <conbody> + + <p> + These issues affect the convenience of interacting directly with Impala, typically through the Impala shell or Hue.
+ </p> + + </conbody> + + <concept id="IMPALA-3133" rev="IMPALA-3133"> + + <title>Unexpected privileges in show output</title> + + <conbody> + + <p> + Due to a timing condition in updating cached policy data from Sentry, the <codeph>SHOW</codeph> statements for Sentry roles could + sometimes display out-of-date role settings. Because Impala rechecks authorization for each SQL statement, this discrepancy does + not represent a security issue for other statements. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-3133" scope="external" format="html">IMPALA-3133</xref> + </p> + + <p> + <b>Severity:</b> High + </p> + + <p><b>Resolution:</b> Fixed in CDH 5.8.0 / Impala 2.6.0 and CDH 5.7.1 / Impala 2.5.1.</p> + + </conbody> + + </concept> + + <concept id="IMPALA-1776" rev="IMPALA-1776"> + + <title>Less than 100% progress on completed simple SELECT queries</title> + + <conbody> + + <p> + Simple <codeph>SELECT</codeph> queries show less than 100% progress even though they are already completed.
+ </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-1776" scope="external" format="html">IMPALA-1776</xref> + </p> + + </conbody> + + </concept> + + <concept id="concept_lmx_dk5_lx"> + + <title>Unexpected column overflow behavior with INT datatypes</title> + + <conbody> + + <p conref="../shared/impala_common.xml#common/int_overflow_behavior" /> + + <p> + <b>Bug:</b> + <xref href="https://issues.cloudera.org/browse/IMPALA-3123" + scope="external" format="html">IMPALA-3123</xref> + </p> + + </conbody> + + </concept> + + </concept> + + <concept id="known_issues_drivers"> + + <title id="ki_drivers">Impala Known Issues: JDBC and ODBC Drivers</title> + + <conbody> + + <p> + These issues affect applications that use the JDBC or ODBC APIs, such as business intelligence tools or custom-written applications + in languages such as Java or C++. + </p> + + </conbody> + + <concept id="IMPALA-1792" rev="IMPALA-1792"> + +<!-- Not part of Alex's spreadsheet --> + + <title>ImpalaODBC: Can not get the value in the SQLGetData(m-x th column) after the SQLBindCol(m th column)</title> + + <conbody> + + <p> + If the ODBC <codeph>SQLGetData</codeph> is called on a series of columns, the function calls must follow the same order as the + columns. For example, if data is fetched from column 2 then column 1, the <codeph>SQLGetData</codeph> call for column 1 returns + <codeph>NULL</codeph>. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-1792" scope="external" format="html">IMPALA-1792</xref> + </p> + + <p> + <b>Workaround:</b> Fetch columns in the same order they are defined in the table. + </p> + + </conbody> + + </concept> + + </concept> + + <concept id="known_issues_security"> + + <title id="ki_security">Impala Known Issues: Security</title> + + <conbody> + + <p> + These issues relate to security features, such as Kerberos authentication, Sentry authorization, encryption, auditing, and + redaction. 
+ </p> + + </conbody> + +<!-- To do: Hiding for the moment. https://jira.cloudera.com/browse/CDH-38736 reports the issue is fixed. --> + + <concept id="impala-shell_ssl_dependency" audience="Cloudera" rev="impala-shell_ssl_dependency"> + + <title>impala-shell requires Python with ssl module</title> + + <conbody> + + <p> + On CentOS 5.10 and Oracle Linux 5.11 using the built-in Python 2.4, invoking the <cmdname>impala-shell</cmdname> with the + <codeph>--ssl</codeph> option might fail with the following error: + </p> + +<codeblock> +Unable to import the python 'ssl' module. It is required for an SSL-secured connection. +</codeblock> + +<!-- No associated IMPALA-* JIRA... It is the internal JIRA CDH-38736. --> + + <p> + <b>Severity:</b> Low, workaround available + </p> + + <p> + <b>Resolution:</b> Customers are less likely to experience this issue over time, because the <codeph>ssl</codeph> module is included + in newer Python releases packaged with recent Linux releases. + </p> + + <p> + <b>Workaround:</b> To use SSL with <cmdname>impala-shell</cmdname> on these platform versions, install the <codeph>ssl</codeph> + Python module: + </p> + +<codeblock> +yum install python-ssl +</codeblock> + + <p> + Then <cmdname>impala-shell</cmdname> can run when using SSL. For example: + </p> + +<codeblock> +impala-shell -s impala --ssl --ca_cert /path_to_truststore/truststore.pem +</codeblock> + + </conbody> + + </concept> + + <concept id="renewable_kerberos_tickets"> + +<!-- Not part of Alex's spreadsheet. Not associated with a JIRA number AFAIK. --> + + <title>Kerberos tickets must be renewable</title> + + <conbody> + + <p> + In a Kerberos environment, the <cmdname>impalad</cmdname> daemon might not start if Kerberos tickets are not renewable. + </p> + + <p> + <b>Workaround:</b> Configure your KDC to allow tickets to be renewed, and configure <filepath>krb5.conf</filepath> to request + renewable tickets. + </p> + + </conbody> + + </concept> + +<!-- To do: Fixed in 2.5.0, 2.3.2.
Commenting out until I see how it can fix into "known issues now fixed" convention. + That set of fix releases looks incomplete so probably have to do some detective work with the JIRA. + https://issues.cloudera.org/browse/IMPALA-2598 + <concept id="IMPALA-2598" rev="IMPALA-2598"> + + <title>Server-to-server SSL and Kerberos do not work together</title> + + <conbody> + + <p> + If SSL is enabled between internal Impala components (with <codeph>ssl_client_ca_certificate</codeph>), and Kerberos + authentication is used between servers, the cluster fails to start. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2598" scope="external" format="html">IMPALA-2598</xref> + </p> + + <p> + <b>Workaround:</b> Do not use the new <codeph>ssl_client_ca_certificate</codeph> setting on Kerberos-enabled clusters until this + issue is resolved. + </p> + + <p><b>Resolution:</b> Fixed in CDH 5.7.0 / Impala 2.5.0 and CDH 5.5.2 / Impala 2.3.2.</p> + + </conbody> + + </concept> +--> + + </concept> + +<!-- + <concept id="known_issues_supportability"> + + <title id="ki_supportability">Impala Known Issues: Supportability</title> + + <conbody> + + <p> + These issues affect the ability to debug and troubleshoot Impala, such as incorrect output in query profiles or the query state + shown in monitoring applications. + </p> + + </conbody> + + </concept> +--> + + <concept id="known_issues_resources"> + + <title id="ki_resources">Impala Known Issues: Resources</title> + + <conbody> + + <p> + These issues involve memory or disk usage, including out-of-memory conditions, the spill-to-disk feature, and resource management + features. 
+ </p> + + </conbody> + + <concept id="TSB-168"> + + <title>Impala catalogd heap issues when upgrading to 5.7</title> + + <conbody> + + <p> + The default heap size for Impala <cmdname>catalogd</cmdname> has changed in <keyword keyref="impala25_full"/> and higher: + </p> + + <ul> + <li> + <p> + Before CDH 5.7, <cmdname>catalogd</cmdname> by default used the JVM's default heap size, which is the smaller of 1/4th of the + physical memory or 32 GB. + </p> + </li> + + <li> + <p> + Starting with CDH 5.7.0, the default <cmdname>catalogd</cmdname> heap size is 4 GB. + </p> + </li> + </ul> + + <p> + For example, on a host with 128 GB of physical memory, this change decreases the default <cmdname>catalogd</cmdname> heap from + 32 GB to 4 GB, which can result in out-of-memory errors in <cmdname>catalogd</cmdname> and lead to query failures. + </p> + + <p audience="Cloudera"> + <b>Bug:</b> <xref href="https://jira.cloudera.com/browse/TSB-168" scope="external" format="html">TSB-168</xref> + </p> + + <p> + <b>Severity:</b> High + </p> + + <p> + <b>Workaround:</b> Increase the <cmdname>catalogd</cmdname> memory limit as follows. +<!-- See <xref href="impala_scalability.xml#scalability_catalog"/> for the procedure. --> +<!-- Including full details here via conref, for benefit of PDF readers or anyone else + who might have trouble seeing or following the link. --> + </p> + + <p conref="../shared/impala_common.xml#common/increase_catalogd_heap_size"/> + + </conbody> + + </concept> + + <concept id="IMPALA-3509" rev="IMPALA-3509"> + + <title>Breakpad minidumps can be very large when the thread count is high</title> + + <conbody> + + <p> + The size of the breakpad minidump files grows linearly with the number of threads. By default, each thread adds 8 KB to the + minidump size. Minidump files could consume significant disk space when the daemons have a high number of threads.
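+ </p> + + <p> + As an illustrative sketch only (the flag value and command form below are hypothetical examples, not recommendations), the + <codeph>--minidump_size_limit_hint_kb</codeph> startup flag described in the workaround for this issue could be added to the + <cmdname>impalad</cmdname> startup options: + </p> + +<codeblock> +# Illustrative only: hint that each minidump should stay under roughly 20 MB. +impalad --minidump_size_limit_hint_kb=20480 ...other startup options... +</codeblock> + + <p> + Because the flag is only a hint, individual minidump files can still exceed the hinted size.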
+ </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-3509" scope="external" format="html">IMPALA-3509</xref> + </p> + + <p> + <b>Severity:</b> High + </p> + + <p> + <b>Workaround:</b> Add <codeph>--minidump_size_limit_hint_kb=<varname>size</varname></codeph> to set a soft upper limit on the + size of each minidump file. If the minidump file would exceed that limit, Impala reduces the amount of information for each thread + from 8 KB to 2 KB. (Full thread information is captured for the first 20 threads, then 2 KB per thread after that.) The minidump + file can still grow larger than the <q>hinted</q> size. For example, if you have 10,000 threads, the minidump file can be more + than 20 MB. + </p> + + </conbody> + + </concept> + + <concept id="IMPALA-3662" rev="IMPALA-3662"> + + <title>Parquet scanner memory increase after IMPALA-2736</title> + + <conbody> + + <p> + The initial release of <keyword keyref="impala26_full"/> sometimes has a higher peak memory usage than in previous releases while reading + Parquet files. + </p> + + <p> + <keyword keyref="impala26_full"/> addresses the issue IMPALA-2736, which improves the efficiency of Parquet scans by up to 2x. The faster scans + may result in a higher peak memory consumption compared to earlier versions of Impala due to the new column-wise row + materialization strategy. You are likely to experience higher memory consumption in any of the following scenarios: + <ul> + <li> + <p> + Very wide rows due to projecting many columns in a scan. + </p> + </li> + + <li> + <p> + Very large rows due to big column values, for example, long strings or nested collections with many items. + </p> + </li> + + <li> + <p> + Producer/consumer speed imbalances, leading to more rows being buffered between a scan (producer) and downstream (consumer) + plan nodes. 
+ </p> + </li> + </ul> + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-3662" scope="external" format="html">IMPALA-3662</xref> + </p> + + <p> + <b>Severity:</b> High + </p> + + <p> + <b>Workaround:</b> The following query options might help to reduce memory consumption in the Parquet scanner: + <ul> + <li> + Reduce the number of scanner threads, for example: <codeph>set num_scanner_threads=30</codeph> + </li> + + <li> + Reduce the batch size, for example: <codeph>set batch_size=512</codeph> + </li> + + <li> + Increase the memory limit, for example: <codeph>set mem_limit=64g</codeph> + </li> + </ul> + </p> + + </conbody> + + </concept> + + <concept id="IMPALA-691" rev="IMPALA-691"> + + <title>Process mem limit does not account for the JVM's memory usage</title> + +<!-- Supposed to be resolved for Impala 2.3.0. --> + + <conbody> + + <p> + Some memory allocated by the JVM used internally by Impala is not counted against the memory limit for the + <cmdname>impalad</cmdname> daemon. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-691" scope="external" format="html">IMPALA-691</xref> + </p> + + <p> + <b>Workaround:</b> To monitor overall memory usage, use the <cmdname>top</cmdname> command, or add the memory figures in the + Impala web UI <uicontrol>/memz</uicontrol> tab to JVM memory usage shown on the <uicontrol>/metrics</uicontrol> tab. + </p> + + </conbody> + + </concept> + + <concept id="IMPALA-2375" rev="IMPALA-2375"> + +<!-- Not part of Alex's spreadsheet --> + + <title>Fix issues with the legacy join and agg nodes using --enable_partitioned_hash_join=false and --enable_partitioned_aggregation=false</title> + + <conbody> + + <p> + Various issues could occur in queries run with the legacy join and aggregation nodes, which are enabled through the + <codeph>--enable_partitioned_hash_join=false</codeph> and <codeph>--enable_partitioned_aggregation=false</codeph> startup options. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2375" scope="external" format="html">IMPALA-2375</xref> + </p> + + <p> + <b>Workaround:</b> Transition away from the <q>old-style</q> join and aggregation mechanism if practical.
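+ </p> + + <p> + As a hypothetical sketch (how the <cmdname>impalad</cmdname> startup options are set depends on your deployment), the transition + amounts to removing both legacy options so that the default partitioned implementations are used: + </p> + +<codeblock> +# Affected configuration: legacy join and aggregation code paths enabled. +impalad --enable_partitioned_hash_join=false --enable_partitioned_aggregation=false ...other startup options... + +# Workaround: omit both options to use the default partitioned join and aggregation. +impalad ...other startup options... +</codeblock> + + <p> + Omitting the options has the same effect as setting them to <codeph>true</codeph>, because the partitioned code paths are the default.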
+ </p> + + <p><b>Resolution:</b> Fixed in CDH 5.7.0 / Impala 2.5.0.</p> + + </conbody> + + </concept> + + </concept> + + <concept id="known_issues_correctness"> + + <title id="ki_correctness">Impala Known Issues: Correctness</title> + + <conbody> + + <p> + These issues can cause incorrect or unexpected results from queries. They typically only arise in very specific circumstances. + </p> + + </conbody> + + <concept id="IMPALA-3084" rev="IMPALA-3084"> + + <title>Incorrect assignment of NULL checking predicate through an outer join of a nested collection.</title> + + <conbody> + + <p> + A query could return wrong results (too many or too few <codeph>NULL</codeph> values) if it referenced an outer-joined nested + collection and also contained a null-checking predicate (<codeph>IS NULL</codeph>, <codeph>IS NOT NULL</codeph>, or the + <codeph><=></codeph> operator) in the <codeph>WHERE</codeph> clause. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-3084" scope="external" format="html">IMPALA-3084</xref> + </p> + + <p> + <b>Severity:</b> High + </p> + + <p><b>Resolution:</b> Fixed in CDH 5.9.0 / Impala 2.7.0.</p> + + </conbody> + + </concept> + + <concept id="IMPALA-3094" rev="IMPALA-3094"> + + <title>Incorrect result due to constant evaluation in query with outer join</title> + + <conbody> + + <p> + An <codeph>OUTER JOIN</codeph> query could omit some expected result rows due to a constant such as <codeph>FALSE</codeph> in + another join clause. 
For example: + </p> + +<codeblock><![CDATA[ +explain SELECT 1 FROM alltypestiny a1 + INNER JOIN alltypesagg a2 ON a1.smallint_col = a2.year AND false + RIGHT JOIN alltypes a3 ON a1.year = a1.bigint_col; ++---------------------------------------------------------+ +| Explain String | ++---------------------------------------------------------+ +| Estimated Per-Host Requirements: Memory=1.00KB VCores=1 | +| | +| 00:EMPTYSET | ++---------------------------------------------------------+ +]]> +</codeblock> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-3094" scope="external" format="html">IMPALA-3094</xref> + </p> + + <p> + <b>Severity:</b> High + </p> + + </conbody> + + </concept> + + <concept id="IMPALA-3126" rev="IMPALA-3126"> + + <title>Incorrect assignment of an inner join On-clause predicate through an outer join.</title> + + <conbody> + + <p> + Impala may return incorrect results for queries that have the following properties: + </p> + + <ul> + <li> + <p> + There is an INNER JOIN following a series of OUTER JOINs. + </p> + </li> + + <li> + <p> + The INNER JOIN has an On-clause with a predicate that references at least two tables that are on the nullable side of the + preceding OUTER JOINs.
+ </p> + </li> + </ul> + + <p> + The following query demonstrates the issue: + </p> + +<codeblock> +select 1 from functional.alltypes a left outer join + functional.alltypes b on a.id = b.id left outer join + functional.alltypes c on b.id = c.id right outer join + functional.alltypes d on c.id = d.id inner join functional.alltypes e +on b.int_col = c.int_col; +</codeblock> + + <p> + The following listing shows the incorrect <codeph>EXPLAIN</codeph> plan: + </p> + +<codeblock><![CDATA[ ++-----------------------------------------------------------+ +| Explain String | ++-----------------------------------------------------------+ +| Estimated Per-Host Requirements: Memory=480.04MB VCores=4 | +| | +| 14:EXCHANGE [UNPARTITIONED] | +| | | +| 08:NESTED LOOP JOIN [CROSS JOIN, BROADCAST] | +| | | +| |--13:EXCHANGE [BROADCAST] | +| | | | +| | 04:SCAN HDFS [functional.alltypes e] | +| | partitions=24/24 files=24 size=478.45KB | +| | | +| 07:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED] | +| | hash predicates: c.id = d.id | +| | runtime filters: RF000 <- d.id | +| | | +| |--12:EXCHANGE [HASH(d.id)] | +| | | | +| | 03:SCAN HDFS [functional.alltypes d] | +| | partitions=24/24 files=24 size=478.45KB | +| | | +| 06:HASH JOIN [LEFT OUTER JOIN, PARTITIONED] | +| | hash predicates: b.id = c.id | +| | other predicates: b.int_col = c.int_col <--- incorrect placement; should be at node 07 or 08 +| | runtime filters: RF001 <- c.int_col | +| | | +| |--11:EXCHANGE [HASH(c.id)] | +| | | | +| | 02:SCAN HDFS [functional.alltypes c] | +| | partitions=24/24 files=24 size=478.45KB | +| | runtime filters: RF000 -> c.id | +| | | +| 05:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED] | +| | hash predicates: b.id = a.id | +| | runtime filters: RF002 <- a.id | +| | | +| |--10:EXCHANGE [HASH(a.id)] | +| | | | +| | 00:SCAN HDFS [functional.alltypes a] | +| | partitions=24/24 files=24 size=478.45KB | +| | | +| 09:EXCHANGE [HASH(b.id)] | +| | | +| 01:SCAN HDFS [functional.alltypes b] | +| partitions=24/24 files=24 
size=478.45KB | +| runtime filters: RF001 -> b.int_col, RF002 -> b.id | ++-----------------------------------------------------------+ +]]> +</codeblock> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-3126" scope="external" format="html">IMPALA-3126</xref> + </p> + + <p> + <b>Severity:</b> High + </p> + + <p> + <b>Workaround:</b> For some queries, this problem can be worked around by placing the problematic <codeph>ON</codeph> clause predicate in the + <codeph>WHERE</codeph> clause instead, or changing the preceding <codeph>OUTER JOIN</codeph>s to <codeph>INNER JOIN</codeph>s (if + the <codeph>ON</codeph> clause predicate would discard <codeph>NULL</codeph>s). For example, to fix the problematic query above: + </p> + +<codeblock><![CDATA[ +select 1 from functional.alltypes a + left outer join functional.alltypes b + on a.id = b.id + left outer join functional.alltypes c + on b.id = c.id + right outer join functional.alltypes d + on c.id = d.id + inner join functional.alltypes e +where b.int_col = c.int_col + ++-----------------------------------------------------------+ +| Explain String | ++-----------------------------------------------------------+ +| Estimated Per-Host Requirements: Memory=480.04MB VCores=4 | +| | +| 14:EXCHANGE [UNPARTITIONED] | +| | | +| 08:NESTED LOOP JOIN [CROSS JOIN, BROADCAST] | +| | | +| |--13:EXCHANGE [BROADCAST] | +| | | | +| | 04:SCAN HDFS [functional.alltypes e] | +| | partitions=24/24 files=24 size=478.45KB | +| | | +| 07:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED] | +| | hash predicates: c.id = d.id | +| | other predicates: b.int_col = c.int_col <-- correct assignment +| | runtime filters: RF000 <- d.id | +| | | +| |--12:EXCHANGE [HASH(d.id)] | +| | | | +| | 03:SCAN HDFS [functional.alltypes d] | +| | partitions=24/24 files=24 size=478.45KB | +| | | +| 06:HASH JOIN [LEFT OUTER JOIN, PARTITIONED] | +| | hash predicates: b.id = c.id | +| | | +| |--11:EXCHANGE [HASH(c.id)] | +| | | | +| 
| 02:SCAN HDFS [functional.alltypes c] | +| | partitions=24/24 files=24 size=478.45KB | +| | runtime filters: RF000 -> c.id | +| | | +| 05:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED] | +| | hash predicates: b.id = a.id | +| | runtime filters: RF001 <- a.id | +| | | +| |--10:EXCHANGE [HASH(a.id)] | +| | | | +| | 00:SCAN HDFS [functional.alltypes a] | +| | partitions=24/24 files=24 size=478.45KB | +| | | +| 09:EXCHANGE [HASH(b.id)] | +| | | +| 01:SCAN HDFS [functional.alltypes b] | +| partitions=24/24 files=24 size=478.45KB | +| runtime filters: RF001 -> b.id | ++-----------------------------------------------------------+ +]]> +</codeblock> + + </conbody> + + </concept> + + <concept id="IMPALA-3006" rev="IMPALA-3006"> + + <title>Impala may use incorrect bit order with BIT_PACKED encoding</title> + + <conbody> + + <p> + Parquet <codeph>BIT_PACKED</codeph> encoding as implemented by Impala is LSB first. The Parquet standard says it is MSB first. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-3006" scope="external" format="html">IMPALA-3006</xref> + </p> + + <p> + <b>Severity:</b> High, but rare in practice because BIT_PACKED is infrequently used, is not written by Impala, and is deprecated + in Parquet 2.0. + </p> + + </conbody> + + </concept> + + <concept id="IMPALA-3082" rev="IMPALA-3082"> + + <title>BST between 1972 and 1995</title> + + <conbody> + + <p> + The calculation of start and end times for the BST (British Summer Time) time zone could be incorrect between 1972 and 1995. + Between 1972 and 1995, BST began and ended at 02:00 GMT on the third Sunday in March (or second Sunday when Easter fell on the + third) and fourth Sunday in October.
For example, both function calls should return 13, but actually return 12, in a query such + as: + </p> + +<codeblock> +select + extract(from_utc_timestamp(cast('1970-01-01 12:00:00' as timestamp), 'Europe/London'), "hour") summer70start, + extract(from_utc_timestamp(cast('1970-12-31 12:00:00' as timestamp), 'Europe/London'), "hour") summer70end; +</codeblock> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-3082" scope="external" format="html">IMPALA-3082</xref> + </p> + + <p> + <b>Severity:</b> High + </p> + + </conbody> + + </concept> + + <concept id="IMPALA-1170" rev="IMPALA-1170"> + + <title>parse_url() returns incorrect result if @ character in URL</title> + + <conbody> + + <p> + If a URL contains an <codeph>@</codeph> character, the <codeph>parse_url()</codeph> function could return an incorrect value for + the hostname field. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-1170" scope="external" format="html">IMPALA-1170</xref> + </p> + + <p><b>Resolution:</b> Fixed in CDH 5.7.0 / Impala 2.5.0 and CDH 5.5.4 / Impala 2.3.4.</p> + + </conbody> + + </concept> + + <concept id="IMPALA-2422" rev="IMPALA-2422"> + + <title>% escaping does not work correctly when it occurs at the end in a LIKE clause</title> + + <conbody> + + <p> + If the final character in the RHS argument of a <codeph>LIKE</codeph> operator is an escaped <codeph>\%</codeph> character, it + does not match a <codeph>%</codeph> final character of the LHS argument.
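+ </p> + + <p> + A hypothetical illustration of the pattern (the literal values are invented for this example; the backslash is doubled because + backslash is also the escape character in Impala string literals): + </p> + +<codeblock> +-- The escaped % is the final character of the pattern; due to this issue, +-- the comparison fails to match a trailing literal % on the left-hand side: +select 'ab%' like 'ab\\%'; + +-- The same escape earlier in the pattern behaves as expected: +select 'a%b' like 'a\\%b'; +</codeblock> + + <p> + Both comparisons should evaluate to <codeph>true</codeph>; because of this issue, the first one does not.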
+ </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2422" scope="external" format="html">IMPALA-2422</xref> + </p> + + </conbody> + + </concept> + + <concept id="IMPALA-397" rev="IMPALA-397"> + + <title>ORDER BY rand() does not work.</title> + + <conbody> + + <p> + Because the value for <codeph>rand()</codeph> is computed early in a query, using an <codeph>ORDER BY</codeph> expression + involving a call to <codeph>rand()</codeph> does not actually randomize the results. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-397" scope="external" format="html">IMPALA-397</xref> + </p> + + </conbody> + + </concept> + + <concept id="IMPALA-2643" rev="IMPALA-2643"> + + <title>Duplicated column in inline view causes dropping null slots during scan</title> + + <conbody> + + <p> + If the same column is queried twice within a view, <codeph>NULL</codeph> values for that column are omitted. For example, the + result of <codeph>COUNT(*)</codeph> on the view could be less than expected. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2643" scope="external" format="html">IMPALA-2643</xref> + </p> + + <p> + <b>Workaround:</b> Avoid selecting the same column twice within an inline view. + </p> + + <p><b>Resolution:</b> Fixed in CDH 5.7.0 / Impala 2.5.0, CDH 5.5.2 / Impala 2.3.2, and CDH 5.4.10 / Impala 2.2.10.</p> + + </conbody> + + </concept> + + <concept id="IMPALA-1459" rev="IMPALA-1459"> + +<!-- Not part of Alex's spreadsheet --> + + <title>Incorrect assignment of predicates through an outer join in an inline view.</title> + + <conbody> + + <p> + A query involving an <codeph>OUTER JOIN</codeph> clause where one of the table references is an inline view might apply predicates + from the <codeph>ON</codeph> clause incorrectly. 
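+ </p> + + <p> + A hypothetical sketch of the query shape involved (the table and column names are invented for illustration): + </p> + +<codeblock> +-- An inline view on one side of an outer join, with an ON clause that +-- references columns from the view; queries of this general shape could +-- have those predicates assigned to the wrong plan node: +select t1.id +from t1 left outer join (select id, x from t2) v + on t1.id = v.id and v.x > 10; +</codeblock> + + <p> + Inspecting the <codeph>EXPLAIN</codeph> plan shows where each predicate is actually applied.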
+      </p>
+
+      <p>
+        <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-1459" scope="external" format="html">IMPALA-1459</xref>
+      </p>
+
+      <p><b>Resolution:</b> Fixed in CDH 5.7.0 / Impala 2.5.0, CDH 5.5.2 / Impala 2.3.2, and CDH 5.4.9 / Impala 2.2.9.</p>
+
+    </conbody>
+
+  </concept>
+
+  <concept id="IMPALA-2603" rev="IMPALA-2603">
+
+    <title>Crash: impala::Coordinator::ValidateCollectionSlots</title>
+
+    <conbody>
+
+      <p>
+        A query could encounter a serious error if it includes multiple nested levels of <codeph>INNER JOIN</codeph> clauses involving
+        subqueries.
+      </p>
+
+      <p>
+        <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2603" scope="external" format="html">IMPALA-2603</xref>
+      </p>
+
+    </conbody>
+
+  </concept>
+
+  <concept id="IMPALA-2665" rev="IMPALA-2665">
+
+    <title>Incorrect assignment of On-clause predicate inside inline view with an outer join.</title>
+
+    <conbody>
+
+      <p>
+        A query might return incorrect results due to incorrect predicate assignment in the following scenario:
+      </p>
+
+      <ol>
+        <li>
+          There is an inline view that contains an outer join.
+        </li>
+
+        <li>
+          That inline view is joined with another table in the enclosing query block.
+        </li>
+
+        <li>
+          That join has an On-clause containing a predicate that only references columns originating from the outer-joined tables inside
+          the inline view.
+        </li>
+      </ol>
+
+      <p>
+        <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2665" scope="external" format="html">IMPALA-2665</xref>
+      </p>
+
+      <p><b>Resolution:</b> Fixed in CDH 5.7.0 / Impala 2.5.0, CDH 5.5.2 / Impala 2.3.2, and CDH 5.4.9 / Impala 2.2.9.</p>
+
+    </conbody>
+
+  </concept>
+
+  <concept id="IMPALA-2144" rev="IMPALA-2144">
+
+    <title>Wrong assignment of having clause predicate across outer join</title>
+
+    <conbody>
+
+      <p>
+        In an <codeph>OUTER JOIN</codeph> query with a <codeph>HAVING</codeph> clause, the comparison from the <codeph>HAVING</codeph>
+        clause might be applied at the wrong stage of
query processing, leading to incorrect results. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2144" scope="external" format="html">IMPALA-2144</xref> + </p> + + <p><b>Resolution:</b> Fixed in CDH 5.7.0 / Impala 2.5.0.</p> + + </conbody> + + </concept> + + <concept id="IMPALA-2093" rev="IMPALA-2093"> + + <title>Wrong plan of NOT IN aggregate subquery when a constant is used in subquery predicate</title> + + <conbody> + + <p> + A <codeph>NOT IN</codeph> operator with a subquery that calls an aggregate function, such as <codeph>NOT IN (SELECT + SUM(...))</codeph>, could return incorrect results. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2093" scope="external" format="html">IMPALA-2093</xref> + </p> + + <p><b>Resolution:</b> Fixed in CDH 5.7.0 / Impala 2.5.0 and CDH 5.5.4 / Impala 2.3.4.</p> + + </conbody> + + </concept> + + </concept> + + <concept id="known_issues_metadata"> + + <title id="ki_metadata">Impala Known Issues: Metadata</title> + + <conbody> + + <p> + These issues affect how Impala interacts with metadata. They cover areas such as the metastore database, the <codeph>COMPUTE + STATS</codeph> statement, and the Impala <cmdname>catalogd</cmdname> daemon. + </p> + + </conbody> + + <concept id="IMPALA-2648" rev="IMPALA-2648"> + + <title>Catalogd may crash when loading metadata for tables with many partitions, many columns and with incremental stats</title> + + <conbody> + + <p> + Incremental stats use up about 400 bytes per partition for each column. For example, for a table with 20K partitions and 100 + columns, the memory overhead from incremental statistics is about 800 MB. When serialized for transmission across the network, + this metadata exceeds the 2 GB Java array size limit and leads to a <codeph>catalogd</codeph> crash. 
+ </p> + + <p> + <b>Bugs:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2647" scope="external" format="html">IMPALA-2647</xref>, + <xref href="https://issues.cloudera.org/browse/IMPALA-2648" scope="external" format="html">IMPALA-2648</xref>, + <xref href="https://issues.cloudera.org/browse/IMPALA-2649" scope="external" format="html">IMPALA-2649</xref> + </p> + + <p> + <b>Workaround:</b> If feasible, compute full stats periodically and avoid computing incremental stats for that table. The + scalability of incremental stats computation is a continuing work item. + </p> + + </conbody> + + </concept> + + <concept id="IMPALA-1420" rev="IMPALA-1420 2.0.0"> + +<!-- Not part of Alex's spreadsheet --> + + <title>Can't update stats manually via alter table after upgrading to CDH 5.2</title> + + <conbody> + + <p></p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-1420" scope="external" format="html">IMPALA-1420</xref> + </p> + + <p> + <b>Workaround:</b> On CDH 5.2, when adjusting table statistics manually by setting the <codeph>numRows</codeph>, you must also + enable the Boolean property <codeph>STATS_GENERATED_VIA_STATS_TASK</codeph>. For example, use a statement like the following to + set both properties with a single <codeph>ALTER TABLE</codeph> statement: + </p> + +<codeblock>ALTER TABLE <varname>table_name</varname> SET TBLPROPERTIES('numRows'='<varname>new_value</varname>', 'STATS_GENERATED_VIA_STATS_TASK' = 'true');</codeblock> + + <p> + <b>Resolution:</b> The underlying cause is the issue + <xref href="https://issues.apache.org/jira/browse/HIVE-8648" scope="external" format="html">HIVE-8648</xref> that affects the + metastore in Hive 0.13. The workaround is only needed until the fix for this issue is incorporated into a CDH release. 
+ </p> + + </conbody> + + </concept> + + </concept> + + <concept id="known_issues_interop"> + + <title id="ki_interop">Impala Known Issues: Interoperability</title> + + <conbody> + + <p> + These issues affect the ability to interchange data between Impala and other database systems. They cover areas such as data types + and file formats. + </p> + + </conbody> + +<!-- Opened based on CDH-41605. Not part of Alex's spreadsheet AFAIK. --> + + <concept id="CDH-41605"> + + <title>DESCRIBE FORMATTED gives error on Avro table</title> + + <conbody> + + <p> + This issue can occur either on old Avro tables (created prior to Hive 1.1 / CDH 5.4) or when changing the Avro schema file by + adding or removing columns. Columns added to the schema file will not show up in the output of the <codeph>DESCRIBE + FORMATTED</codeph> command. Removing columns from the schema file will trigger a <codeph>NullPointerException</codeph>. + </p> + + <p> + As a workaround, you can use the output of <codeph>SHOW CREATE TABLE</codeph> to drop and recreate the table. This will populate + the Hive metastore database with the correct column definitions. + </p> + + <note type="warning"> + Only use this for external tables, or Impala will remove the data files. In case of an internal table, set it to external first: +<codeblock> +ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE'); +</codeblock> + (The part in parentheses is case sensitive.) Make sure to pick the right choice between internal and external when recreating the + table. See <xref href="impala_tables.xml#tables"/> for the differences between internal and external tables. + </note> + + <p audience="Cloudera"> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/CDH-41605" scope="external" format="html">CDH-41605</xref> + </p> + + <p> + <b>Severity:</b> High + </p> + + </conbody> + + </concept> + + <concept id="IMP-469"> + +<!-- Not part of Alex's spreadsheet. 
Perhaps it really is a permanent limitation and nobody is tracking it? -->
+
+    <title>Deviation from Hive behavior: Impala does not do implicit casts among string, numeric, and boolean types.</title>
+
+    <conbody>
+
+      <p audience="Cloudera">
+        <b>Cloudera Bug:</b> <xref href="https://jira.cloudera.com/browse/IMP-469" scope="external" format="html"/>; KI added 0.1
+        <i>Cloudera internal only</i>
+      </p>
+
+      <p>
+        <b>Anticipated Resolution:</b> None
+      </p>
+
+      <p>
+        <b>Workaround:</b> Use explicit casts.
+      </p>
+
+    </conbody>
+
+  </concept>
+
+  <concept id="IMP-175">
+
+<!-- Not part of Alex's spreadsheet. Perhaps it really is a permanent limitation and nobody is tracking it? -->
+
+    <title>Deviation from Hive behavior: Out-of-range float/double values are returned as the maximum allowed value of the type (Hive returns NULL)</title>
+
+    <conbody>
+
+      <p>
+        Impala behavior differs from Hive with respect to out-of-range float/double values: out-of-range values are returned as the
+        maximum allowed value of the type, whereas Hive returns <codeph>NULL</codeph>.
+      </p>
+
+      <p audience="Cloudera">
+        <b>Cloudera Bug:</b> <xref href="https://jira.cloudera.com/browse/IMP-175" scope="external" format="html">IMP-175</xref>; KI
+        added 0.1 <i>Cloudera internal only</i>
+      </p>
+
+      <p>
+        <b>Workaround:</b> None
+      </p>
+
+    </conbody>
+
+  </concept>
+
+  <concept id="CDH-13199">
+
+<!-- Not part of Alex's spreadsheet. The CDH- prefix makes it an oddball. -->
+
+    <title>Configuration needed for Flume to be compatible with Impala</title>
+
+    <conbody>
+
+      <p>
+        For compatibility with Impala, the value for the Flume HDFS Sink <codeph>hdfs.writeFormat</codeph> must be set to
+        <codeph>Text</codeph>, rather than its default value of <codeph>Writable</codeph>. The <codeph>hdfs.writeFormat</codeph> setting
+        must be changed to <codeph>Text</codeph> before creating data files with Flume; otherwise, those files cannot be read by either
+        Impala or Hive.
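+      </p>
+
+      <p>
+        A minimal sketch of the relevant sink settings (the agent and sink names are placeholders):
+      </p>
+
+<codeblock>agent1.sinks.hdfs-sink1.type = hdfs
+agent1.sinks.hdfs-sink1.hdfs.fileType = DataStream
+# Must be Text rather than the default Writable for Impala and Hive compatibility:
+agent1.sinks.hdfs-sink1.hdfs.writeFormat = Text</codeblock>
+
+      <p>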
+      </p>
+
+      <p>
+        <b>Resolution:</b> A request has been made to add this information to the upstream Flume documentation.
+      </p>
+
+    </conbody>
+
+  </concept>
+
+  <concept id="IMPALA-635" rev="IMPALA-635">
+
+<!-- Not part of Alex's spreadsheet -->
+
+    <title>Avro Scanner fails to parse some schemas</title>
+
+    <conbody>
+
+      <p>
+        Querying certain Avro tables could cause a crash or return no rows, even though Impala could <codeph>DESCRIBE</codeph> the table.
+      </p>
+
+      <p>
+        <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-635" scope="external" format="html">IMPALA-635</xref>
+      </p>
+
+      <p>
+        <b>Workaround:</b> Swap the order of the fields in the schema specification. For example, use <codeph>["null", "string"]</codeph>
+        instead of <codeph>["string", "null"]</codeph>.
+      </p>
+
+      <p>
+        <b>Resolution:</b> Not allowing this syntax agrees with the Avro specification, so it may still cause an error even when the
+        crashing issue is resolved.
+      </p>
+
+    </conbody>
+
+  </concept>
+
+  <concept id="IMPALA-1024" rev="IMPALA-1024">
+
+<!-- Not part of Alex's spreadsheet -->
+
+    <title>Impala BE cannot parse Avro schema that contains a trailing semi-colon</title>
+
+    <conbody>
+
+      <p>
+        If an Avro table has a schema definition with a trailing semicolon, Impala encounters an error when the table is queried.
+      </p>
+
+      <p>
+        <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-1024" scope="external" format="html">IMPALA-1024</xref>
+      </p>
+
+      <p>
+        <b>Workaround:</b> Remove the trailing semicolon from the Avro schema.
+      </p>
+
+    </conbody>
+
+  </concept>
+
+  <concept id="IMPALA-2154" rev="IMPALA-2154">
+
+<!-- Not part of Alex's spreadsheet -->
+
+    <title>Fix decompressor to allow parsing gzips with multiple streams</title>
+
+    <conbody>
+
+      <p>
+        Currently, Impala can only read gzipped files containing a single stream. If a gzipped file contains multiple concatenated
+        streams, the Impala query only processes the data from the first stream.
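+      </p>
+
+      <p>
+        For example, concatenating the output of two separate <cmdname>gzip</cmdname> runs produces a
+        multi-stream file (file names here are illustrative); recompressing yields a single stream that
+        Impala reads in full:
+      </p>
+
+<codeblock>gzip -c part1.txt > data.gz
+gzip -c part2.txt >> data.gz      # appends a second gzip stream
+# Workaround: recompress into a single-stream file:
+gzip -dc data.gz | gzip -c > single_stream.gz</codeblock>
+
+      <p>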
+      </p>
+
+      <p>
+        <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2154" scope="external" format="html">IMPALA-2154</xref>
+      </p>
+
+      <p>
+        <b>Workaround:</b> Use a different gzip tool to compress the file into a single stream.
+      </p>
+
+      <p><b>Resolution:</b> Fixed in CDH 5.7.0 / Impala 2.5.0.</p>
+
+    </conbody>
+
+  </concept>
+
+  <concept id="IMPALA-1578" rev="IMPALA-1578">
+
+<!-- Not part of Alex's spreadsheet -->
+
+    <title>Impala incorrectly handles text data when the newline sequence \n\r is split between different HDFS blocks</title>
+
+    <conbody>
+
+      <p>
+        If a carriage return / newline pair of characters in a text table is split between HDFS data blocks, Impala incorrectly processes
+        the row following the <codeph>\n\r</codeph> pair twice.
+      </p>
+
+      <p>
+        <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-1578" scope="external" format="html">IMPALA-1578</xref>
+      </p>
+
+      <p>
+        <b>Workaround:</b> Use the Parquet format for large volumes of data where practical.
+      </p>
+
+      <p><b>Resolution:</b> Fixed in CDH 5.8.0 / Impala 2.6.0.</p>
+
+    </conbody>
+
+  </concept>
+
+  <concept id="IMPALA-1862" rev="IMPALA-1862">
+
+<!-- Not part of Alex's spreadsheet -->
+
+    <title>Invalid bool value not reported as a scanner error</title>
+
+    <conbody>
+
+      <p>
+        In some cases, an invalid <codeph>BOOLEAN</codeph> value read from a table does not produce a warning message about the bad value.
+        The result is still <codeph>NULL</codeph> as expected. Therefore, this is not a query correctness issue, but it could lead to
+        overlooking the presence of invalid data.
+      </p>
+
+      <p>
+        <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-1862" scope="external" format="html">IMPALA-1862</xref>
+      </p>
+
+    </conbody>
+
+  </concept>
+
+  <concept id="IMPALA-1652" rev="IMPALA-1652">
+
+<!-- To do: Isn't this more a correctness issue?
--> + + <title>Incorrect results with basic predicate on CHAR typed column.</title> + + <conbody> + + <p> + When comparing a <codeph>CHAR</codeph> column value to a string literal, the literal value is not blank-padded and so the + comparison might fail when it should match. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-1652" scope="external" format="html">IMPALA-1652</xref> + </p> + + <p> + <b>Workaround:</b> Use the <codeph>RPAD()</codeph> function to blank-pad literals compared with <codeph>CHAR</codeph> columns to + the expected length. + </p> + + </conbody> + + </concept> + + </concept> + + <concept id="known_issues_limitations"> + + <title>Impala Known Issues: Limitations</title> + + <conbody> + + <p> + These issues are current limitations of Impala that require evaluation as you plan how to integrate Impala into your data management + workflow. + </p> + + </conbody> + + <concept id="IMPALA-77" rev="IMPALA-77"> + +<!-- Not part of Alex's spreadsheet. Perhaps it really is a permanent limitation and nobody is tracking it? --> + + <title>Impala does not support running on clusters with federated namespaces</title> + + <conbody> + + <p> + Impala does not support running on clusters with federated namespaces. The <codeph>impalad</codeph> process will not start on a + node running such a filesystem based on the <codeph>org.apache.hadoop.fs.viewfs.ViewFs</codeph> class. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-77" scope="external" format="html">IMPALA-77</xref> + </p> + + <p> + <b>Anticipated Resolution:</b> Limitation + </p> + + <p> + <b>Workaround:</b> Use standard HDFS on all Impala nodes. + </p> + + </conbody> + + </concept> + + </concept> + + <concept id="known_issues_misc"> + + <title>Impala Known Issues: Miscellaneous / Older Issues</title> + + <conbody> + + <p> + These issues do not fall into one of the above categories or have not been categorized yet. 
+ </p> + + </conbody> + + <concept id="IMPALA-2005" rev="IMPALA-2005"> + +<!-- Not part of Alex's spreadsheet --> + + <title>A failed CTAS does not drop the table if the insert fails.</title> + + <conbody> + + <p> + If a <codeph>CREATE TABLE AS SELECT</codeph> operation successfully creates the target table but an error occurs while querying + the source table or copying the data, the new table is left behind rather than being dropped. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-2005" scope="external" format="html">IMPALA-2005</xref> + </p> + + <p> + <b>Workaround:</b> Drop the new table manually after a failed <codeph>CREATE TABLE AS SELECT</codeph>. + </p> + + </conbody> + + </concept> + + <concept id="IMPALA-1821" rev="IMPALA-1821"> + +<!-- Not part of Alex's spreadsheet --> + + <title>Casting scenarios with invalid/inconsistent results</title> + + <conbody> + + <p> + Using a <codeph>CAST()</codeph> function to convert large literal values to smaller types, or to convert special values such as + <codeph>NaN</codeph> or <codeph>Inf</codeph>, produces values not consistent with other database systems. This could lead to + unexpected results from queries. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-1821" scope="external" format="html">IMPALA-1821</xref> + </p> + +<!-- <p><b>Workaround:</b> Doublecheck that <codeph>CAST()</codeph> operations work as expect. The issue applies to expressions involving literals, not values read from table columns.</p> --> + + </conbody> + + </concept> + + <concept id="IMPALA-1619" rev="IMPALA-1619"> + +<!-- Not part of Alex's spreadsheet --> + + <title>Support individual memory allocations larger than 1 GB</title> + + <conbody> + + <p> + The largest single block of memory that Impala can allocate during a query is 1 GiB. 
Therefore, a query could fail or Impala could + crash if a compressed text file resulted in more than 1 GiB of data in uncompressed form, or if a string function such as + <codeph>group_concat()</codeph> returned a value greater than 1 GiB. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-1619" scope="external" format="html">IMPALA-1619</xref> + </p> + + <p><b>Resolution:</b> Fixed in CDH 5.9.0 / Impala 2.7.0 and CDH 5.8.3 / Impala 2.6.3.</p> + + </conbody> + + </concept> + + <concept id="IMPALA-941" rev="IMPALA-941"> + +<!-- Not part of Alex's spreadsheet. Maybe this is interop? --> + + <title>Impala Parser issue when using fully qualified table names that start with a number.</title> + + <conbody> + + <p> + A fully qualified table name starting with a number could cause a parsing error. In a name such as <codeph>db.571_market</codeph>, + the decimal point followed by digits is interpreted as a floating-point number. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-941" scope="external" format="html">IMPALA-941</xref> + </p> + + <p> + <b>Workaround:</b> Surround each part of the fully qualified name with backticks (<codeph>``</codeph>). + </p> + + </conbody> + + </concept> + + <concept id="IMPALA-532" rev="IMPALA-532"> + +<!-- Not part of Alex's spreadsheet. Perhaps it really is a permanent limitation and nobody is tracking it? --> + + <title>Impala should tolerate bad locale settings</title> + + <conbody> + + <p> + If the <codeph>LC_*</codeph> environment variables specify an unsupported locale, Impala does not start. + </p> + + <p> + <b>Bug:</b> <xref href="https://issues.cloudera.org/browse/IMPALA-532" scope="external" format="html">IMPALA-532</xref> + </p> + + <p> + <b>Workaround:</b> Add <codeph>LC_ALL="C"</codeph> to the environment settings for both the Impala daemon and the Statestore + daemon. 
See <xref href="impala_config_options.xml#config_options"/> for details about modifying these environment settings. + </p> + + <p> + <b>Resolution:</b> Fixing this issue would require an upgrade to Boost 1.47 in the Impala distribution. + </p> + + </conbody> + + </concept> + + <concept id="IMP-1203"> + +<!-- Not part of Alex's spreadsheet. Perhaps it really is a permanent limitation and nobody is tracking it? --> + + <title>Log Level 3 Not Recommended for Impala</title> + + <conbody> + + <p> + The extensive logging produced by log level 3 can cause serious performance overhead and capacity issues. + </p> + + <p> + <b>Workaround:</b> Reduce the log level to its default value of 1, that is, <codeph>GLOG_v=1</codeph>. See + <xref href="impala_logging.xml#log_levels"/> for details about the effects of setting different logging levels. + </p> + + </conbody> + + </concept> + + </concept> + +</concept>
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_kudu.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_kudu.xml b/docs/topics/impala_kudu.xml new file mode 100644 index 0000000..c530cc1 --- /dev/null +++ b/docs/topics/impala_kudu.xml @@ -0,0 +1,167 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="impala_kudu" rev="kudu"> + + <title>Using Impala to Query Kudu Tables</title> + + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="Kudu"/> + <data name="Category" value="Querying"/> + <data name="Category" value="Data Analysts"/> + <data name="Category" value="Developers"/> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">Kudu</indexterm> + You can use Impala to query Kudu tables. This capability allows convenient access to a storage system that is + tuned for different kinds of workloads than the default with Impala. The default Impala tables use data files + stored on HDFS, which are ideal for bulk loads and queries using full-table scans. In contrast, Kudu can do + efficient queries for data organized either in data warehouse style (with full table scans) or for OLTP-style + workloads (with key-based lookups for single rows or small ranges of values). + </p> + + <p> + Certain Impala SQL statements, such as <codeph>UPDATE</codeph> and <codeph>DELETE</codeph>, only work with + Kudu tables. These operations were impractical from a performance perspective to perform at large scale on + HDFS data, or on HBase tables. 
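+    </p>
+
+    <p>
+      For example (the table and column names here are illustrative), typical row-level operations
+      look like:
+    </p>
+
+<codeblock>-- These statements work on Kudu tables, but not on HDFS-backed or HBase tables:
+UPDATE kudu_example SET state = 'inactive' WHERE id = 42;
+DELETE FROM kudu_example WHERE state = 'inactive';</codeblock>
+
+    <p>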
+    </p>
+
+  </conbody>
+
+  <concept id="kudu_benefits">
+
+    <title>Benefits of Using Kudu Tables with Impala</title>
+
+    <conbody>
+
+      <p>
+        The combination of Kudu and Impala works best for tables where scan performance is important, but data
+        arrives continuously, in small batches, or needs to be updated without being completely replaced. In these
+        scenarios (such as for streaming data), it might be impractical to use Parquet tables because Parquet works
+        best with multi-megabyte data files, requiring substantial overhead to replace or reorganize data files to
+        accommodate frequent additions or changes to data. Impala can query Kudu tables with scan performance close
+        to that of Parquet, and Impala can also perform update or delete operations without replacing the entire
+        table contents. You can also use the Kudu API to do ingestion or transformation operations outside of
+        Impala, and Impala can query the current data at any time.
+      </p>
+
+    </conbody>
+
+  </concept>
+
+  <concept id="kudu_primary_key">
+
+    <title>Primary Key Columns for Kudu Tables</title>
+
+    <conbody>
+
+      <p>
+        Kudu tables introduce the notion of primary keys to Impala for the first time. The primary key is made up
+        of one or more columns, whose values are combined and used as a lookup key during queries. These columns
+        cannot contain any <codeph>NULL</codeph> values or any duplicate values, and can never be updated. For a
+        partitioned Kudu table, all the partition key columns must come from the set of primary key columns.
+      </p>
+
+      <p>
+        Impala itself still does not have the notion of unique or non-<codeph>NULL</codeph> constraints. These
+        restrictions on the primary key columns are enforced on the Kudu side.
+      </p>
+
+      <p>
+        The primary key columns must be the first ones specified in the <codeph>CREATE TABLE</codeph> statement.
+        You specify which column or columns make up the primary key in the table properties, rather than through
+        attributes in the column list.
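+      </p>
+
+      <p>
+        A sketch of this style of table definition (the table name, master address, and the exact set
+        of required properties are illustrative and can vary by release):
+      </p>
+
+<codeblock>CREATE TABLE kudu_example
+(
+  id BIGINT,
+  name STRING
+)
+DISTRIBUTE BY HASH (id) INTO 4 BUCKETS
+TBLPROPERTIES(
+  'storage_handler' = 'com.cloudera.kudu.hive.KuduStorageHandler',
+  'kudu.table_name' = 'kudu_example',
+  'kudu.master_addresses' = 'kudu-master.example.com:7051',
+  'kudu.key_columns' = 'id'      -- id serves as the primary key
+);</codeblock>
+
+      <p>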
+ </p> + + <p> + Kudu can do extra optimizations for queries that refer to the primary key columns in the + <codeph>WHERE</codeph> clause. It is not crucial though to include the primary key columns in the + <codeph>WHERE</codeph> clause of every query. The benefit is mainly for partitioned tables, + which divide the data among various tablet servers based on the distribution of + data values in some or all of the primary key columns. + </p> + + </conbody> + + </concept> + + <concept id="kudu_dml"> + + <title>Impala DML Support for Kudu Tables</title> + + <conbody> + + <p> + Impala supports certain DML statements for Kudu tables only. The <codeph>UPDATE</codeph> and + <codeph>DELETE</codeph> statements let you modify data within Kudu tables without rewriting substantial + amounts of table data. + </p> + + <p> + The <codeph>INSERT</codeph> statement for Kudu tables honors the unique and non-<codeph>NULL</codeph> + requirements for the primary key columns. + </p> + + <p> + Because Impala and Kudu do not support transactions, the effects of any <codeph>INSERT</codeph>, + <codeph>UPDATE</codeph>, or <codeph>DELETE</codeph> statement are immediately visible. For example, you + cannot do a sequence of <codeph>UPDATE</codeph> statements and only make the change visible after all the + statements are finished. Also, if a DML statement fails partway through, any rows that were already + inserted, deleted, or changed remain in the table; there is no rollback mechanism to undo the changes. + </p> + + </conbody> + + </concept> + + <concept id="kudu_partitioning"> + + <title>Partitioning for Kudu Tables</title> + + <conbody> + + <p> + Kudu tables use special mechanisms to evenly distribute data among the underlying tablet servers. Although + we refer to such tables as partitioned tables, they are distinguished from traditional Impala partitioned + tables by use of different clauses on the <codeph>CREATE TABLE</codeph> statement. 
Partitioned Kudu tables + use <codeph>DISTRIBUTE BY</codeph>, <codeph>HASH</codeph>, <codeph>RANGE</codeph>, and <codeph>SPLIT + ROWS</codeph> clauses rather than the traditional <codeph>PARTITIONED BY</codeph> clause. All of the + columns involved in these clauses must be primary key columns. These clauses let you specify different ways + to divide the data for each column, or even for different value ranges within a column. This flexibility + lets you avoid problems with uneven distribution of data, where the partitioning scheme for HDFS tables + might result in some partitions being much larger than others. By setting up an effective partitioning + scheme for a Kudu table, you can ensure that the work for a query can be parallelized evenly across the + hosts in a cluster. + </p> + + </conbody> + + </concept> + + <concept id="kudu_performance"> + + <title>Impala Query Performance for Kudu Tables</title> + + <conbody> + + <p> + For queries involving Kudu tables, Impala can delegate much of the work of filtering the result set to + Kudu, avoiding some of the I/O involved in full table scans of tables containing HDFS data files. This type + of optimization is especially effective for partitioned Kudu tables, where the Impala query + <codeph>WHERE</codeph> clause refers to one or more primary key columns that are also used as partition key + columns. For example, if a partitioned Kudu table uses a <codeph>HASH</codeph> clause for + <codeph>col1</codeph> and a <codeph>RANGE</codeph> clause for <codeph>col2</codeph>, a query using a clause + such as <codeph>WHERE col1 IN (1,2,3) AND col2 > 100</codeph> can determine exactly which tablet servers + contain relevant data, and therefore parallelize the query very efficiently. 
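+      </p>
+
+      <p>
+        Continuing that hypothetical example, such a query might look like:
+      </p>
+
+<codeblock>-- col1 is a HASH partition column and col2 a RANGE partition column,
+-- so Kudu can prune the set of tablets to scan before any data is read:
+SELECT count(*) FROM kudu_example
+WHERE col1 IN (1, 2, 3) AND col2 > 100;</codeblock>
+
+      <p>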
+ </p> + + </conbody> + + </concept> + +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_langref.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_langref.xml b/docs/topics/impala_langref.xml new file mode 100644 index 0000000..f81b76f --- /dev/null +++ b/docs/topics/impala_langref.xml @@ -0,0 +1,74 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="langref"> + + <title>Impala SQL Language Reference</title> + <titlealts audience="PDF"><navtitle>SQL Reference</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Data Analysts"/> + <data name="Category" value="Developers"/> + <data name="Category" value="impala-shell"/> + </metadata> + </prolog> + + <conbody> + + <p> + Impala uses SQL as its query language. To protect user investment in skills development and query + design, Impala provides a high degree of compatibility with the Hive Query Language (HiveQL): + </p> + + <ul> + <li> + Because Impala uses the same metadata store as Hive to record information about table structure and + properties, Impala can access tables defined through the native Impala <codeph>CREATE TABLE</codeph> + command, or tables created using the Hive data definition language (DDL). + </li> + + <li> + Impala supports data manipulation (DML) statements similar to the DML component of HiveQL. + </li> + + <li> + Impala provides many <xref href="impala_functions.xml#builtins">built-in functions</xref> with the same + names and parameter types as their HiveQL equivalents. 
+      </li>
+    </ul>
+
+    <p>
+      Impala supports most of the same <xref href="impala_langref_sql.xml#langref_sql">statements and
+      clauses</xref> as HiveQL, including, but not limited to, <codeph>JOIN</codeph>, <codeph>AGGREGATE</codeph>,
+      <codeph>DISTINCT</codeph>, <codeph>UNION ALL</codeph>, <codeph>ORDER BY</codeph>, <codeph>LIMIT</codeph>, and
+      (uncorrelated) subqueries in the <codeph>FROM</codeph> clause. Impala also supports <codeph>INSERT
+      INTO</codeph> and <codeph>INSERT OVERWRITE</codeph>.
+    </p>
+
+    <p>
+      Impala supports data types with the same names and semantics as the equivalent Hive data types:
+      <codeph>STRING</codeph>, <codeph>TINYINT</codeph>, <codeph>SMALLINT</codeph>, <codeph>INT</codeph>,
+      <codeph>BIGINT</codeph>, <codeph>FLOAT</codeph>, <codeph>DOUBLE</codeph>, <codeph>BOOLEAN</codeph>, and
+      <codeph>TIMESTAMP</codeph>.
+    </p>
+
+    <p>
+      For full details about Impala SQL syntax and semantics, see
+      <xref href="impala_langref_sql.xml#langref_sql"/>.
+    </p>
+
+    <p>
+      Most HiveQL <codeph>SELECT</codeph> and <codeph>INSERT</codeph> statements run unmodified with Impala. For
+      information about Hive syntax not available in Impala, see
+      <xref href="impala_langref_unsupported.xml#langref_hiveql_delta"/>.
+    </p>
+
+    <p>
+      For a list of the built-in functions available in Impala queries, see
+      <xref href="impala_functions.xml#builtins"/>.
+ </p> + + <p outputclass="toc"/> + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_langref_sql.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_langref_sql.xml b/docs/topics/impala_langref_sql.xml new file mode 100644 index 0000000..18b6726 --- /dev/null +++ b/docs/topics/impala_langref_sql.xml @@ -0,0 +1,35 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="langref_sql"> + + <title>Impala SQL Statements</title> + <titlealts audience="PDF"><navtitle>SQL Statements</navtitle></titlealts> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Developers"/> + <data name="Category" value="Data Analysts"/> + </metadata> + </prolog> + + <conbody> + + <p> + The Impala SQL dialect supports a range of standard elements, plus some extensions for Big Data use cases + related to data loading and data warehousing. + </p> + + <note> + <p> + In the <cmdname>impala-shell</cmdname> interpreter, a semicolon at the end of each statement is required. + Since the semicolon is not actually part of the SQL syntax, we do not include it in the syntax definition + of each statement, but we do show it in examples intended to be run in <cmdname>impala-shell</cmdname>. 
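+      </p>
+
+      <p>
+        For example, in <cmdname>impala-shell</cmdname> (the prompt is shown for illustration):
+      </p>
+
+<codeblock>[localhost:21000] > select version();</codeblock>
+
+      <p>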
+ </p> + </note> + + <p audience="PDF" outputclass="toc all"> + The following sections show the major SQL statements that you work with in Impala: + </p> + </conbody> +</concept> http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/3be0f122/docs/topics/impala_langref_unsupported.xml ---------------------------------------------------------------------- diff --git a/docs/topics/impala_langref_unsupported.xml b/docs/topics/impala_langref_unsupported.xml new file mode 100644 index 0000000..82910d6 --- /dev/null +++ b/docs/topics/impala_langref_unsupported.xml @@ -0,0 +1,312 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> +<concept id="langref_hiveql_delta"> + + <title>SQL Differences Between Impala and Hive</title> + <prolog> + <metadata> + <data name="Category" value="Impala"/> + <data name="Category" value="SQL"/> + <data name="Category" value="Hive"/> + <data name="Category" value="Porting"/> + <data name="Category" value="Data Analysts"/> + <data name="Category" value="Developers"/> + </metadata> + </prolog> + + <conbody> + + <p> + <indexterm audience="Cloudera">Hive</indexterm> + <indexterm audience="Cloudera">HiveQL</indexterm> + Impala's SQL syntax follows the SQL-92 standard, and includes many industry extensions in areas such as + built-in functions. See <xref href="impala_porting.xml#porting"/> for a general discussion of adapting SQL + code from a variety of database systems to Impala. + </p> + + <p> + Because Impala and Hive share the same metastore database and their tables are often used interchangeably, + the following section covers differences between Impala and Hive in detail. 
+ </p> + + <p outputclass="toc inpage"/> + </conbody> + + <concept id="langref_hiveql_unsupported"> + + <title>HiveQL Features not Available in Impala</title> + + <conbody> + + <p> + The current release of Impala does not support the following SQL features that you might be familiar with + from HiveQL: + </p> + + <!-- To do: + Yeesh, too many separate lists of unsupported Hive syntax. + Here, the FAQ, and in some of the intro topics. + Some discussion in IMP-1061 about how best to reorg. + Lots of opportunities for conrefs. + --> + + <ul> +<!-- Now supported in <keyword keyref="impala23_full"/> and higher. Find places on this page (like already done under lateral views) to note the new data type support. + <li> + Non-scalar data types such as maps, arrays, structs. + </li> +--> + + <li rev="1.2"> + Extensibility mechanisms such as <codeph>TRANSFORM</codeph>, custom file formats, or custom SerDes. + </li> + + <li rev="CDH-41376"> + The <codeph>DATE</codeph> data type. + </li> + + <li> + XML and JSON functions. + </li> + + <li> + Certain aggregate functions from HiveQL: <codeph>covar_pop</codeph>, <codeph>covar_samp</codeph>, + <codeph>corr</codeph>, <codeph>percentile</codeph>, <codeph>percentile_approx</codeph>, + <codeph>histogram_numeric</codeph>, <codeph>collect_set</codeph>; Impala supports the set of aggregate + functions listed in <xref href="impala_aggregate_functions.xml#aggregate_functions"/> and analytic + functions listed in <xref href="impala_analytic_functions.xml#analytic_functions"/>. + </li> + + <li> + Sampling. + </li> + + <li> + Lateral views. In <keyword keyref="impala23_full"/> and higher, Impala supports queries on complex types + (<codeph>STRUCT</codeph>, <codeph>ARRAY</codeph>, or <codeph>MAP</codeph>), using join notation + rather than the <codeph>EXPLODE()</codeph> keyword. + See <xref href="impala_complex_types.xml#complex_types"/> for details about Impala support for complex types. 
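+ For example, a query of the following general form (the table and column names are hypothetical)
+ flattens an <codeph>ARRAY</codeph> column through join notation rather than <codeph>EXPLODE()</codeph>:
+ <codeblock>select t.id, a.item from array_demo t, t.arr_col a;</codeblock>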
+ </li> + + <li> + Multiple <codeph>DISTINCT</codeph> clauses per query, although Impala includes some workarounds for this + limitation. + <note conref="../shared/impala_common.xml#common/multiple_count_distinct"/> + </li> + </ul> + + <p> + User-defined functions (UDFs) are supported starting in Impala 1.2. See <xref href="impala_udf.xml#udfs"/> + for full details on Impala UDFs. + <ul> + <li> + <p> + Impala supports high-performance UDFs written in C++, as well as reusing some Java-based Hive UDFs. + </p> + </li> + + <li> + <p> + Impala supports scalar UDFs and user-defined aggregate functions (UDAFs). Impala does not currently + support user-defined table generating functions (UDTFs). + </p> + </li> + + <li> + <p> + Only Impala-supported column types are supported in Java-based UDFs. + </p> + </li> + + <li> + <p conref="../shared/impala_common.xml#common/current_user_caveat"/> + </li> + </ul> + </p> + + <p> + Impala does not currently support these HiveQL statements: + </p> + + <ul> + <li> + <codeph>ANALYZE TABLE</codeph> (the Impala equivalent is <codeph>COMPUTE STATS</codeph>) + </li> + + <li> + <codeph>DESCRIBE COLUMN</codeph> + </li> + + <li> + <codeph>DESCRIBE DATABASE</codeph> + </li> + + <li> + <codeph>EXPORT TABLE</codeph> + </li> + + <li> + <codeph>IMPORT TABLE</codeph> + </li> + + <li> + <codeph>SHOW TABLE EXTENDED</codeph> + </li> + + <li> + <codeph>SHOW INDEXES</codeph> + </li> + + <li> + <codeph>SHOW COLUMNS</codeph> + </li> + + <li rev="DOCS-656"> + <codeph>INSERT OVERWRITE DIRECTORY</codeph>; use <codeph>INSERT OVERWRITE <varname>table_name</varname></codeph> + or <codeph>CREATE TABLE AS SELECT</codeph> to materialize query results into the HDFS directory associated + with an Impala table. 
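+ For example, a statement of the following general form (the table and column names are hypothetical)
+ materializes query results in the HDFS directory of a new Impala table:
+ <codeblock>create table query_results stored as parquet as select c1, c2 from t1 where c3 is not null;</codeblock>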
+ </li> + </ul> + </conbody> + </concept> + + <concept id="langref_hiveql_semantics"> + + <title>Semantic Differences Between Impala and HiveQL Features</title> + + <conbody> + + <p> + This section covers instances where Impala and Hive have similar functionality, sometimes including the + same syntax, but there are differences in the runtime semantics of those features. + </p> + + <p> + <b>Security:</b> + </p> + + <p> + Impala utilizes the <xref href="http://sentry.incubator.apache.org/" scope="external" format="html">Apache + Sentry </xref> authorization framework, which provides fine-grained role-based access control + to protect data against unauthorized access or tampering. + </p> + + <p> + The Hive component included in <ph rev="upstream">CDH 5.1</ph> and higher now includes Sentry-enabled <codeph>GRANT</codeph>, + <codeph>REVOKE</codeph>, and <codeph>CREATE/DROP ROLE</codeph> statements. Earlier Hive releases had a + privilege system with <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements that were primarily + intended to prevent accidental deletion of data, rather than a security mechanism to protect against + malicious users. + </p> + + <p> + Impala can make use of privileges set up through Hive <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements. + Impala has its own <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements in Impala 2.0 and higher. + See <xref href="impala_authorization.xml#authorization"/> for the details of authorization in Impala, including + how to switch from the original policy file-based privilege model to the Sentry service using privileges + stored in the metastore database. 
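+ For example (the database, table, and role names are hypothetical):
+ <codeblock>grant select on table db1.t1 to role analyst_role;
+ revoke select on table db1.t1 from role analyst_role;</codeblock>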
+ </p> + + <p> + <b>SQL statements and clauses:</b> + </p> + + <p> + The semantics of Impala SQL statements vary from HiveQL in some cases where the two dialects use similar SQL + statement and clause names: + </p> + + <ul> + <li> + Impala uses different syntax and names for query hints, <codeph>[SHUFFLE]</codeph> and + <codeph>[NOSHUFFLE]</codeph> rather than <codeph>MapJoin</codeph> or <codeph>StreamJoin</codeph>. See + <xref href="impala_joins.xml#joins"/> for the Impala details. + </li> + + <li> + Impala does not expose MapReduce-specific features of <codeph>SORT BY</codeph>, <codeph>DISTRIBUTE + BY</codeph>, or <codeph>CLUSTER BY</codeph>. + </li> + + <li> + Impala does not require queries to include a <codeph>FROM</codeph> clause. + </li> + </ul> + + <p> + <b>Data types:</b> + </p> + + <ul> + <li> + Impala supports a limited set of implicit casts. This can help avoid undesired results from unexpected + casting behavior. + <ul> + <li> + Impala does not implicitly cast between string and numeric or Boolean types. Always use + <codeph>CAST()</codeph> for these conversions. + </li> + + <li> + Impala does perform implicit casts among the numeric types when going from a smaller or less precise + type to a larger or more precise one. For example, Impala will implicitly convert a + <codeph>SMALLINT</codeph> to a <codeph>BIGINT</codeph> or <codeph>FLOAT</codeph>, but to convert from + <codeph>DOUBLE</codeph> to <codeph>FLOAT</codeph> or <codeph>INT</codeph> to <codeph>TINYINT</codeph> + requires a call to <codeph>CAST()</codeph> in the query. + </li> + + <li> + Impala does perform implicit casts from string to timestamp. Impala has a restricted set of literal + formats for the <codeph>TIMESTAMP</codeph> data type and the <codeph>from_unixtime()</codeph> format + string; see <xref href="impala_timestamp.xml#timestamp"/> for details.
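+ The following sketch (with hypothetical table and column names) shows which conversions happen
+ implicitly and which require <codeph>CAST()</codeph>:
+ <codeblock>-- Allowed implicitly: SMALLINT widens to BIGINT.
+ select bigint_col + smallint_col from t1;
+ -- Requires an explicit cast: DOUBLE narrows to FLOAT.
+ select cast(double_col as float) from t1;
+ -- A string literal in a recognized format is implicitly cast to TIMESTAMP.
+ select c1 from t1 where ts_col = '2016-01-01 00:00:00';</codeblock>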
+ </li> + </ul> + <p> + See <xref href="impala_datatypes.xml#datatypes"/> for full details on implicit and explicit casting for + all types, and <xref href="impala_conversion_functions.xml#conversion_functions"/> for details about + the <codeph>CAST()</codeph> function. + </p> + </li> + + <li> + Impala does not store or interpret timestamps using the local timezone, to avoid undesired results from + unexpected time zone issues. Timestamps are stored and interpreted relative to UTC. This difference can + produce different results for some calls to similarly named date/time functions between Impala and Hive. + See <xref href="impala_datetime_functions.xml#datetime_functions"/> for details about the Impala + functions. See <xref href="impala_timestamp.xml#timestamp"/> for a discussion of how Impala handles + time zones, and configuration options you can use to make Impala match the Hive behavior more closely + when dealing with Parquet-encoded <codeph>TIMESTAMP</codeph> data or when converting between + the local time zone and UTC. + </li> + + <li> + The Impala <codeph>TIMESTAMP</codeph> type can represent dates ranging from 1400-01-01 to 9999-12-31. + This is different from the Hive date range, which is 0000-01-01 to 9999-12-31. + </li> + + <li> + <p conref="../shared/impala_common.xml#common/int_overflow_behavior"/> + </li> + + </ul> + + <p> + <b>Miscellaneous features:</b> + </p> + + <ul> + <li> + Impala does not provide virtual columns. + </li> + + <li> + Impala does not expose locking. + </li> + + <li> + Impala does not expose some configuration properties. + </li> + </ul> + </conbody> + </concept> +</concept>
