Repository: impala
Updated Branches:
  refs/heads/master b206aeb71 -> a8f8c8d6f
[DOCS] Known Issues for 3.1 (WIP)

Change-Id: I247438f28835c1986deca39e98cd7deb4dc20351
Reviewed-on: http://gerrit.cloudera.org:8080/11323
Reviewed-by: Alex Rodoni <[email protected]>
Tested-by: Alex Rodoni <[email protected]>

Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/71b36fe0
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/71b36fe0
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/71b36fe0

Branch: refs/heads/master
Commit: 71b36fe08bedb2d9129e2d8d0560f0bc0a055e3e
Parents: b206aeb
Author: Alex Rodoni <[email protected]>
Authored: Fri Aug 24 15:25:50 2018 -0700
Committer: Alex Rodoni <[email protected]>
Committed: Fri Aug 24 22:33:37 2018 +0000

----------------------------------------------------------------------
 docs/topics/impala_known_issues.xml | 166 -------------------------------
 1 file changed, 166 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/impala/blob/71b36fe0/docs/topics/impala_known_issues.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_known_issues.xml b/docs/topics/impala_known_issues.xml
index 5b02538..662132a 100644
--- a/docs/topics/impala_known_issues.xml
+++ b/docs/topics/impala_known_issues.xml
@@ -154,116 +154,6 @@ under the License.
 
     </concept>
 
-    <concept id="IMPALA-3316">
-
-      <title>Slow queries for Parquet tables with convert_legacy_hive_parquet_utc_timestamps=true</title>
-
-      <conbody>
-
-        <p>
-          The configuration setting
-          <codeph>convert_legacy_hive_parquet_utc_timestamps=true</codeph> uses an underlying
-          function that can be a bottleneck on high volume, highly concurrent queries due to the
-          use of a global lock while loading time zone information. This bottleneck can cause
-          slowness when querying Parquet tables, up to 30x for scan-heavy queries. The amount of
-          slowdown depends on factors such as the number of cores and number of threads involved
-          in the query.
-        </p>
-
-        <note>
-          <p>
-            The slowdown only occurs when accessing <codeph>TIMESTAMP</codeph> columns within
-            Parquet files that were generated by Hive, and therefore require the on-the-fly
-            timezone conversion processing.
-          </p>
-        </note>
-
-        <p>
-          <b>Bug:</b> <xref keyref="IMPALA-3316">IMPALA-3316</xref>
-        </p>
-
-        <p>
-          <b>Severity:</b> High
-        </p>
-
-        <p>
-          <b>Workaround:</b>Store the <codeph>TIMESTAMP</codeph> values as
-          strings in one of the following formats:
-          <ul>
-            <li><codeph>yyyy-MM-dd</codeph></li>
-            <li><codeph>yyyy-MM-dd HH:mm:ss</codeph></li>
-            <li><codeph>yyyy-MM-dd HH:mm:ss.SSSSSSSSS</codeph>
-              <p>The date can
-              have the 1-9 digits in the fractional part.
-              </p>
-            </li>
-          </ul>
-          Impala implicitly converts such string values to
-          <codeph>TIMESTAMP</codeph> in calls to date/time functions.
-        </p>
-
-      </conbody>
-
-    </concept>
-
-    <concept id="ki_file_handle_cache">
-
-      <title>Interaction of File Handle Cache with HDFS Appends and Short-Circuit Reads</title>
-
-      <conbody>
-
-        <p>
-          If a data file used by Impala is being continuously appended or overwritten in place
-          by an HDFS mechanism, such as <cmdname>hdfs dfs -appendToFile</cmdname>, interaction
-          with the file handle caching feature in <keyword keyref="impala210_full"/> and higher
-          could cause short-circuit reads to sometimes be disabled on some DataNodes. When a
-          mismatch is detected between the cached file handle and a data block that was
-          rewritten because of an append, short-circuit reads are turned off on the affected
-          host for a 10-minute period.
-        </p>
-
-        <p>
-          The possibility of encountering such an issue is the reason why the file handle
-          caching feature is currently turned off by default. See
-          <xref keyref="scalability_file_handle_cache"/> for information about this feature and
-          how to enable it.
-        </p>
-
-        <p>
-          <b>Bug:</b>
-          <xref href="https://issues.apache.org/jira/browse/HDFS-12528"
-            scope="external" format="html">HDFS-12528</xref>
-        </p>
-
-        <p>
-          <b>Severity:</b> High
-        </p>
-
-        <p>
-          <b>Workaround:</b> Verify whether your ETL process is susceptible to this issue before
-          enabling the file handle caching feature. You can set the <cmdname>impalad</cmdname>
-          configuration option <codeph>unused_file_handle_timeout_sec</codeph> to a time period
-          that is shorter than the HDFS setting
-          <codeph>dfs.client.read.shortcircuit.streams.cache.expiry.ms</codeph>. (Keep in mind
-          that the HDFS setting is in milliseconds while the Impala setting is in seconds.)
-        </p>
-
-        <p>
-          <b>Resolution:</b> Fixed in HDFS 2.10 and higher. Use the new HDFS parameter
-          <codeph>dfs.domain.socket.disable.interval.seconds</codeph> to specify the amount of
-          time that short circuit reads are disabled on encountering an error. The default value
-          is 10 minutes (<codeph>600</codeph> seconds). It is recommended that you set
-          <codeph>dfs.domain.socket.disable.interval.seconds</codeph> to a small value, such as
-          <codeph>1</codeph> second, when using the file handle cache. Setting <codeph>
-          dfs.domain.socket.disable.interval.seconds</codeph> to <codeph>0</codeph> is not
-          recommended as a non-zero interval protects the system if there is a persistent
-          problem with short circuit reads.
-        </p>
-
-      </conbody>
-
-    </concept>
-
   </concept>
 
 <!--<concept id="known_issues_usability"><title id="ki_usability">Impala Known Issues: Usability</title><conbody><p> These issues affect the convenience of interacting directly with Impala, typically through the Impala shell or Hue. </p></conbody></concept>-->
@@ -572,62 +462,6 @@ explain SELECT 1 FROM alltypestiny a1
 
     </concept>
 
-    <concept id="IMPALA-3006" rev="IMPALA-3006">
-
-      <title>Impala may use incorrect bit order with BIT_PACKED encoding</title>
-
-      <conbody>
-
-        <p>
-          Parquet <codeph>BIT_PACKED</codeph> encoding as implemented by Impala is LSB first.
-          The parquet standard says it is MSB first.
-        </p>
-
-        <p>
-          <b>Bug:</b> <xref keyref="IMPALA-3006">IMPALA-3006</xref>
-        </p>
-
-        <p>
-          <b>Severity:</b> High, but rare in practice because BIT_PACKED is infrequently used,
-          is not written by Impala, and is deprecated in Parquet 2.0.
-        </p>
-
-      </conbody>
-
-    </concept>
-
-    <concept id="IMPALA-3082" rev="IMPALA-3082">
-
-      <title>BST between 1972 and 1995</title>
-
-      <conbody>
-
-        <p>
-          The calculation of start and end times for the BST (British Summer Time) time zone
-          could be incorrect between 1972 and 1995. Between 1972 and 1995, BST began and ended
-          at 02:00 GMT on the third Sunday in March (or second Sunday when Easter fell on the
-          third) and fourth Sunday in October. For example, both function calls should return
-          13, but actually return 12, in a query such as:
-        </p>
-
-<codeblock>
-select
-  extract(from_utc_timestamp(cast('1970-01-01 12:00:00' as timestamp), 'Europe/London'), "hour") summer70start,
-  extract(from_utc_timestamp(cast('1970-12-31 12:00:00' as timestamp), 'Europe/London'), "hour") summer70end;
-</codeblock>
-
-        <p>
-          <b>Bug:</b> <xref keyref="IMPALA-3082">IMPALA-3082</xref>
-        </p>
-
-        <p>
-          <b>Severity:</b> High
-        </p>
-
-      </conbody>
-
-    </concept>
-
     <concept id="IMPALA-2422" rev="IMPALA-2422">
 
       <title>% escaping does not work correctly when occurs at the end in a LIKE clause</title>
