Repository: incubator-impala
Updated Branches:
  refs/heads/master ec957456d -> 1e581a66d

IMPALA-3316: [DOCS] Add known issue for timezone conversion slowdown

Change-Id: I9933ced07e339d589f7f74173cfebe938084e65c
Reviewed-on: http://gerrit.cloudera.org:8080/8165
Reviewed-by: Tim Armstrong <[email protected]>
Reviewed-by: Alex Behm <[email protected]>
Tested-by: Impala Public Jenkins

Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/1e581a66
Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/1e581a66
Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/1e581a66

Branch: refs/heads/master
Commit: 1e581a66dddae5b400e50e440063d16de868bb63
Parents: ec95745
Author: John Russell <[email protected]>
Authored: Thu Sep 28 10:36:39 2017 -0700
Committer: Impala Public Jenkins <[email protected]>
Committed: Fri Oct 6 04:42:15 2017 +0000

----------------------------------------------------------------------
 docs/topics/impala_known_issues.xml | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/1e581a66/docs/topics/impala_known_issues.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_known_issues.xml b/docs/topics/impala_known_issues.xml
index 14ff4e3..28196f5 100644
--- a/docs/topics/impala_known_issues.xml
+++ b/docs/topics/impala_known_issues.xml
@@ -305,6 +305,32 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
       </conbody>
 
+    <concept id="IMPALA-3316">
+      <title>Slow queries for Parquet tables with convert_legacy_hive_parquet_utc_timestamps=true</title>
+      <conbody>
+        <p>
+          The configuration setting <codeph>convert_legacy_hive_parquet_utc_timestamps=true</codeph>
+          uses an underlying function that can be a bottleneck on high volume, highly concurrent
+          queries due to the use of a global lock while loading time zone information. This bottleneck
+          can cause slowness when querying Parquet tables, up to 30x for scan-heavy queries. The amount
+          of slowdown depends on factors such as the number of cores and number of threads involved in the query.
+        </p>
+        <note>
+          <p>
+            The slowdown only occurs when accessing <codeph>TIMESTAMP</codeph> columns within Parquet files that
+            were generated by Hive, and therefore require the on-the-fly timezone conversion processing.
+          </p>
+        </note>
+        <p><b>Bug:</b> <xref keyref="IMPALA-3316">IMPALA-3316</xref></p>
+        <p><b>Severity:</b> High</p>
+        <p><b>Workaround:</b> If the <codeph>TIMESTAMP</codeph> values stored in the table represent dates only,
+          with no time portion, consider storing them as strings in <codeph>yyyy-MM-dd</codeph> format.
+          Impala implicitly converts such string values to <codeph>TIMESTAMP</codeph> in calls to date/time
+          functions.
+        </p>
+      </conbody>
+    </concept>
+
     <concept id="IMPALA-1480" rev="IMPALA-1480">
       <!-- Not part of Alex's spreadsheet. Spreadsheet has IMPALA-1423 which mentions it's similar to this one but not a duplicate. -->
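
The workaround in the added known-issue text (store date-only values as `yyyy-MM-dd` strings) could be applied in a preprocessing step before the data is written. The following is a minimal sketch under that assumption; the helper name `to_date_string` is hypothetical and is not part of Impala or Hive. It rejects values that carry a time portion, since the workaround only applies to date-only data.

```python
from datetime import datetime


def to_date_string(ts: datetime) -> str:
    """Return a date-only timestamp as a yyyy-MM-dd string.

    Hypothetical helper for illustration: raises ValueError if the value
    has a non-midnight time portion, because the workaround only covers
    TIMESTAMP values that represent dates with no time component.
    """
    if (ts.hour, ts.minute, ts.second, ts.microsecond) != (0, 0, 0, 0):
        raise ValueError(f"timestamp has a time portion: {ts!r}")
    # date.isoformat() always yields zero-padded YYYY-MM-DD.
    return ts.date().isoformat()


if __name__ == "__main__":
    print(to_date_string(datetime(2017, 9, 28)))
```

Values that fail the check would need to stay as `TIMESTAMP` and remain subject to the slowdown described in the known issue.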
