[
https://issues.apache.org/jira/browse/DRILL-5034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840272#comment-15840272
]
ASF GitHub Bot commented on DRILL-5034:
---------------------------------------
Github user bitblender commented on a diff in the pull request:
https://github.com/apache/drill/pull/656#discussion_r98070065
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java ---
@@ -323,18 +323,28 @@ public static DateCorruptionStatus checkForCorruptDateValuesInStatistics(Parquet
* @param binaryTimeStampValue
* hive, impala timestamp values with nanoseconds precision
* are stored in parquet Binary as INT96 (12 constant bytes)
- *
+ * @param retainLocalTimezone
+ * parquet files don't keep local timeZone according to the
+ * <a href="https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md#timestamp">Parquet spec</a>,
+ * but some tools (hive, for example) retain local timezone for parquet files by default
+ * Note: Impala doesn't retain local timezone by default
* @return Unix Timestamp - the number of milliseconds since January 1, 1970, 00:00:00 GMT
* represented by @param binaryTimeStampValue .
*/
- public static long getDateTimeValueFromBinary(Binary binaryTimeStampValue) {
+ public static long getDateTimeValueFromBinary(Binary binaryTimeStampValue, boolean retainLocalTimezone) {
// This method represents binaryTimeStampValue as ByteBuffer, where timestamp is stored as sum of
// julian day number (32-bit) and nanos of day (64-bit)
NanoTime nt = NanoTime.fromBinary(binaryTimeStampValue);
int julianDay = nt.getJulianDay();
long nanosOfDay = nt.getTimeOfDayNanos();
- return (julianDay - JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH) * DateTimeConstants.MILLIS_PER_DAY
+ long dateTime = (julianDay - JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH) * DateTimeConstants.MILLIS_PER_DAY
+ nanosOfDay / NANOS_PER_MILLISECOND;
+ if (retainLocalTimezone) {
+ return new org.joda.time.DateTime(dateTime, org.joda.time.chrono.JulianChronology.getInstance())
+ .withZoneRetainFields(org.joda.time.DateTimeZone.UTC).getMillis();
--- End diff --
Trying to understand this: why are you calling
.withZoneRetainFields(org.joda.time.DateTimeZone.UTC) when retainLocalTimezone is
true?
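For context, here is a minimal sketch of what the two branches in the diff appear to compute. It is not the PR's code: it uses java.time instead of Joda-Time, ignores the JulianChronology argument, takes the local zone as an explicit parameter instead of the JVM default, and the names Int96TimestampSketch, toEpochMillis and retainLocalFields are hypothetical. The sample value is the first row of the drill-1.9 output in the issue quoted below, assuming the writer's zone was America/Los_Angeles.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class Int96TimestampSketch {
  // Julian day number of 1970-01-01, the Unix epoch.
  static final long JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH = 2440588L;
  static final long MILLIS_PER_DAY = 86_400_000L;
  static final long NANOS_PER_MILLISECOND = 1_000_000L;

  // Same arithmetic as the diff: whole days since the epoch plus millis within the day.
  static long toEpochMillis(int julianDay, long nanosOfDay) {
    return (julianDay - JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH) * MILLIS_PER_DAY
        + nanosOfDay / NANOS_PER_MILLISECOND;
  }

  // java.time analogue of withZoneRetainFields(UTC): read the instant as a
  // wall-clock time in localZone, then relabel those same fields as UTC.
  static long retainLocalFields(long epochMillis, ZoneId localZone) {
    return Instant.ofEpochMilli(epochMillis)
        .atZone(localZone)
        .withZoneSameLocal(ZoneOffset.UTC)
        .toInstant()
        .toEpochMilli();
  }

  public static void main(String[] args) {
    // 2016-10-24 03:03:58 as stored in the file (UTC-normalized by the writer).
    int julianDay = (int) (JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH
        + LocalDate.of(2016, 10, 24).toEpochDay());
    long nanosOfDay = (3 * 3600L + 3 * 60L + 58L) * 1_000_000_000L;

    long raw = toEpochMillis(julianDay, nanosOfDay);
    long shifted = retainLocalFields(raw, ZoneId.of("America/Los_Angeles"));

    // Rendering both values as zone-less timestamps (i.e. as UTC fields):
    System.out.println(LocalDateTime.ofInstant(Instant.ofEpochMilli(raw), ZoneOffset.UTC));
    // prints 2016-10-24T03:03:58
    System.out.println(LocalDateTime.ofInstant(Instant.ofEpochMilli(shifted), ZoneOffset.UTC));
    // prints 2016-10-23T20:03:58
  }
}
```

If this reading is right, the flag shifts the decoded value by the local UTC offset so that a zone-less timestamp shows the wall-clock time that was originally written. Whether that is the intent of the change is for the PR author to confirm.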
> Select timestamp from hive generated parquet always return in UTC
> -----------------------------------------------------------------
>
> Key: DRILL-5034
> URL: https://issues.apache.org/jira/browse/DRILL-5034
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Parquet
> Affects Versions: 1.9.0
> Reporter: Krystal
> Assignee: Vitalii Diravka
>
> commit id: 5cea9afa6278e21574c6a982ae5c3d82085ef904
> Reading timestamp data against a hive parquet table from drill automatically
> converts the timestamp data to UTC.
> {code}
> SELECT TIMEOFDAY() FROM (VALUES(1));
> +----------------------------------------------+
> | EXPR$0 |
> +----------------------------------------------+
> | 2016-11-10 12:33:26.547 America/Los_Angeles |
> +----------------------------------------------+
> {code}
> data schema:
> {code}
> message hive_schema {
> optional int32 voter_id;
> optional binary name (UTF8);
> optional int32 age;
> optional binary registration (UTF8);
> optional fixed_len_byte_array(3) contributions (DECIMAL(6,2));
> optional int32 voterzone;
> optional int96 create_timestamp;
> optional int32 create_date (DATE);
> }
> {code}
> Using drill-1.8, the returned timestamps match the table data:
> {code}
> select convert_from(create_timestamp, 'TIMESTAMP_IMPALA') from
> `/user/hive/warehouse/voter_hive_parquet` limit 5;
> +------------------------+
> | EXPR$0 |
> +------------------------+
> | 2016-10-23 20:03:58.0 |
> | null |
> | 2016-09-09 12:01:18.0 |
> | 2017-03-06 20:35:55.0 |
> | 2017-01-20 22:32:43.0 |
> +------------------------+
> 5 rows selected (1.032 seconds)
> {code}
> If the user timezone is changed to UTC, then the timestamp data is returned in
> UTC time.
> Using drill-1.9, the returned timestamps are converted to UTC even though the
> user timezone is PST.
> {code}
> select convert_from(create_timestamp, 'TIMESTAMP_IMPALA') from
> dfs.`/user/hive/warehouse/voter_hive_parquet` limit 5;
> +------------------------+
> | EXPR$0 |
> +------------------------+
> | 2016-10-24 03:03:58.0 |
> | null |
> | 2016-09-09 19:01:18.0 |
> | 2017-03-07 04:35:55.0 |
> | 2017-01-21 06:32:43.0 |
> +------------------------+
> {code}
> {code}
> alter session set `store.parquet.reader.int96_as_timestamp`=true;
> +-------+---------------------------------------------------+
> | ok | summary |
> +-------+---------------------------------------------------+
> | true | store.parquet.reader.int96_as_timestamp updated. |
> +-------+---------------------------------------------------+
> select create_timestamp from dfs.`/user/hive/warehouse/voter_hive_parquet`
> limit 5;
> +------------------------+
> | create_timestamp |
> +------------------------+
> | 2016-10-24 03:03:58.0 |
> | null |
> | 2016-09-09 19:01:18.0 |
> | 2017-03-07 04:35:55.0 |
> | 2017-01-21 06:32:43.0 |
> +------------------------+
> {code}
>