[
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15553744#comment-15553744
]
ASF GitHub Bot commented on DRILL-4373:
---------------------------------------
Github user bitblender commented on a diff in the pull request:
https://github.com/apache/drill/pull/600#discussion_r82314071
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java ---
@@ -45,4 +53,34 @@ public static int getIntFromLEBytes(byte[] input, int start) {
}
return out;
}
+
+ /**
+ * Utilities for converting from parquet INT96 binary (impala, hive timestamp)
+ * to date time value. This utilizes the Joda library.
+ */
+ public static class NanoTimeUtils {
+
+ public static final long NANOS_PER_DAY = TimeUnit.DAYS.toNanos(1);
+ public static final long NANOS_PER_HOUR = TimeUnit.HOURS.toNanos(1);
+ public static final long NANOS_PER_MINUTE = TimeUnit.MINUTES.toNanos(1);
+ public static final long NANOS_PER_SECOND = TimeUnit.SECONDS.toNanos(1);
+ public static final long NANOS_PER_MILLISECOND = TimeUnit.MILLISECONDS.toNanos(1);
+
+ /**
+ * @param binaryTimeStampValue
+ * hive, impala timestamp values with nanoseconds precision
+ * are stored in parquet Binary as INT96
+ *
+ * @return the number of milliseconds since January 1, 1970, 00:00:00 GMT
+ * represented by @param binaryTimeStampValue .
+ */
+ public static long getDateTimeValueFromBinary(Binary binaryTimeStampValue) {
+ NanoTime nt = NanoTime.fromBinary(binaryTimeStampValue);
+ int julianDay = nt.getJulianDay();
+ long nanosOfDay = nt.getTimeOfDayNanos();
+ return DateTimeUtils.fromJulianDay(julianDay-0.5d) + nanosOfDay/NANOS_PER_MILLISECOND;
--- End diff --
1. I would recommend not using Joda. Do the calculations directly, as in
ConvertFromImpalaTimestamp. Joda uses non-standard, and hence confusing,
terminology: what Joda calls JulianDay is actually the Julian Date, which
starts at noon, so JDN - 0.5 is midnight of that day. It seems you have
identified this discrepancy and adjusted for it by subtracting 0.5 from
_julianDay_.
Note (I guess you have already figured this out): in ConvertFromImpalaTimestamp,
the actual code and the Joda code in the comment are inconsistent. It took me a
day to figure out why! A bug should be opened to delete the comment.
2. Can you please also leave a comment stating that 2440588 is the Julian Day
Number (JDN) of the Unix epoch (1970-01-01)?
3. Please leave a comment stating that the order of the calls to get
_julianDay_ and _nanosOfDay_ matters. You can do this by just stating how
timestamps are stored in INT96, i.e. a 64-bit nanosOfDay followed by a 32-bit
JDN, both little-endian.
4. Consistent spacing (single or none) around the binary operators used here
(+, -, /) would be nice; single spacing is preferable.
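Points 1 to 3 can be illustrated with a minimal Joda-free sketch. It assumes the usual Impala/Hive INT96 layout (8-byte little-endian nanoseconds of day, then 4-byte little-endian JDN) and decodes the raw bytes itself rather than going through Parquet's NanoTime or Drill's ConvertFromImpalaTimestamp. The class and method names (Int96TimestampSketch, int96ToEpochMillis, toInt96, julianDateToEpochMillis) are hypothetical, chosen only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not Drill's implementation.
class Int96TimestampSketch {

  // Julian Day Number (JDN) of the Unix epoch, 1970-01-01.
  static final long JULIAN_DAY_OF_EPOCH = 2440588L;
  // Astronomical Julian Date of 1970-01-01T00:00Z (a JD starts at noon, hence .5).
  static final double JULIAN_DATE_OF_EPOCH = 2440587.5d;
  static final long MILLIS_PER_DAY = TimeUnit.DAYS.toMillis(1);
  static final long NANOS_PER_MILLISECOND = TimeUnit.MILLISECONDS.toNanos(1);

  // Decodes a 12-byte INT96 value, assuming nanosOfDay (8 bytes) then JDN (4 bytes),
  // both little-endian. Returns milliseconds since 1970-01-01T00:00:00Z.
  static long int96ToEpochMillis(byte[] int96) {
    if (int96.length != 12) {
      throw new IllegalArgumentException("INT96 value must be 12 bytes");
    }
    ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
    // The read order matters: nanosOfDay comes first, then julianDay.
    long nanosOfDay = buf.getLong();
    int julianDay = buf.getInt();
    return (julianDay - JULIAN_DAY_OF_EPOCH) * MILLIS_PER_DAY
        + nanosOfDay / NANOS_PER_MILLISECOND;
  }

  // Encodes a value in the same assumed layout (useful for tests).
  static byte[] toInt96(int julianDay, long nanosOfDay) {
    return ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN)
        .putLong(nanosOfDay)
        .putInt(julianDay)
        .array();
  }

  // Converts an astronomical Julian Date (the quantity point 1 says Joda calls
  // "JulianDay") to epoch millis; passing JDN - 0.5 yields midnight of that day.
  static long julianDateToEpochMillis(double julianDate) {
    return Math.round((julianDate - JULIAN_DATE_OF_EPOCH) * MILLIS_PER_DAY);
  }

  public static void main(String[] args) {
    // Midnight on the epoch day.
    System.out.println(int96ToEpochMillis(toInt96(2440588, 0L)));              // 0
    // One day and 1.5 seconds after the epoch.
    System.out.println(int96ToEpochMillis(toInt96(2440589, 1_500_000_000L)));  // 86401500
    // Julian Date route: JDN 2440589 minus 0.5 is midnight of 1970-01-02.
    System.out.println(julianDateToEpochMillis(2440589 - 0.5d));               // 86400000
  }
}
```

The direct integer arithmetic avoids the floating-point Julian Date entirely, which is the main argument in point 1 for dropping Joda here.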
> Drill and Hive have incompatible timestamp representations in parquet
> ---------------------------------------------------------------------
>
> Key: DRILL-4373
> URL: https://issues.apache.org/jira/browse/DRILL-4373
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - Hive, Storage - Parquet
> Affects Versions: 1.8.0
> Reporter: Rahul Challapalli
> Assignee: Karthikeyan Manivannan
> Labels: doc-impacting
> Fix For: 1.9.0
>
>
> git.commit.id.abbrev=83d460c
> I created a Parquet file with a timestamp column using Drill. If I then define a
> Hive table on top of the Parquet file and use "timestamp" as the column type,
> Drill fails to read the Hive table through the Hive storage plugin.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)