[ https://issues.apache.org/jira/browse/DRILL-5360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16728455#comment-16728455 ]
Vitalii Diravka commented on DRILL-5360: ---------------------------------------- [~priteshm] Drill has a lot of issues and inconsistencies connected to the "time/date" datatypes and Time Zones. It is necessary to lead it to the general form. Things to be done: * Design doc for Drill Date, Time, Timestamp; * Revisiting and leading to the SQL standard of Drill "date/time" datatypes. Reading, converting, storing internally (ValueVectors), extracting and representing of "date/time" datatypes: ** Consistency between different data sources (parquet, csv ...) ** Date, time should not rely in particular time zone ** Issues connected to representing of "date/time" data types for JDBC (server and client are in different time zones). ** Issues connected to CAST operator ** Improvement of "date/time" output (style) ** Considering of new data type - TimestampTZ ([PostgreSQL|https://www.postgresql.org/docs/9.1/static/datatype-datetime.html] has equivalent TIMESTAMP WITH TIME ZONE and Calcite has - TIMESTAMP_WITH_LOCAL_TIME_ZONE); * Moving from [Joda Time|http://www.joda.org/joda-time/] to [Java SE 8 Date and Time|http://www.oracle.com/technetwork/articles/java/jf14-date-time-2125367.html] These changes are not backward-compatible. Therefore they should be implemented along with design document for the next Drill major version. It seems the best one is 2.0.0 version. > Timestamp type documented as UTC, implemented as local time > ----------------------------------------------------------- > > Key: DRILL-5360 > URL: https://issues.apache.org/jira/browse/DRILL-5360 > Project: Apache Drill > Issue Type: Bug > Affects Versions: 1.10.0 > Reporter: Paul Rogers > Assignee: Bohdan Kazydub > Priority: Critical > Fix For: 2.0.0 > > > The Drill documentation implies that the {{Timestamp}} type is in UTC: > bq. JDBC timestamp in year, month, date hour, minute, second, and optional > milliseconds format: yyyy-MM-dd HH:mm:ss.SSS. ... TIMESTAMP literals: Drill > stores values in Coordinated Universal Time (UTC). Drill supports time > functions in the range 1971 to 2037. ... Drill does not support TIMESTAMP > with time zone. > The above is ambiguous. The first part talks about JDBC timestamps. From the > JDK Javadoc: > bq. Timestamp: A thin wrapper around java.util.Date. ... Date class is > intended to reflect coordinated universal time (UTC)... > So, a JDBC timestamp is intended to represent time in UTC. (The "indented to > reflect" statement leaves open the possibility of misusing {{Date}} to > represent times in other time zones. This was common practice in early Java > development and was the reason for the eventual development of the Joda, then > Java 8 date/time classes.) > The Drill documentation implies that timestamp *literals* are in UTC, but a > careful read of the documentation does allow an interpretation that the > internal representation can be other than UTC. If this is true, then we would > also rely on a liberal reading of the Java `Timestamp` class to also not be > UTC. (Or, we rely on the Drill JDBC driver to convert from the (unknown) > server time zone to a UTC value returned by the Drill JDBC client.) > Still, a superficial reading (and common practice) would suggest that a Drill > Timestamp should be in UTC. > However, a test on a Mac, with an embedded Drillbit (run in the Pacific time > zone, with Daylight Savings Time in effect) shows that the Timestamp binary > value is actual local time: > {code} > long before = System.currentTimeMillis(); > long value = getDateValue(client, "SELECT NOW() FROM (VALUES(1))" ); > double hrsDiff = (value - before) / (1000.00 * 60 * 60); > System.out.println("Hours: " + hrsDiff); > {code} > The above gets the actual UTC time from Java. Then, it runs a query that gets > Drill's idea of the current time using the {{NOW()}} function. (The > {{getDateValue}} function uses the new test framework to access the actual > {{long}} value from the returned value vector.) Finally, we compute the > difference between the two times, converted to hours. Output: > {code} > Hours: -6.9999975 > {code} > As it turns out, this is the difference between UTC and PDT. So, the time is > in local time, not UTC. > Since the documentation and implementation are both ambiguous, it is hard to > know the intent of the Drill Timestamp. Clearly, common practice is to use > UTC. But, there is wiggle-room. > If the Timestamp value is supposed to be local time, then Drill should > provide a function to return the server's time zone offset (in ms) from UTC > so that the client can to the needed local-to-UTC conversion to get a true > timestamp. > On the other hand, if the Timestamp is supposed to be UTC (per common > practice), then {{NOW()}} should not report local time, it should return UTC. > Further, if {{NOW()}} returns local time, but Timestamp literals are UTC, > then it is hard to see how any query can be rationally written if one > timestamp value is local, but a literal is UTC. > So, job #1 is to define the Timestamp semantics. Then, use that to figure out > where the bug lies to make implementation consistent with documentation (or > visa-versa.) -- This message was sent by Atlassian JIRA (v7.6.3#76005)