Paul Rogers created DRILL-5360:
----------------------------------
Summary: Timestamp type documented as UTC, implemented as local
time
Key: DRILL-5360
URL: https://issues.apache.org/jira/browse/DRILL-5360
Project: Apache Drill
Issue Type: Bug
Affects Versions: 1.10.0
Reporter: Paul Rogers
The Drill documentation implies that the {{Timestamp}} type is in UTC:
bq. JDBC timestamp in year, month, date hour, minute, second, and optional
milliseconds format: yyyy-MM-dd HH:mm:ss.SSS. ... TIMESTAMP literals: Drill
stores values in Coordinated Universal Time (UTC). Drill supports time
functions in the range 1971 to 2037. ... Drill does not support TIMESTAMP with
time zone.
The above is ambiguous. The first part talks about JDBC timestamps. From the
JDK Javadoc:
bq. Timestamp: A thin wrapper around java.util.Date. ... Date class is intended
to reflect coordinated universal time (UTC)...
So, a JDBC timestamp is intended to represent time in UTC. (The "indented to
reflect" statement leaves open the possibility of misusing {{Date}} to
represent times in other time zones. This was common practice in early Java
development and was the reason for the eventual development of the Joda, then
Java 8 date/time classes.)
The Drill documentation implies that timestamp *literals* are in UTC, but a
careful read of the documentation does allow an interpretation that the
internal representation can be other than UTC. If this is true, then we would
also rely on a liberal reading of the Java `Timestamp` class to also not be
UTC. (Or, we rely on the Drill JDBC driver to convert from the (unknown) server
time zone to a UTC value returned by the Drill JDBC client.)
Still, a superficial reading (and common practice) would suggest that a Drill
Timestamp should be in UTC.
However, a test on a Mac, with an embedded Drillbit (run in the Pacific time
zone, with Daylight Savings Time in effect) shows that the Timestamp binary
value is actual local time:
{code}
long before = System.currentTimeMillis();
long value = getDateValue(client, "SELECT NOW() FROM (VALUES(1))" );
double hrsDiff = (value - before) / (1000.00 * 60 * 60);
System.out.println("Hours: " + hrsDiff);
{code}
The above gets the actual UTC time from Java. Then, it runs a query that gets
Drill's idea of the current time using the {{NOW()}} function. (The
{{getDateValue}} function uses the new test framework to access the actual
{{long}} value from the returned value vector.) Finally, we compute the
difference between the two times, converted to hours. Output:
{code}
Hours: -6.9999975
{code}
As it turns out, this is the difference between UTC and PDT. So, the time is in
local time, not UTC.
Since the documentation and implementation are both ambiguous, it is hard to
know the intent of the Drill Timestamp. Clearly, common practice is to use UTC.
But, there is wiggle-room.
If the Timestamp value is supposed to be local time, then Drill should provide
a function to return the server's time zone offset (in ms) from UTC so that the
client can to the needed local-to-UTC conversion to get a true timestamp.
On the other hand, if the Timestamp is supposed to be UTC (per common
practice), then {{NOW()}} should not report local time, it should return UTC.
Further, if {{NOW()}} returns local time, but Timestamp literals are UTC, then
it is hard to see how any query can be rationally written if one timestamp
value is local, but a literal is UTC.
So, job #1 is to define the Timestamp semantics. Then, use that to figure out
where the bug lies to make implementation consistent with documentation (or
visa-versa.)
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)