[ 
https://issues.apache.org/jira/browse/DRILL-8100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17483580#comment-17483580
 ] 

Paul Rogers commented on DRILL-8100:
------------------------------------

[~dzamo], thanks for looking at this. I seem to recall fixing this in the EVF 
V3 branch. For extended types, we're trying to match the [Mongo 
types|https://docs.mongodb.com/manual/reference/mongodb-extended-json/#mongodb-bsontype-Timestamp],
 which are supposed to emit UTC timestamps (in "relaxed mode"). I suspect 
the parser has to accept both modes, but I can't remember if I included tests 
for both.

The real problem is that, without the conversion, a round trip shifts the 
times. And the test that failed read JSON, wrote Parquet then compared, which 
also shifted times (IIRC.)

This is the issue that made me realize that Drill is schizophrenic: some parts 
of the code convert UTC to local time, other parts do not. When they collide, 
as in the test that failed, bad things happen.

I _think_ you are asking if we should adjust the time for files written by 
versions of Drill before the fix? Sure, but we can't tell which files those 
are: there is no metadata in a file that distinguishes a correct timestamp 
from an incorrect one. Any suggestions?

> JSON record writer does not convert Drill local timestamp to UTC
> ----------------------------------------------------------------
>
>                 Key: DRILL-8100
>                 URL: https://issues.apache.org/jira/browse/DRILL-8100
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.19.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Major
>
> Drill follows the old SQL engine convention to store the `TIMESTAMP` type in 
> the local time zone. This is, of course, highly awkward in today's age when 
> UTC is used as the standard timestamp in most products. However, it is how 
> Drill works. (It would be great to add a `UTC_TIMESTAMP` type, but that is 
> another topic.)
> Each reader or writer that works with files that hold UTC timestamps must 
> convert to (reader) or from (writer) Drill's local-time timestamp. Otherwise, 
> Drill works correctly only when the server time zone is set to UTC.
> The JSON writer does not do the proper conversion, causing tests to fail when 
> run in a time zone other than UTC.
> {noformat}
>   @Override
>   public void writeTimestamp(FieldReader reader) throws IOException {
>     if (reader.isSet()) {
>       writeTimestamp(reader.readLocalDateTime());
>     } else {
>       writeTimeNull();
>     }
>   }
> {noformat}
> Basically, it takes a {{LocalDateTime}} and formats it as a UTC timestamp 
> (using the "Z" suffix) without shifting the value. This is only valid if the 
> machine is in the UTC time zone, which is why the test for this class 
> attempts to force the local time zone to UTC, something that most users will 
> not do.
> A consequence of this bug is that "round trip" CTAS will change dates by the 
> UTC offset of the machine running the CTAS. In the Pacific time zone, each 
> "round trip" subtracts 8 hours from the time. After three round trips, the 
> "UTC" date in the Parquet file or JSON will be a day earlier than the 
> original data. One might argue that this "feature" is not always helpful.
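
The drift described above can be reproduced with plain {{java.time}}. This is a minimal sketch under stated assumptions, not Drill's actual reader or writer code: the class and method names ({{TimestampRoundTrip}}, {{writeBuggy}}, {{writeFixed}}, {{read}}, {{roundTrips}}) are hypothetical, and a fixed UTC-8 offset stands in for the Pacific time zone (daylight saving time is ignored).

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical illustration only; none of these names exist in Drill.
public class TimestampRoundTrip {

  // Buggy writer: labels the local wall-clock value as UTC without shifting it.
  static Instant writeBuggy(LocalDateTime local) {
    return local.toInstant(ZoneOffset.UTC);
  }

  // Corrected writer: interprets the value in the local zone, then converts to UTC.
  static Instant writeFixed(LocalDateTime local, ZoneId zone) {
    return local.atZone(zone).toInstant();
  }

  // Reader: converts a UTC instant from the file back to a local wall-clock value.
  static LocalDateTime read(Instant utc, ZoneId zone) {
    return LocalDateTime.ofInstant(utc, zone);
  }

  // Applies n write-then-read cycles and returns the resulting local value.
  static LocalDateTime roundTrips(LocalDateTime start, ZoneId zone, int n, boolean fixed) {
    LocalDateTime value = start;
    for (int i = 0; i < n; i++) {
      Instant written = fixed ? writeFixed(value, zone) : writeBuggy(value);
      value = read(written, zone);
    }
    return value;
  }

  public static void main(String[] args) {
    ZoneId pacific = ZoneOffset.ofHours(-8); // assumption: fixed offset, no DST
    LocalDateTime start = LocalDateTime.of(2022, 1, 15, 12, 0);

    // Three buggy round trips lose 3 x 8 hours, landing one day before start.
    System.out.println("buggy: " + roundTrips(start, pacific, 3, false));
    // With the conversion in the writer, the value survives unchanged.
    System.out.println("fixed: " + roundTrips(start, pacific, 3, true));
  }
}
```

With the machine zone set to UTC, both writers produce the same instant, which is consistent with the test only passing when the local time zone is forced to UTC.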



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
