[ 
https://issues.apache.org/jira/browse/DRILL-6242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16459369#comment-16459369
 ] 

Jiang Wu commented on DRILL-6242:
---------------------------------

The resulting changes make use of java.time.Local<Date|Time|DateTime> as the 
Java object representation of values from respective Drill vector types: Date, 
Time, Timestamp.  With this change, accessing Drill date time vectors will 
return non time zone specific values.

Below lists out the behavior of Drill with examples to illustrate how the time 
values from various data sources (storage plugin, inline functions) are 
handled.  This represents existing behavior. 

For each data source, the example shows the original value in the data source, 
how such value is interpreted and converted into a value for the Drill 
Timestamp vector, how the value from the Timestamp vector is read.  And how the 
client of Drill can reproduce the original value from the LocalDateTime 
returned from the Drill Timestamp vector.  Any value that is different from the 
original value is highlighted in red.

A Timestamp represents an instant in time and in theory should not be timezone 
dependent.  We can interpret a Timestamp being made of 3 parts: date part, time 
part, and time zone/offset.  Based on the time zone/offset, the date part and 
the time part can be different for the same Timestamp instance.

*1. Date source: extended JSON file, TO_TIMESTAMP(), and CAST to TIMESTAMP.*  
Any time zone associated with the original time value is ignored.  This means 
all timestamps are treated as though they are from the Drill server's local 
time zone.  E.g.

 
{code:java}
select case when t1 = t2 then 1 else 0 end 
from (
  select TO_TIMESTAMP('2015-03-30 20:49:59.0 UTC', 'YYYY-MM-dd HH:mm:ss.s z') 
as t1,
         TO_TIMESTAMP('2015-03-30 20:49:59.0 PST', 'YYYY-MM-dd HH:mm:ss.s z') 
as t2
  from (values(1))
)
{code}
 

returns

 
{code:java}
+---------+
| EXPR$0  |
+---------+
| 1       |
+---------+{code}
 

*2. Date source: parquet timestamp.* Treat date-part and time-part as though 
they are in the Drill server time zone.  Timestamp value is represented as a 
long in Parquet data source.  Produce a \{date, time, UTC} from the Timestamp, 
but then ignore the time zone.  The result is a data part and time part with 
the same values as seen from the UTC time zone.  Example:

 
|| ||Parquet Timestamp value||Write to Drill Timestamp Vector||Read from Drill 
Timestamp Vector||How to get back the original value||
|Actual 
value|123456789012|123456789012|{color:#FF0000}1973-11-29T21:33:09.012{color}
{color:#FF0000}(LocalDateTime){color}|"1973-11-29T21:33:09.012".atZone(OffsetZone.UTC).toInstant()|
|Interpretation in Drill Server Time 
Zone|1973-11-29T21:33:09Z|1973-11-29T21:33:09Z|{color:#FF0000}1973-11-29T21:33:09PST{color}|1973-11-29T21:33:09Z|

 

 

 
*3. Date Source: parquet int96 as timestamp.* Generate the date-part and 
time-part in the Drill server time zone that represent the same instant as the 
timestamp.  Produce a \{date1, time1, UTC} from the Parquet int96 value.  
Convert this to another \{date2, time2, Drill Time Zone} representation.  Strip 
out the Drill Time Zone and replace with UTC resulting in a \{date2, time2, 
UTC} and store in vector.  Example:
 
|| ||Parquet int96 as Timestamp||Write to Drill Timestamp Vector||Read from 
Drill Timestamp Vector||How to get back the original value||
|Actual 
value|1312196153000|{color:#FF0000}1312170953000{color}|{color:#FF0000}2011-08-01T03:55:53{color}
{color:#FF0000}(LocalDateTime){color}|"2011-08-01T03:55:53".atZone(ZoneOffset.systemDefault()).toInstant()|
|Interpretation in Drill Server Time Zone|2011-08-01T10:55:53Z
 
which is the same as:
 
2011-08-01T03:55:53 
PDT|{color:#FF0000}2011-08-01T03:55:53Z{color}|{color:#FF0000}2011-08-01T03:55:53
 PDT{color}|2011-08-01T10:55:53Z|

*4. Date Source: BSON.* Same as Parquet int96 as timestamp type.  Preserve the 
correct time.  Produce a \{date1, time1, UTC} from the Parquet int96 value.  
Convert this to another \{date2, time2, Drill Time Zone} representation.  Strip 
out the Drill Time Zone and replace with UTC resulting in a \{date2, time2, 
UTC} and store in vector.  Example:
|| ||BSON DateTime (long)||Write to Drill Timestamp Vector||Read from Drill 
Timestamp Vector||How to get back the original value||
|Actual 
Value|5262729712|{color:#d04437}5233929712{color}|{color:#d04437}1970-03-02T13:52:09.712{color}
{color:#d04437}(LocalDateTime){color}|"1970-03-02T13:52:09.712".atZone(ZoneOffset.systemDefault()).toInstant()|
|Interpretation in Drill Server Time Zone|1970-03-02T21:52:09Z
 
which is the same as:
 
1970-03-02T13:52:09 
PST|{color:#d04437}1970-03-02T13:52:09.712Z{color}|{color:#d04437}1970-03-02T13:52:09.712
 PST{color}
 
|1970-03-02T21:52:09Z|

 

 

> Output format for nested date, time, timestamp values in an object hierarchy
> ----------------------------------------------------------------------------
>
>                 Key: DRILL-6242
>                 URL: https://issues.apache.org/jira/browse/DRILL-6242
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Data Types
>    Affects Versions: 1.12.0
>            Reporter: Jiang Wu
>            Assignee: Jiang Wu
>            Priority: Major
>             Fix For: 1.14.0
>
>
> Some storages (mapr db, mongo db, etc.) have hierarchical objects that 
> contain nested fields of date, time, timestamp types.  When a query returns 
> these objects, the output format for the nested date, time, timestamp, are 
> showing the internal object (org.joda.time.DateTime), rather than the logical 
> data value.
> For example.  Suppose in MongoDB, we have a single object that looks like 
> this:
> {code:java}
> > db.test.findOne();
> {
>     "_id" : ObjectId("5aa8487d470dd39a635a12f5"),
>     "name" : "orange",
>     "context" : {
>         "date" : ISODate("2018-03-13T21:52:54.940Z"),
>         "user" : "jack"
>     }
> }
> {code}
> Then connect Drill to the above MongoDB storage, and run the following query 
> within Drill:
> {code:java}
> > select t.context.`date`, t.context from test t; 
> +--------+---------+ 
> | EXPR$0 | context | 
> +--------+---------+ 
> | 2018-03-13 | 
> {"date":{"dayOfYear":72,"year":2018,"dayOfMonth":13,"dayOfWeek":2,"era":1,"millisOfDay":78774940,"weekOfWeekyear":11,"weekyear":2018,"monthOfYear":3,"yearOfEra":2018,"yearOfCentury":18,"centuryOfEra":20,"millisOfSecond":940,"secondOfMinute":54,"secondOfDay":78774,"minuteOfHour":52,"minuteOfDay":1312,"hourOfDay":21,"zone":{"fixed":true,"id":"UTC"},"millis":1520977974940,"chronology":{"zone":{"fixed":true,"id":"UTC"}},"afterNow":false,"beforeNow":true,"equalNow":false},"user":"jack"}
>  |
> {code}
> We can see that from the above output, when the date field is retrieved as a 
> top level column, Drill outputs a logical date value.  But when the same 
> field is within an object hierarchy, Drill outputs the internal object used 
> to hold the date value.
> The expected output is the same display for whether the date field is shown 
> as a top level column or when it is within an object hierarchy:
> {code:java}
> > select t.context.`date`, t.context from test t; 
> +--------+---------+ 
> | EXPR$0 | context | 
> +--------+---------+ 
> | 2018-03-13 | {"date":"2018-03-13","user":"jack"} |
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to