[jira] [Commented] (PHOENIX-5066) The TimeZone is incorrectly used during writing or reading data

Istvan Toth (Jira) Thu, 16 Feb 2023 04:21:49 -0800


    [ 
https://issues.apache.org/jira/browse/PHOENIX-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689728#comment-17689728
 ]


Istvan Toth commented on PHOENIX-5066:
--------------------------------------

After thinking about this some more, we really should just handle temporals as 
localDateTime, instead of Instant.

1. This the correct behaviour according to the SQL standard.
2. It's FAR simpler to implement.

The new plan:


 * Internally, all temporals are internally represented as 8 byte HBase dates 
or 8+4 Byte Phoenix Timestamps
 ** All internal representation is stored in GMT, but is interpreted as TZ-less 
dates
 * Parsing dates () from Strings must always be done in GMT, Unless TZ is 
explicitly set as a function parameter.
 ** literals in statements
 ** date parse SQL functions
 ** preparedStatement parameters (when set as strings)
 * Formatting to String is always done with GMT, unless TZ is explicitly set as 
a function parameter
 **  rs.getString()
 ** date format SQL functions
 * All temporal JDBC input parameters are corrected with the client timezone 
offset when converting to the internal representation
 ** this is really just preparedStatment.setDate() and friends
 * When extracting temporals from a resultset, the internal values are 
corrected with the local TZ offset
 ** rs.getDate() and friends
 * All date- >date functions (floor, day, etc) interpret all input as GMT, and 
generate GMT output
 ** No need to push the client TZ to the cluster
 ** No need to change anything
 * Implementing real Date and Time types (i.e. not Timestamps) is out of scope.

Complications:
 * Row timestamps
 ** When set explicitly it is handled exactly the same as a standard date (i.e. 
parsed in GMT, TZ offset applied to epoch)
 ** When set by HBase, it is set to epoch.
 ** This means that setting the field to NOW() explicitly will result in a 
different result than omitting the column.
 ** This could only be avoided if we implemented it as a real TIMESTAMP WITH 
TZ, but that would bring back the whole world of pain of my first approach.
 ** What about SCN values ?
 *** No change, interpret them as unix/hbase timestamps in GMT
 * Server side literal handling
 ** Server JVM also likely to have non-GMT TZ set
 ** See above for cases (defaults, view constants, functional indexes)
 ** The string literals there have already been converted to GMT when parsing 
them on the client side
 ** So we MUST AVOID applying a TZ offset on the server side again
 *** Luckily any literals are already parsed as GMT,  and we don't do any 
offset corrections in the parser for epoch values (but we don't store such in 
the metadata anyway)
 *** So this is a non-issue. Yay!

Implementation:
 * Add offset compensation functions to DateUtil
 * Call above functions in rs.getDate() and friends
 * Call above reverse functions in pstmt.setObject(), pstmt.setDate() and 
friends
 * Thats it! (At least I really hope so)
 * Still need to make the new implementation switchable for backwards 
compatibility

> The TimeZone is incorrectly used during writing or reading data
> ---------------------------------------------------------------
>
>                 Key: PHOENIX-5066
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5066
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 5.0.0, 4.14.1
>            Reporter: Jaanai Zhang
>            Assignee: Istvan Toth
>            Priority: Critical
>             Fix For: 5.3.0
>
>         Attachments: DateTest.java, PHOENIX-5066.4x.v1.patch, 
> PHOENIX-5066.4x.v2.patch, PHOENIX-5066.4x.v3.patch, 
> PHOENIX-5066.master.v1.patch, PHOENIX-5066.master.v2.patch, 
> PHOENIX-5066.master.v3.patch, PHOENIX-5066.master.v4.patch, 
> PHOENIX-5066.master.v5.patch, PHOENIX-5066.master.v6.patch
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> We have two methods to write data when uses JDBC API.
> #1. Uses _the exceuteUpdate_ method to execute a string that is an upsert SQL.
> #2. Uses the _prepareStatement_ method to set some objects and execute.
> The _string_ data needs to convert to a new object by the schema information 
> of tables. we'll use some date formatters to convert string data to object 
> for Date/Time/Timestamp types when writes data and the formatters are used 
> when reads data as well.
>  
> *Uses default timezone test*
>  Writing 3 records by the different ways.
> {code:java}
> UPSERT INTO date_test VALUES (1,'2018-12-10 15:40:47','2018-12-10 
> 15:40:47','2018-12-10 15:40:47') 
> UPSERT INTO date_test VALUES (2,to_date('2018-12-10 
> 15:40:47'),to_time('2018-12-10 15:40:47'),to_timestamp('2018-12-10 15:40:47'))
> stmt.setInt(1, 3);stmt.setDate(2, date);stmt.setTime(3, 
> time);stmt.setTimestamp(4, ts);
> {code}
> Reading the table by the getObject(getDate/getTime/getTimestamp) methods.
> {code:java}
> 1 | 2018-12-10 | 23:45:07 | 2018-12-10 23:45:07.0 
> 2 | 2018-12-10 | 23:45:07 | 2018-12-10 23:45:07.0 
> 3 | 2018-12-10 | 15:45:07 | 2018-12-10 15:45:07.66 
> {code}
> Reading the table by the getString methods 
> {code:java}
> 1 | 2018-12-10 15:45:07.000 | 2018-12-10 15:45:07.000 | 2018-12-10 
> 15:45:07.000 
> 2 | 2018-12-10 15:45:07.000 | 2018-12-10 15:45:07.000 | 2018-12-10 
> 15:45:07.000 
> 3 | 2018-12-10 07:45:07.660 | 2018-12-10 07:45:07.660 | 2018-12-10 
> 07:45:07.660
> {code}
>  *Uses GMT+8 test*
>  Writing 3 records by the different ways.
> {code:java}
> UPSERT INTO date_test VALUES (1,'2018-12-10 15:40:47','2018-12-10 
> 15:40:47','2018-12-10 15:40:47')
> UPSERT INTO date_test VALUES (2,to_date('2018-12-10 
> 15:40:47'),to_time('2018-12-10 15:40:47'),to_timestamp('2018-12-10 15:40:47'))
> stmt.setInt(1, 3);stmt.setDate(2, date);stmt.setTime(3, 
> time);stmt.setTimestamp(4, ts);
> {code}
> Reading the table by the getObject(getDate/getTime/getTimestamp) methods.
> {code:java}
> 1 | 2018-12-10 | 23:40:47 | 2018-12-10 23:40:47.0 
> 2 | 2018-12-10 | 15:40:47 | 2018-12-10 15:40:47.0 
> 3 | 2018-12-10 | 15:40:47 | 2018-12-10 15:40:47.106 {code}
> Reading the table by the getString methods
> {code:java}
>  1 | 2018-12-10 23:40:47.000 | 2018-12-10 23:40:47.000 | 2018-12-10 
> 23:40:47.000
> 2 | 2018-12-10 15:40:47.000 | 2018-12-10 15:40:47.000 | 2018-12-10 
> 15:40:47.000
> 3 | 2018-12-10 15:40:47.106 | 2018-12-10 15:40:47.106 | 2018-12-10 
> 15:40:47.106
> {code}
>  
> _We_ have a historical problem,  we'll parse the string to 
> Date/Time/Timestamp objects with timezone in #1, which means the actual data 
> is going to be changed when stored in HBase table。



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (PHOENIX-5066) The TimeZone is incorrectly used during writing or reading data

Reply via email to