[jira] [Work logged] (HIVE-27199) Read TIMESTAMP WITH LOCAL TIME ZONE columns from text files using custom formats

ASF GitHub Bot (Jira) Mon, 17 Apr 2023 03:41:04 -0700


     [ 
https://issues.apache.org/jira/browse/HIVE-27199?focusedWorklogId=857339&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-857339
 ]


ASF GitHub Bot logged work on HIVE-27199:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 17/Apr/23 10:40
            Start Date: 17/Apr/23 10:40
    Worklog Time Spent: 10m 
      Work Description: TuroczyX commented on code in PR #4170:
URL: https://github.com/apache/hive/pull/4170#discussion_r1168500611


##########
common/src/java/org/apache/hive/common/util/TimestampParser.java:
##########
@@ -199,6 +205,19 @@ public Timestamp parseTimestamp(final String text) {
 
   }
 
+  public TimestampTZ parseTimestamp(String text, ZoneId defaultTimeZone) {
+    Objects.requireNonNull(text);
+    for (DateTimeFormatter f : dtFormatters) {
+      try {
+        return TimestampTZUtil.parse(text, defaultTimeZone, f);
+      } catch (DateTimeException e) {

Review Comment:
   Also, from a design-pattern perspective, a TryParse-style method would be 
more elegant in this case. Of course this is just a preference, but I like 
this pattern: it is much more descriptive when reading the code.
   
https://learn.microsoft.com/en-us/dotnet/api/system.int32.tryparse?view=net-8.0#system-int32-tryparse(system-string-system-int32@)
   
   I know the ref and out keywords do not exist in Java, but the same thing 
can be handled through the return type. (Just FYI, no need to change.)
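
   To illustrate the return-type variant, here is a minimal sketch, assuming 
plain java.time instead of Hive's TimestampTZ/TimestampTZUtil. The class name 
TryParseSketch and the method tryParseTimestamp are hypothetical and are not 
part of the PR; Optional.empty() plays the role of TryParse's "false" result.

```java
import java.time.DateTimeException;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.Objects;
import java.util.Optional;

// Hypothetical illustration only; not Hive's TimestampParser.
class TryParseSketch {

  // Tries each formatter in order. Returns Optional.empty() when no pattern
  // matches, so callers do not have to catch an exception themselves.
  static Optional<ZonedDateTime> tryParseTimestamp(
      String text, ZoneId defaultTimeZone, List<DateTimeFormatter> formatters) {
    Objects.requireNonNull(text);
    for (DateTimeFormatter f : formatters) {
      try {
        // Assumption: the text carries no zone, so it is interpreted in the
        // supplied default zone.
        return Optional.of(LocalDateTime.parse(text, f).atZone(defaultTimeZone));
      } catch (DateTimeException e) {
        // This pattern did not match; try the next one.
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    List<DateTimeFormatter> formatters = List.of(
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss"),
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss"));
    ZoneId zone = ZoneId.of("US/Pacific");

    // Both sample values from the issue description are tried, plus one that
    // matches neither pattern.
    for (String s : List.of("2016-05-03 12:26:34", "2016-05-03T12:26:34", "garbage")) {
      System.out.println(s + " -> " + tryParseTimestamp(s, zone, formatters));
    }
  }
}
```

   The loop-and-catch stays inside the helper, and the caller decides what an 
empty result means (for example, mapping it to NULL).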





Issue Time Tracking
-------------------

    Worklog Id:     (was: 857339)
    Time Spent: 50m  (was: 40m)

> Read TIMESTAMP WITH LOCAL TIME ZONE columns from text files using custom 
> formats
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-27199
>                 URL: https://issues.apache.org/jira/browse/HIVE-27199
>             Project: Hive
>          Issue Type: Improvement
>          Components: Serializers/Deserializers
>    Affects Versions: 4.0.0-alpha-2
>            Reporter: Stamatis Zampetakis
>            Assignee: Stamatis Zampetakis
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Timestamp values come in many flavors and formats, and there is no single 
> representation that can satisfy everyone, especially when such values are 
> stored in plain text/csv files.
> HIVE-9298 added a special SERDE property, {{timestamp.formats}}, that 
> allows users to provide custom timestamp patterns to correctly parse 
> TIMESTAMP values coming from files.
> However, when the column type is TIMESTAMP WITH LOCAL TIME ZONE (LTZ), it is 
> not possible to use a custom pattern, so when the built-in Hive parser does 
> not match the expected format, a NULL value is returned.
> Consider a text file, F1, with the following values:
> {noformat}
> 2016-05-03 12:26:34
> 2016-05-03T12:26:34
> {noformat}
> and a table with a column declared as LTZ.
> {code:sql}
> CREATE TABLE ts_table (ts TIMESTAMP WITH LOCAL TIME ZONE);
> LOAD DATA LOCAL INPATH './F1' INTO TABLE ts_table;
> SELECT * FROM ts_table;
> 2016-05-03 12:26:34.0 US/Pacific
> NULL
> {code}
> To give more flexibility to users relying on the TIMESTAMP WITH LOCAL TIME 
> ZONE datatype, and to align its behavior with the TIMESTAMP type, this JIRA 
> aims to reuse the {{timestamp.formats}} property for both TIMESTAMP types.
> The work here focuses exclusively on simple text files, but the same could 
> be done for other SERDEs, such as JSON.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
