[ 
https://issues.apache.org/jira/browse/HUDI-7985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7985:
----------------------------
    Story Points: 1

> Support more formats in timestamp logical types in Json Avro converter
> ----------------------------------------------------------------------
>
>                 Key: HUDI-7985
>                 URL: https://issues.apache.org/jira/browse/HUDI-7985
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: Ethan Guo
>            Assignee: Ethan Guo
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 1.0.0
>
>
> The following error is thrown when using the Json Kafka Source with a 
> transformer and a timestamp logical type is in the schema:
> {code:java}
> Caused by: Json to Avro Type conversion error for field loaded_at, 2024-06-03 
> 13:42:34.951+00:00 for {"type":"long","logicalType":"timestamp-millis"}
>       at 
> org.apache.hudi.avro.MercifulJsonConverter$JsonToAvroFieldProcessorUtil$JsonToAvroFieldProcessor.convertToAvro(MercifulJsonConverter.java:194)
>       at 
> org.apache.hudi.avro.MercifulJsonConverter$JsonToAvroFieldProcessorUtil.convertToAvro(MercifulJsonConverter.java:204)
>       at 
> org.apache.hudi.avro.MercifulJsonConverter.convertJsonToAvroField(MercifulJsonConverter.java:182)
>       at 
> org.apache.hudi.avro.MercifulJsonConverter.convertJsonToAvro(MercifulJsonConverter.java:126)
>       at 
> org.apache.hudi.avro.MercifulJsonConverter.convert(MercifulJsonConverter.java:107)
>       at 
> org.apache.hudi.utilities.sources.helpers.AvroConvertor.fromJson(AvroConvertor.java:118)
>       ... 43 more {code}
> We need to make sure "2024-06-03 13:42:34.951+00:00" is supported by the 
> timestamp logical types.
>  * ISO 8601 includes the zone offset in the standard, e.g., {{+01:00}}, and 
> {{Z}} is the zone offset equivalent to {{+00:00}}, i.e., UTC 
> ([ref1|https://en.wikipedia.org/wiki/ISO_8601#Time_zone_designators])
>  * {{2011-12-03T10:15:30+01:00}} conforms to ISO 8601 with {{T}} as the 
> separation character
>  * Some systems use a space instead of {{T}} as the separator (the other 
> parts are the same).  References indicate that ISO 8601 used to allow this 
> by _mutual agreement_ 
> ([ref2|https://stackoverflow.com/questions/30201003/how-to-deal-with-optional-t-in-iso-8601-timestamp-in-java-8-jsr-310-threet],
> [ref3|https://www.reddit.com/r/ISO8601/comments/173r61j/t_vs_space_separation_of_date_and_time/])
>  * {{DateTimeFormatter.ISO_OFFSET_DATE_TIME}} can successfully parse 
> timestamps like {{2024-05-13T23:53:36.004Z}}, already supported in 
> {{MercifulJsonConverter}}, and additionally {{2011-12-03T10:15:30+01:00}} 
> with zone offset (which is not supported in {{MercifulJsonConverter}} yet)
>  * {{DateTimeFormatter.ISO_OFFSET_DATE_TIME}} cannot parse a timestamp with 
> a space as the separator, like {{2011-12-03 10:15:30+01:00}}.  However, a 
> small change to the formatter is enough to support it.
> My take is that we should change the formatter of the timestamp logical 
> types to support the zone offset and the space character as the separator 
> (which is backwards compatible), instead of introducing a new config for the 
> format (assuming that the space character is the only common variant).
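
The formatter change proposed above could look roughly like the sketch below. It uses only java.time from the JDK. The class name LenientTimestampParser, its fields, and the toEpochMillis helper are hypothetical names chosen for illustration and are not taken from MercifulJsonConverter; the actual change in Hudi may be structured differently.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;

// Hypothetical helper for illustration only; not part of Hudi.
class LenientTimestampParser {

  // Same layout as ISO_OFFSET_DATE_TIME, but with a space between date and time.
  static final DateTimeFormatter SPACE_SEPARATED = new DateTimeFormatterBuilder()
      .parseCaseInsensitive()
      .append(DateTimeFormatter.ISO_LOCAL_DATE)
      .appendLiteral(' ')
      .append(DateTimeFormatter.ISO_LOCAL_TIME)
      .appendOffsetId()
      .toFormatter();

  // Tries the standard 'T' form first, then falls back to the space form.
  static final DateTimeFormatter LENIENT = new DateTimeFormatterBuilder()
      .appendOptional(DateTimeFormatter.ISO_OFFSET_DATE_TIME)
      .appendOptional(SPACE_SEPARATED)
      .toFormatter();

  // Returns milliseconds since the epoch, the unit used by timestamp-millis.
  static long toEpochMillis(String text) {
    return OffsetDateTime.parse(text, LENIENT).toInstant().toEpochMilli();
  }

  public static void main(String[] args) {
    // The value from the stack trace, plus the two 'T' forms discussed above.
    System.out.println(toEpochMillis("2024-06-03 13:42:34.951+00:00"));
    System.out.println(toEpochMillis("2024-05-13T23:53:36.004Z"));
    System.out.println(toEpochMillis("2011-12-03T10:15:30+01:00"));
  }
}
```

Because the standard formatter is tried first, inputs that already parse with the 'T' separator keep parsing in this sketch; the space variant is only an additional fallback. Input matching neither form still fails with a DateTimeParseException.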



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
