[
https://issues.apache.org/jira/browse/HUDI-7985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ethan Guo updated HUDI-7985:
----------------------------
Story Points: 1
> Support more formats in timestamp logical types in Json Avro converter
> ----------------------------------------------------------------------
>
> Key: HUDI-7985
> URL: https://issues.apache.org/jira/browse/HUDI-7985
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Ethan Guo
> Assignee: Ethan Guo
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 1.0.0
>
>
> The following error is thrown when using Json Kafka Source with a transformer
> and a decimal field is in the schema:
> {code:java}
> Caused by: Json to Avro Type conversion error for field loaded_at, 2024-06-03 13:42:34.951+00:00 for {"type":"long","logicalType":"timestamp-millis"}
>     at org.apache.hudi.avro.MercifulJsonConverter$JsonToAvroFieldProcessorUtil$JsonToAvroFieldProcessor.convertToAvro(MercifulJsonConverter.java:194)
>     at org.apache.hudi.avro.MercifulJsonConverter$JsonToAvroFieldProcessorUtil.convertToAvro(MercifulJsonConverter.java:204)
>     at org.apache.hudi.avro.MercifulJsonConverter.convertJsonToAvroField(MercifulJsonConverter.java:182)
>     at org.apache.hudi.avro.MercifulJsonConverter.convertJsonToAvro(MercifulJsonConverter.java:126)
>     at org.apache.hudi.avro.MercifulJsonConverter.convert(MercifulJsonConverter.java:107)
>     at org.apache.hudi.utilities.sources.helpers.AvroConvertor.fromJson(AvroConvertor.java:118)
>     ... 43 more
> {code}
> We need to make sure that "2024-06-03 13:42:34.951+00:00" is supported by
> the timestamp logical types.
> * ISO 8601 supports the zone offset in the standard, e.g., {{+01:00}}, and
> {{Z}} is the zone offset equivalent to {{+00:00}}, i.e., UTC
> ([ref1|https://en.wikipedia.org/wiki/ISO_8601#Time_zone_designators])
> * {{2011-12-03T10:15:30+01:00}} conforms to ISO 8601 with {{T}} as the
> separation character
> * There are systems that use {{ }} (space) instead of {{T}} as the
> separator (the other parts are the same). References indicate that ISO 8601
> used to allow this by _mutual agreement_
> ([ref2|https://stackoverflow.com/questions/30201003/how-to-deal-with-optional-t-in-iso-8601-timestamp-in-java-8-jsr-310-threet],
> [ref3|https://www.reddit.com/r/ISO8601/comments/173r61j/t_vs_space_separation_of_date_and_time/])
> * {{DateTimeFormatter.ISO_OFFSET_DATE_TIME}} can successfully parse
> timestamps like {{2024-05-13T23:53:36.004Z}}, already supported in
> {{MercifulJsonConverter}}, and additionally {{2011-12-03T10:15:30+01:00}}
> with zone offset (which is not supported in {{MercifulJsonConverter}} yet)
> * {{DateTimeFormatter.ISO_OFFSET_DATE_TIME}} cannot parse a timestamp that
> uses a space as the separator, like {{2011-12-03 10:15:30+01:00}}. However,
> a small change to the formatter is enough to support it.
> My take is that we should change the formatter of the timestamp logical
> types to support the zone offset and the space character as the separator
> (which is backwards compatible), instead of introducing a new format config
> (assuming that common use cases only have the space character as the variant).
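
The proposal above can be illustrated with a minimal sketch. It is not the actual change made to MercifulJsonConverter; the class name LenientTimestampParser and the method toEpochMillis are hypothetical, chosen only for this example. It assumes the goal is to accept a zone offset with either 'T' or a single space as the date/time separator, by trying the standard ISO formatter first and a space-separated variant second.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;

// Hypothetical helper, for illustration only (not part of Hudi).
final class LenientTimestampParser {

  // Same shape as ISO_OFFSET_DATE_TIME, but with ' ' instead of 'T'.
  private static final DateTimeFormatter SPACE_SEPARATED =
      new DateTimeFormatterBuilder()
          .parseCaseInsensitive()
          .append(DateTimeFormatter.ISO_LOCAL_DATE)
          .appendLiteral(' ')
          .append(DateTimeFormatter.ISO_LOCAL_TIME)
          .appendOffsetId() // accepts "Z" as well as offsets like "+01:00"
          .toFormatter();

  // Tries the 'T' form first; if it does not match, falls back to the
  // space-separated form. Text matching neither form is rejected.
  static final DateTimeFormatter FORMATTER =
      new DateTimeFormatterBuilder()
          .appendOptional(DateTimeFormatter.ISO_OFFSET_DATE_TIME)
          .appendOptional(SPACE_SEPARATED)
          .toFormatter();

  private LenientTimestampParser() {
  }

  // Converts a timestamp string into epoch milliseconds, the value stored
  // by the Avro timestamp-millis logical type.
  static long toEpochMillis(String text) {
    return OffsetDateTime.parse(text, FORMATTER).toInstant().toEpochMilli();
  }
}
```

With this sketch, LenientTimestampParser.toEpochMillis("2024-06-03 13:42:34.951+00:00") and the same call with 'T' as the separator yield the same value, so inputs that already parse today keep their meaning. Using two complete alternatives rather than making both separators optional avoids accepting strings with no separator at all.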
--
This message was sent by Atlassian Jira
(v8.20.10#820010)