[
https://issues.apache.org/jira/browse/HUDI-7985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ethan Guo updated HUDI-7985:
----------------------------
Description:
The following error is thrown when using Json Kafka Source with a transformer
and a decimal in the schema:
{code:java}
Caused by: Json to Avro Type conversion error for field loaded_at, 2024-06-03 13:42:34.951+00:00 for {"type":"long","logicalType":"timestamp-millis"}
    at org.apache.hudi.avro.MercifulJsonConverter$JsonToAvroFieldProcessorUtil$JsonToAvroFieldProcessor.convertToAvro(MercifulJsonConverter.java:194)
    at org.apache.hudi.avro.MercifulJsonConverter$JsonToAvroFieldProcessorUtil.convertToAvro(MercifulJsonConverter.java:204)
    at org.apache.hudi.avro.MercifulJsonConverter.convertJsonToAvroField(MercifulJsonConverter.java:182)
    at org.apache.hudi.avro.MercifulJsonConverter.convertJsonToAvro(MercifulJsonConverter.java:126)
    at org.apache.hudi.avro.MercifulJsonConverter.convert(MercifulJsonConverter.java:107)
    at org.apache.hudi.utilities.sources.helpers.AvroConvertor.fromJson(AvroConvertor.java:118)
    ... 43 more
{code}
We need to make sure "2024-06-03 13:42:34.951+00:00" is supported in the
timestamp logical types.
* ISO 8601 supports the zone offset in the standard, e.g., {{+01:00}}, and
{{Z}} is the zone offset equivalent to {{+00:00}}, i.e., UTC
([ref1|https://en.wikipedia.org/wiki/ISO_8601#Time_zone_designators])
* {{2011-12-03T10:15:30+01:00}} conforms to ISO 8601 with {{T}} as the
separator between date and time
* Some systems use {{ }} (space) instead of {{T}} as the separator (other
parts are the same). References indicate that ISO 8601 used to allow this by
_mutual agreement_
([ref2|https://stackoverflow.com/questions/30201003/how-to-deal-with-optional-t-in-iso-8601-timestamp-in-java-8-jsr-310-threet],
[ref3|https://www.reddit.com/r/ISO8601/comments/173r61j/t_vs_space_separation_of_date_and_time/])
* {{DateTimeFormatter.ISO_OFFSET_DATE_TIME}} can successfully parse timestamps
like {{2024-05-13T23:53:36.004Z}}, already supported in
{{MercifulJsonConverter}}, and additionally {{2011-12-03T10:15:30+01:00}}
with zone offset (which is not supported in {{MercifulJsonConverter}} yet)
* {{DateTimeFormatter.ISO_OFFSET_DATE_TIME}} cannot parse a timestamp with
space as the separator, like {{2011-12-03 10:15:30+01:00}}. But with a small
tweak to the formatter, it can be easily supported.
My take is we should change the formatter of the timestamp logical types to
support zone offset and space character as the separator (which is backwards
compatible), instead of introducing a new format config (assuming that the
space separator is the only common variant).
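The formatter change proposed above could look roughly like the sketch below. It is only an illustration built on the standard {{java.time}} API: the class name {{LenientTimestampParser}}, the helper {{withSeparator}}, and the method {{toEpochMillis}} are hypothetical and are not taken from {{MercifulJsonConverter}}. The idea is to build one formatter per separator and try both, so that the existing {{T}} inputs keep parsing while the space variant and explicit zone offsets are also accepted.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;

// Hypothetical helper for illustration only; not the actual Hudi code.
final class LenientTimestampParser {

  // ISO local date, the given separator, ISO local time, then an offset
  // such as "+01:00" or "Z".
  private static DateTimeFormatter withSeparator(char separator) {
    return new DateTimeFormatterBuilder()
        .parseCaseInsensitive()
        .append(DateTimeFormatter.ISO_LOCAL_DATE)
        .appendLiteral(separator)
        .append(DateTimeFormatter.ISO_LOCAL_TIME)
        .appendOffsetId()
        .toFormatter();
  }

  // Tries the 'T' variant first, then the space variant. If neither
  // matches, parsing fails with a DateTimeParseException.
  static final DateTimeFormatter OFFSET_DATE_TIME_T_OR_SPACE =
      new DateTimeFormatterBuilder()
          .appendOptional(withSeparator('T'))
          .appendOptional(withSeparator(' '))
          .toFormatter();

  // Converts the text to epoch milliseconds, the representation used by
  // the Avro timestamp-millis logical type.
  static long toEpochMillis(String text) {
    return OffsetDateTime.parse(text, OFFSET_DATE_TIME_T_OR_SPACE)
        .toInstant()
        .toEpochMilli();
  }

  public static void main(String[] args) {
    // The value from the stack trace, plus the two ISO 8601 examples.
    System.out.println(toEpochMillis("2024-06-03 13:42:34.951+00:00"));
    System.out.println(toEpochMillis("2024-05-13T23:53:36.004Z"));
    System.out.println(toEpochMillis("2011-12-03T10:15:30+01:00"));
  }
}
```

Trying two complete formatters, rather than making each separator literal optional on its own, keeps the sketch from accepting inputs with no separator or with both.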
was:
The following error is thrown when using Json Kafka Source with a transformer
and a decimal in the schema:
We need to make sure "2024-06-03 13:42:34.951+00:00" is supported in the
timestamp logical types.
* ISO 8601 supports the zone offset in the standard, e.g., {{+01:00}}, and
{{Z}} is the zone offset equivalent to {{+00:00}}, i.e., UTC
([ref1|https://en.wikipedia.org/wiki/ISO_8601#Time_zone_designators])
* {{2011-12-03T10:15:30+01:00}} conforms to ISO 8601 with {{T}} as the
separator between date and time
* Some systems use {{ }} (space) instead of {{T}} as the separator (other
parts are the same). References indicate that ISO 8601 used to allow this by
_mutual agreement_
([ref2|https://stackoverflow.com/questions/30201003/how-to-deal-with-optional-t-in-iso-8601-timestamp-in-java-8-jsr-310-threet],
[ref3|https://www.reddit.com/r/ISO8601/comments/173r61j/t_vs_space_separation_of_date_and_time/])
* {{DateTimeFormatter.ISO_OFFSET_DATE_TIME}} can successfully parse timestamps
like {{2024-05-13T23:53:36.004Z}}, already supported in
{{MercifulJsonConverter}}, and additionally {{2011-12-03T10:15:30+01:00}}
with zone offset (which is not supported in {{MercifulJsonConverter}} yet)
* {{DateTimeFormatter.ISO_OFFSET_DATE_TIME}} cannot parse a timestamp with
space as the separator, like {{2011-12-03 10:15:30+01:00}}. But with a small
tweak to the formatter, it can be easily supported.
My take is we should change the formatter of the timestamp logical types to
support zone offset and space character as the separator (which is backwards
compatible), instead of introducing a new format config (assuming that the
space separator is the only common variant).
> Support more formats in timestamp logical types in Json Avro converter
> ----------------------------------------------------------------------
>
> Key: HUDI-7985
> URL: https://issues.apache.org/jira/browse/HUDI-7985
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Ethan Guo
> Assignee: Ethan Guo
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.0.0
>