[ 
https://issues.apache.org/jira/browse/HUDI-7985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7985:
----------------------------
    Description: 
The following error is thrown when using Json Kafka Source with a transformer 
and a decimal in the schema:
{code:java}
Caused by: Json to Avro Type conversion error for field loaded_at, 2024-06-03 13:42:34.951+00:00 for {"type":"long","logicalType":"timestamp-millis"}
        at org.apache.hudi.avro.MercifulJsonConverter$JsonToAvroFieldProcessorUtil$JsonToAvroFieldProcessor.convertToAvro(MercifulJsonConverter.java:194)
        at org.apache.hudi.avro.MercifulJsonConverter$JsonToAvroFieldProcessorUtil.convertToAvro(MercifulJsonConverter.java:204)
        at org.apache.hudi.avro.MercifulJsonConverter.convertJsonToAvroField(MercifulJsonConverter.java:182)
        at org.apache.hudi.avro.MercifulJsonConverter.convertJsonToAvro(MercifulJsonConverter.java:126)
        at org.apache.hudi.avro.MercifulJsonConverter.convert(MercifulJsonConverter.java:107)
        at org.apache.hudi.utilities.sources.helpers.AvroConvertor.fromJson(AvroConvertor.java:118)
        ... 43 more
{code}
We need to make sure "2024-06-03 13:42:34.951+00:00" is supported in timestamp 
logical type.
 * ISO 8601 supports the zone offset in the standard, e.g., {{+01:00}} , and 
{{Z}} is the zone offset equivalent to {{+00:00}} or UTC 
([ref1|https://en.wikipedia.org/wiki/ISO_8601#Time_zone_designators])
 * {{2011-12-03T10:15:30+01:00}} conforms to ISO 8601 with {{T}} as the 
separation character
 * Some systems use a space character instead of {{T}} as the separator (the 
other parts are the same).  References indicate that ISO 8601 used to allow 
this by _mutual agreement_ 
([ref2|https://stackoverflow.com/questions/30201003/how-to-deal-with-optional-t-in-iso-8601-timestamp-in-java-8-jsr-310-threet],
[ref3|https://www.reddit.com/r/ISO8601/comments/173r61j/t_vs_space_separation_of_date_and_time/])
 * {{DateTimeFormatter.ISO_OFFSET_DATE_TIME}} can successfully parse timestamps 
like {{2024-05-13T23:53:36.004Z}}, which {{MercifulJsonConverter}} already 
supports, and additionally {{2011-12-03T10:15:30+01:00}} with a zone offset 
(which {{MercifulJsonConverter}} does not support yet)
 * {{DateTimeFormatter.ISO_OFFSET_DATE_TIME}} cannot parse a timestamp with a 
space as the separator, like {{2011-12-03 10:15:30+01:00}}.  With a small 
change to the formatter, however, this can easily be supported.
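
To illustrate the last two points, here is a minimal standalone check of what 
{{DateTimeFormatter.ISO_OFFSET_DATE_TIME}} accepts. It is only a sketch and not 
Hudi code; the class name {{IsoOffsetParseCheck}} and the helper {{parses}} are 
made up for illustration.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical helper for illustration only; not part of Hudi.
final class IsoOffsetParseCheck {

  // Returns true if the built-in ISO_OFFSET_DATE_TIME formatter accepts the input.
  static boolean parses(String input) {
    try {
      OffsetDateTime.parse(input, DateTimeFormatter.ISO_OFFSET_DATE_TIME);
      return true;
    } catch (DateTimeParseException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    String[] samples = {
        "2024-05-13T23:53:36.004Z",   // 'T' separator, 'Z' offset
        "2011-12-03T10:15:30+01:00",  // 'T' separator, explicit offset
        "2011-12-03 10:15:30+01:00"   // space separator
    };
    for (String sample : samples) {
      System.out.println(sample + " -> " + parses(sample));
    }
  }
}
```

Running {{main}} should report {{true}} for the two {{T}}-separated samples and 
{{false}} for the space-separated one.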

My take is that we should change the formatter of the timestamp logical types 
to support zone offsets and the space character as the separator (which is 
backwards compatible), instead of introducing a new config for the format 
(assuming that the space separator is the only variant in common use cases).
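
A minimal sketch of what such a change could look like, assuming a fallback 
formatter that differs from {{DateTimeFormatter.ISO_OFFSET_DATE_TIME}} only in 
the separator. The class and method names below are hypothetical, and this is 
not the actual {{MercifulJsonConverter}} change.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;

// Hypothetical sketch for illustration only; not the Hudi implementation.
final class LenientOffsetTimestampParser {

  // Same layout as ISO_OFFSET_DATE_TIME, but with ' ' instead of 'T'
  // between the date part and the time part.
  static final DateTimeFormatter SPACE_SEPARATED = new DateTimeFormatterBuilder()
      .parseCaseInsensitive()
      .append(DateTimeFormatter.ISO_LOCAL_DATE)
      .appendLiteral(' ')
      .append(DateTimeFormatter.ISO_LOCAL_TIME)
      .appendOffsetId()
      .toFormatter();

  // Tries the standard 'T' form first, so inputs that parse today keep
  // parsing the same way, then falls back to the space-separated form.
  static OffsetDateTime parse(String text) {
    try {
      return OffsetDateTime.parse(text, DateTimeFormatter.ISO_OFFSET_DATE_TIME);
    } catch (DateTimeParseException e) {
      return OffsetDateTime.parse(text, SPACE_SEPARATED);
    }
  }

  // Epoch milliseconds, the underlying long value of timestamp-millis.
  static long toEpochMillis(String text) {
    return parse(text).toInstant().toEpochMilli();
  }

  public static void main(String[] args) {
    System.out.println(toEpochMillis("2024-06-03 13:42:34.951+00:00"));
    System.out.println(toEpochMillis("2024-06-03T13:42:34.951Z"));
  }
}
```

Both calls in {{main}} should print the same value, since the two inputs denote 
the same instant. Inputs that match neither form still fail with a 
{{DateTimeParseException}}, so malformed values are not silently accepted.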

  was:
The following error is thrown when using Json Kafka Source with a transformer 
and a decimal in the schema:



 

 

We need to make sure "2024-06-03 13:42:34.951+00:00" is supported in timestamp 
logical type.
 * ISO 8601 supports the zone offset in the standard, e.g., {{+01:00}}, and 
{{Z}} is the zone offset equivalent to {{+00:00}} or UTC 
([ref1|https://en.wikipedia.org/wiki/ISO_8601#Time_zone_designators])
 * {{2011-12-03T10:15:30+01:00}} conforms to ISO 8601 with {{T}} as the 
separator character
 * Some systems use a space character instead of {{T}} as the separator (the 
other parts are the same).  References indicate that ISO 8601 used to allow 
this by _mutual agreement_ 
([ref2|https://stackoverflow.com/questions/30201003/how-to-deal-with-optional-t-in-iso-8601-timestamp-in-java-8-jsr-310-threet],
[ref3|https://www.reddit.com/r/ISO8601/comments/173r61j/t_vs_space_separation_of_date_and_time/])
 * {{DateTimeFormatter.ISO_OFFSET_DATE_TIME}} can successfully parse timestamps 
like {{2024-05-13T23:53:36.004Z}}, which {{MercifulJsonConverter}} already 
supports, and additionally {{2011-12-03T10:15:30+01:00}} with a zone offset 
(which {{MercifulJsonConverter}} does not support yet)
 * {{DateTimeFormatter.ISO_OFFSET_DATE_TIME}} cannot parse a timestamp with a 
space as the separator, like {{2011-12-03 10:15:30+01:00}}.  With a small 
change to the formatter, however, this can easily be supported.

My take is that we should change the formatter of the timestamp logical types 
to support zone offsets and the space character as the separator (which is 
backwards compatible), instead of introducing a new config for the format 
(assuming that the space separator is the only variant in common use cases).


> Support more formats in timestamp logical types in Json Avro converter
> ----------------------------------------------------------------------
>
>                 Key: HUDI-7985
>                 URL: https://issues.apache.org/jira/browse/HUDI-7985
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: Ethan Guo
>            Assignee: Ethan Guo
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.0.0
>
>
> The following error is thrown when using Json Kafka Source with a 
> transformer and a decimal in the schema:
> {code:java}
> Caused by: Json to Avro Type conversion error for field loaded_at, 2024-06-03 13:42:34.951+00:00 for {"type":"long","logicalType":"timestamp-millis"}
>       at org.apache.hudi.avro.MercifulJsonConverter$JsonToAvroFieldProcessorUtil$JsonToAvroFieldProcessor.convertToAvro(MercifulJsonConverter.java:194)
>       at org.apache.hudi.avro.MercifulJsonConverter$JsonToAvroFieldProcessorUtil.convertToAvro(MercifulJsonConverter.java:204)
>       at org.apache.hudi.avro.MercifulJsonConverter.convertJsonToAvroField(MercifulJsonConverter.java:182)
>       at org.apache.hudi.avro.MercifulJsonConverter.convertJsonToAvro(MercifulJsonConverter.java:126)
>       at org.apache.hudi.avro.MercifulJsonConverter.convert(MercifulJsonConverter.java:107)
>       at org.apache.hudi.utilities.sources.helpers.AvroConvertor.fromJson(AvroConvertor.java:118)
>       ... 43 more
> {code}
> We need to make sure "2024-06-03 13:42:34.951+00:00" is supported in 
> timestamp logical type.
>  * ISO 8601 supports the zone offset in the standard, e.g., {{+01:00}}, and 
> {{Z}} is the zone offset equivalent to {{+00:00}} or UTC 
> ([ref1|https://en.wikipedia.org/wiki/ISO_8601#Time_zone_designators])
>  * {{2011-12-03T10:15:30+01:00}} conforms to ISO 8601 with {{T}} as the 
> separator character
>  * Some systems use a space character instead of {{T}} as the separator (the 
> other parts are the same).  References indicate that ISO 8601 used to allow 
> this by _mutual agreement_ 
> ([ref2|https://stackoverflow.com/questions/30201003/how-to-deal-with-optional-t-in-iso-8601-timestamp-in-java-8-jsr-310-threet],
> [ref3|https://www.reddit.com/r/ISO8601/comments/173r61j/t_vs_space_separation_of_date_and_time/])
>  * {{DateTimeFormatter.ISO_OFFSET_DATE_TIME}} can successfully parse 
> timestamps like {{2024-05-13T23:53:36.004Z}}, which {{MercifulJsonConverter}} 
> already supports, and additionally {{2011-12-03T10:15:30+01:00}} with a zone 
> offset (which {{MercifulJsonConverter}} does not support yet)
>  * {{DateTimeFormatter.ISO_OFFSET_DATE_TIME}} cannot parse a timestamp with a 
> space as the separator, like {{2011-12-03 10:15:30+01:00}}.  With a small 
> change to the formatter, however, this can easily be supported.
> My take is that we should change the formatter of the timestamp logical 
> types to support zone offsets and the space character as the separator 
> (which is backwards compatible), instead of introducing a new config for the 
> format (assuming that the space separator is the only variant in common use 
> cases).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
