[ https://issues.apache.org/jira/browse/HUDI-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Raymond Xu updated HUDI-6028:
-----------------------------
Sprint: Sprint 2023-04-10
> GCS incr source does not handle pubsub message properly
> -------------------------------------------------------
>
> Key: HUDI-6028
> URL: https://issues.apache.org/jira/browse/HUDI-6028
> Project: Apache Hudi
> Issue Type: Bug
> Components: deltastreamer
> Reporter: Raymond Xu
> Priority: Major
>
> The GCS events source uses the schema converter from Spark, which fails on a
> nested field whose name contains a hyphen. A sample message:
> {code:java}
> 23/04/03 19:23:45 DEBUG GcsEventsSource: msg: {
> "kind": "storage#object",
> "id": "",
> "selfLink": "",
> "name": "",
> "bucket": "",
> "generation": "1680505551370137",
> "metageneration": "1",
> "contentType": "application/octet-stream",
> "timeCreated": "2023-04-03T07:05:51.373Z",
> "updated": "2023-04-03T07:05:51.373Z",
> "storageClass": "STANDARD",
> "timeStorageClassUpdated": "2023-04-03T07:05:51.373Z",
> "size": "6707",
> "md5Hash": "",
> "mediaLink": "",
> "metadata": {
> "goog-reserved-file-mtime": "1680503048"
> },
> "crc32c": "",
> "etag": ""
> }
> {code}
> It throws:
> {code}
> Exception in thread "main" org.apache.avro.SchemaParseException: Illegal character in: goog-reserved-file-mtime
> at org.apache.avro.Schema.validateName(Schema.java:1571)
> at org.apache.avro.Schema.access$400(Schema.java:92)
> at org.apache.avro.Schema$Field.<init>(Schema.java:549)
> at org.apache.avro.SchemaBuilder$FieldBuilder.completeField(SchemaBuilder.java:2258)
> at org.apache.avro.SchemaBuilder$FieldBuilder.completeField(SchemaBuilder.java:2254)
> at org.apache.avro.SchemaBuilder$FieldBuilder.access$5100(SchemaBuilder.java:2150)
> at org.apache.avro.SchemaBuilder$GenericDefault.noDefault(SchemaBuilder.java:2557)
> at org.apache.hudi.org.apache.spark.sql.avro.SchemaConverters$.$anonfun$toAvroType$2(SchemaConverters.scala:205)
> {code}
> This is a problem with org.apache.spark.sql.avro.SchemaConverters#toAvroType
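
For context: the Avro specification requires a name to start with [A-Za-z_] and to continue with only [A-Za-z0-9_], so the hyphens in goog-reserved-file-mtime are rejected. Below is a minimal, stdlib-only Java sketch of that rule and of one possible workaround, replacing invalid characters with underscores. The class AvroNameCheck and both of its methods are hypothetical names introduced for illustration. This is not Hudi's or Spark's code, and it is not necessarily the fix chosen for this issue.

```java
import java.util.regex.Pattern;

public class AvroNameCheck {
    // Avro naming rule: first character [A-Za-z_], remaining characters [A-Za-z0-9_].
    private static final Pattern VALID_NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    // Hypothetical helper: true if the name satisfies the Avro naming rule.
    static boolean isValidAvroName(String name) {
        return name != null && VALID_NAME.matcher(name).matches();
    }

    // Hypothetical helper: map an arbitrary field name to a valid Avro name
    // by replacing every disallowed character with an underscore.
    static String sanitize(String name) {
        String replaced = name.replaceAll("[^A-Za-z0-9_]", "_");
        // A name may not be empty or start with a digit, so prefix an underscore.
        if (replaced.isEmpty() || Character.isDigit(replaced.charAt(0))) {
            replaced = "_" + replaced;
        }
        return replaced;
    }

    public static void main(String[] args) {
        String field = "goog-reserved-file-mtime";
        System.out.println(isValidAvroName(field)); // prints false
        System.out.println(sanitize(field));        // prints goog_reserved_file_mtime
    }
}
```

Note that renaming changes the column name seen downstream, so any real fix would have to decide whether that is acceptable for the metadata field.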