[ https://issues.apache.org/jira/browse/FLINK-24544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17431204#comment-17431204 ]
Peter Schrott commented on FLINK-24544:
---------------------------------------

The underlying problem with deserializing records with enums from Kafka & schema registry lies in the initialization of `GenericDatumReader`:

Case Kafka & SR:
In `AvroDeserializationSchema.java` the `GenericDatumReader` is initialized with `writerSchema = null` and `readerSchema = the schema derived from the table DDL`
-> When calling `RegistryAvroDeserializationSchema.deserialize(.)`, `datumReader.setSchema()` sets the attribute `actual` to the actual Avro schema, whereas `expected` is already set to `readerSchema`
-> The inequality of `actual` and `expected` causes the exception during deserialization, as the types of `actual` and `expected` do not match
--> Root of this is: the initialization of `DeserializationSchema` in `RegistryAvroFormatFactory.java` uses the `rowType` & `ConfluentRegistryAvroDeserializationSchema.forGeneric` when creating the `ConfluentRegistryAvroDeserializationSchema`

Case FS:
In `AvroInputFormat.java` the `GenericDatumReader` is initialized with `writerSchema = null` and `readerSchema = null`
-> During initialization of `DataFileStream`, `reader.setSchema(.)` is called with the actual Avro schema, so both `expected` and `actual` of the `GenericDatumReader` are set to the passed value
-> The Avro schema is taken from the file
-> Because `actual` and `expected` are equal, the serialized data can be read from the file

> Failure when using Kafka connector in Table API with Avro and Confluent schema registry
> ----------------------------------------------------------------------------------------
>
>                 Key: FLINK-24544
>                 URL: https://issues.apache.org/jira/browse/FLINK-24544
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / Kafka, Formats (JSON, Avro, Parquet, ORC, SequenceFile), Table SQL / Ecosystem
>    Affects Versions: 1.13.1
>            Reporter: Francesco Guardiani
>            Priority: Major
>         Attachments: flink-deser-avro-enum.zip
>
>
> A user reported in the [mailing list|https://lists.apache.org/thread.html/re38a07f6121cc580737a20c11574719cfe554e58d99817f79db9bb4a%40%3Cuser.flink.apache.org%3E] that Avro deserialization fails when using Kafka, Avro and Confluent Schema Registry:
> {code:java}
> Caused by: java.io.IOException: Failed to deserialize Avro record.
>   at org.apache.flink.formats.avro.AvroRowDataDeserializationSchema.deserialize(AvroRowDataDeserializationSchema.java:106)
>   at org.apache.flink.formats.avro.AvroRowDataDeserializationSchema.deserialize(AvroRowDataDeserializationSchema.java:46)
>   at org.apache.flink.api.common.serialization.DeserializationSchema.deserialize(DeserializationSchema.java:82)
>   at org.apache.flink.streaming.connectors.kafka.table.DynamicKafkaDeserializationSchema.deserialize(DynamicKafkaDeserializationSchema.java:113)
>   at org.apache.flink.streaming.connectors.kafka.internals.KafkaFetcher.partitionConsumerRecordsHandler(KafkaFetcher.java:179)
>   at org.apache.flink.streaming.connectors.kafka.internals.KafkaFetcher.runFetchLoop(KafkaFetcher.java:142)
>   at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:826)
>   at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:110)
>   at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:66)
>   at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:269)
> Caused by: org.apache.avro.AvroTypeException: Found my.type.avro.MyEnumType, expecting union
>   at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:308)
>   at org.apache.avro.io.parsing.Parser.advance(Parser.java:86)
>   at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:275)
>   at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:187)
>   at org.apache.avro.generic.GenericDatumReader.readArray(GenericDatumReader.java:298)
>   at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:183)
>   at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
>   at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:187)
>   at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
>   at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:259)
>   at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
>   at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
>   at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
>   at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
>   at org.apache.flink.formats.avro.RegistryAvroDeserializationSchema.deserialize(RegistryAvroDeserializationSchema.java:81)
>   at org.apache.flink.formats.avro.AvroRowDataDeserializationSchema.deserialize(AvroRowDataDeserializationSchema.java:103)
>   ... 9 more
> {code}
> Look in the attachments for a reproducer.
> Same data serialized to a file works fine (see the filesystem example in the reproducer)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
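
The two cases described in the comment differ only in whether a reader schema is already set when the writer schema arrives. A minimal sketch of that state logic, under stated assumptions: `ToyDatumReader` is a hypothetical stand-in and not Avro's `GenericDatumReader`, schemas are plain strings, the schema values are illustrative placeholders, and a plain equality check replaces real Avro schema resolution (which is more lenient).

```java
import java.util.Objects;

public class ToyDatumReaderDemo {

    /** Hypothetical stand-in for a datum reader; NOT Avro's API. Schemas are plain strings. */
    static final class ToyDatumReader {
        private String actual;    // writer schema: how the bytes were written
        private String expected;  // reader schema: what the caller wants back

        ToyDatumReader(String writerSchema, String readerSchema) {
            this.actual = writerSchema;
            this.expected = readerSchema;
        }

        /** Sets the writer schema; the reader schema defaults to it only if none was given. */
        void setSchema(String writerSchema) {
            this.actual = writerSchema;
            if (this.expected == null) {
                this.expected = writerSchema;
            }
        }

        /** Simplification (assumption): any difference between the schemas counts as a mismatch. */
        boolean schemasMatch() {
            return Objects.equals(actual, expected);
        }
    }

    public static void main(String[] args) {
        // Illustrative placeholder values, not real Avro schema JSON.
        String schemaFromData = "enum MyEnumType";      // schema the data was written with
        String schemaFromDdl = "union [null, string]";  // schema derived from the table DDL

        // Case Kafka & SR: the reader schema is fixed up front, the writer schema arrives later.
        ToyDatumReader kafka = new ToyDatumReader(null, schemaFromDdl);
        kafka.setSchema(schemaFromData);
        System.out.println("Kafka & SR schemas match: " + kafka.schemasMatch());

        // Case FS: both schemas start as null, so the writer schema is used for both.
        ToyDatumReader fs = new ToyDatumReader(null, null);
        fs.setSchema(schemaFromData);
        System.out.println("FS schemas match: " + fs.schemasMatch());
    }
}
```

In this sketch the Kafka & SR reader ends up with two different schemas while the FS reader ends up with the same schema on both sides, which mirrors the difference between the failing and the working path.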