[jira] [Commented] (FLINK-16048) Support read/write confluent schema registry avro data from Kafka

Dawid Wysakowicz (Jira) Wed, 22 Jul 2020 00:33:14 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-16048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17162560#comment-17162560
 ]


Dawid Wysakowicz commented on FLINK-16048:
------------------------------------------

[~jark] Cloudera schema registry has a different binary format. Nevertheless, even 
if it had the same format, I still think we should go with {{avro-confluent}}. 
You would still have to use different factories for constructing the two 
{{DeserializationSchemas}}, as they will be placed in different modules. That 
in turn means you need two different identifiers.
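
To illustrate the point about identifiers, here is a minimal, self-contained sketch. It does not use Flink's real classes: the {{FormatFactory}} interface, the {{index}} helper, and the {{avro-cloudera}} identifier are all hypothetical and exist only for this example. It shows why two factories living in different modules cannot share one identifier if they are looked up by name.

```java
import java.util.HashMap;
import java.util.Map;

public class FormatFactoryLookup {

    /** Hypothetical, simplified stand-in for a format factory; not Flink's real interface. */
    interface FormatFactory {
        String factoryIdentifier();
    }

    /** Factory that would live in a Confluent-specific module. */
    static final class ConfluentAvroFactory implements FormatFactory {
        @Override
        public String factoryIdentifier() {
            return "avro-confluent";
        }
    }

    /** Hypothetical factory for a Cloudera registry format, in a separate module. */
    static final class ClouderaAvroFactory implements FormatFactory {
        @Override
        public String factoryIdentifier() {
            return "avro-cloudera"; // assumed name, for illustration only
        }
    }

    /** Indexes factories by identifier and rejects two factories claiming the same one. */
    static Map<String, FormatFactory> index(FormatFactory... factories) {
        Map<String, FormatFactory> byId = new HashMap<>();
        for (FormatFactory factory : factories) {
            FormatFactory previous = byId.put(factory.factoryIdentifier(), factory);
            if (previous != null) {
                throw new IllegalStateException(
                        "Ambiguous format identifier: " + factory.factoryIdentifier());
            }
        }
        return byId;
    }

    public static void main(String[] args) {
        Map<String, FormatFactory> byId =
                index(new ConfluentAvroFactory(), new ClouderaAvroFactory());
        // The identifier given in the table definition selects exactly one factory.
        System.out.println(byId.get("avro-confluent").getClass().getSimpleName());
    }
}
```

With a single shared identifier, the lookup above would fail as ambiguous as soon as both modules are on the classpath.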

[~danny0405] I see no reason why we would not want to support Cloudera's if 
somebody would like to contribute it. Personally, I don't get the argument that 
the name is too verbose. We are talking about 7 additional characters, which you 
type once and usually copy and paste. This might be my personal taste, 
though. I'll stick to the majority decision on that one, but my preference is 
{{avro-confluent}}. [~sjwiesman] [~jark] What is your preference, as you are 
also involved in this one? I understand [~danny0405] prefers {{avro-sr}}.

> Support read/write confluent schema registry avro data  from Kafka
> ------------------------------------------------------------------
>
>                 Key: FLINK-16048
>                 URL: https://issues.apache.org/jira/browse/FLINK-16048
>             Project: Flink
>          Issue Type: Improvement
>          Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Table 
> SQL / Ecosystem
>    Affects Versions: 1.11.0
>            Reporter: Leonard Xu
>            Assignee: Danny Chen
>            Priority: Major
>              Labels: pull-request-available, usability
>             Fix For: 1.12.0
>
>
> *The background*
> I found that the SQL Kafka connector cannot consume Avro data that was 
> serialized by `KafkaAvroSerializer`; it can only consume Row data with an 
> Avro schema, because we use 
> `AvroRowDeserializationSchema/AvroRowSerializationSchema` to serialize and 
> deserialize data in `AvroRowFormatFactory`. 
> I think we should support this because `KafkaAvroSerializer` is very common 
> in Kafka, and someone ran into the same problem on Stack Overflow [1].
> [[1]https://stackoverflow.com/questions/56452571/caused-by-org-apache-avro-avroruntimeexception-malformed-data-length-is-negat/56478259|https://stackoverflow.com/questions/56452571/caused-by-org-apache-avro-avroruntimeexception-malformed-data-length-is-negat/56478259]
> *The format details*
> _The factory identifier (or format id)_
> There are currently 2 candidates:
> - {{avro-sr}}: the pattern borrowed from KSQL {{JSON_SR}} format [1]
> - {{avro-confluent}}: the pattern borrowed from ClickHouse {{AvroConfluent}} 
> [2]
> Personally, I would prefer {{avro-sr}} because it is more concise, and 
> Confluent is a company name, which I think is not that suitable for a format 
> name.
> _The format attributes_
> || Option || Required || Remark ||
> | schema-registry.url | true | URL to connect to schema registry service |
> | schema-registry.subject | false | Subject name to write to the Schema Registry service, required for sink |
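
The requirement rules in the table above can be sketched as a small check. This is only an illustration under assumptions: the option keys are taken from the table as proposed in this issue, while the {{RegistryOptions}} class, the {{validate}} method, and the sample URL and subject are hypothetical and not part of Flink.

```java
import java.util.HashMap;
import java.util.Map;

public class RegistryOptions {

    // Option keys as listed in the proposal's table.
    static final String URL = "schema-registry.url";
    static final String SUBJECT = "schema-registry.subject";

    /**
     * Hypothetical validation helper: the URL is always required,
     * the subject is required only when the table is used as a sink.
     */
    static void validate(Map<String, String> options, boolean isSink) {
        if (isMissing(options.get(URL))) {
            throw new IllegalArgumentException("Missing required option: " + URL);
        }
        if (isSink && isMissing(options.get(SUBJECT))) {
            throw new IllegalArgumentException("Option required for sink: " + SUBJECT);
        }
    }

    private static boolean isMissing(String value) {
        return value == null || value.trim().isEmpty();
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        options.put(URL, "http://localhost:8081"); // assumed example address

        validate(options, false); // source: the URL alone is enough

        options.put(SUBJECT, "orders-value"); // assumed example subject
        validate(options, true); // sink: the subject must also be set
        System.out.println("options are valid");
    }
}
```

Reading from a topic needs only the registry URL, whereas writing also needs a subject under which the schema is registered, which is why the second option is optional in general but mandatory for sinks.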



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
