[ https://issues.apache.org/jira/browse/FLINK-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16833569#comment-16833569 ]

Dawid Wysakowicz edited comment on FLINK-9679 at 5/6/19 7:54 AM:
-----------------------------------------------------------------

Hi [~phoenixjiangnan]

*DISCLAIMER* I have not read the design document for [FLINK-12256] thoroughly, 
only enough to get the overall idea.

I think it is somewhat related and could be used in most of the cases, though 
there are some caveats. I think the idea behind the `SerializationSchema` was 
to always write the schema based on the topic that the record is actually 
written to (that was the only strategy when the PR was opened). Another problem 
I see is that this would imply that the `SerializationSchema` bypasses the 
`Catalog` interface when creating new entries in the Schema Registry, which I 
think is wrong.
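
To make that caveat concrete, below is a minimal sketch (in the spirit of the 
PR, not its actual API; the class name, the constructor and the 
{{<topic>-value}} subject strategy are assumptions) of a `SerializationSchema` 
that derives the subject from the target topic and writes the Confluent wire 
format (magic byte + 4-byte schema id + Avro payload). Note how the `register` 
call creates the registry entry directly, which is exactly the `Catalog` bypass 
mentioned above:

{code:java}
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;
import org.apache.flink.api.common.serialization.SerializationSchema;

import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient;
import io.confluent.kafka.schemaregistry.client.SchemaRegistryClient;

// Sketch only: names and the "<topic>-value" subject strategy are assumptions,
// not the API proposed in the PR.
public class RegistryAvroSerializationSketch implements SerializationSchema<GenericRecord> {

    private final String registryUrl;
    private final String topic; // the topic the record is actually written to
    private transient SchemaRegistryClient client;

    public RegistryAvroSerializationSketch(String registryUrl, String topic) {
        this.registryUrl = registryUrl;
        this.topic = topic;
    }

    @Override
    public byte[] serialize(GenericRecord record) {
        try {
            if (client == null) {
                // created lazily because the registry client is not Serializable
                client = new CachedSchemaRegistryClient(registryUrl, 100);
            }
            Schema schema = record.getSchema();
            // Registers (or looks up) the writer schema under the topic-derived
            // subject -- this bypasses the Catalog interface entirely.
            int schemaId = client.register(topic + "-value", schema);

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            DataOutputStream dataOut = new DataOutputStream(out);
            dataOut.writeByte(0);       // Confluent wire format: magic byte
            dataOut.writeInt(schemaId); // 4-byte big-endian schema id
            dataOut.flush();

            BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
            new GenericDatumWriter<GenericRecord>(schema).write(record, encoder);
            encoder.flush();
            return out.toByteArray();
        } catch (Exception e) {
            throw new RuntimeException("Could not serialize record", e);
        }
    }
}
{code}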

I agree we should probably synchronize this effort with the work around the 
Schema Registry Catalog. What I would like to see in the document for the 
Registry Catalog is a more in-depth discussion of the mapping between 
{{topic <> subject}}, both for reading and for writing. Some problems we 
should solve, off the top of my head:

* what happens if the schema id embedded in the record does not correspond to 
the subject name derived from the catalog, and how do we check for that (see 
the sketch after this list)
* which part is responsible for creating entries in the catalog
* how do we store the information whether the stream is an append stream or a 
changelog
* how do we define schemas for the key and the value of a Kafka message?
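
For the first point, a hedged sketch of how such a check could look; the 
registry client methods used here (`getById`, `getVersion`) follow the 
Avro-based `SchemaRegistryClient` API of that era and should be treated as 
assumptions:

{code:java}
import java.nio.ByteBuffer;

import org.apache.avro.Schema;

import io.confluent.kafka.schemaregistry.client.SchemaRegistryClient;
import io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException;

public final class SubjectConsistencyCheck {

    // Returns true if the writer schema referenced by the id embedded in the
    // message is registered under the given (catalog-derived) subject.
    public static boolean idMatchesSubject(
            SchemaRegistryClient client, byte[] message, String subject) throws Exception {
        ByteBuffer buffer = ByteBuffer.wrap(message);
        if (buffer.get() != 0) {
            throw new IllegalArgumentException("Not in Confluent wire format (magic byte != 0)");
        }
        int schemaId = buffer.getInt(); // 4-byte big-endian schema id
        Schema writerSchema = client.getById(schemaId);
        try {
            // throws a 404 RestClientException if the schema is not
            // registered under this subject
            client.getVersion(subject, writerSchema);
            return true;
        } catch (RestClientException e) {
            return false;
        }
    }
}
{code}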



> Implement ConfluentRegistryAvroSerializationSchema
> --------------------------------------------------
>
>                 Key: FLINK-9679
>                 URL: https://issues.apache.org/jira/browse/FLINK-9679
>             Project: Flink
>          Issue Type: Improvement
>          Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>    Affects Versions: 1.6.0
>            Reporter: Yazdan Shirvany
>            Assignee: Dominik WosiƄski
>            Priority: Major
>              Labels: pull-request-available
>
> Implement AvroSerializationSchema using Confluent Schema Registry


