Great! I linked that JIRA to FLINK-11275 <https://issues.apache.org/jira/browse/FLINK-11275>, and grouped it with the JIRAs for HiveCatalog and GenericHiveMetastoreCatalog.
I have some initial thoughts on the solution you described, but I'll wait till a more complete Google design doc comes up, since this discussion is about engaging community interest.

On Thu, Apr 18, 2019 at 8:39 AM Artsem Semianenka <[email protected]> wrote:

> Sorry guys, I've attached the wrong link for the Jira ticket in the
> previous email. This is the correct link:
> https://issues.apache.org/jira/browse/FLINK-12256
>
> On Thu, 18 Apr 2019 at 18:29, Artsem Semianenka <[email protected]> wrote:
>
> > Thank you guys so much!
> >
> > You provided me a lot of helpful information.
> > I've created the Jira ticket [1] and added an initial description
> > covering only the main purpose of the new feature. A more detailed
> > implementation description will be added later.
> >
> > Hi Rong, to tell the truth, my first idea was to use some predefined
> > prefix/postfix for the topic name and look up the mapping between
> > topic and schema subject. But the idea of a separate view of a logical
> > table with a schema looks more elegant and flexible.
> >
> > I also thought about other approaches for defining the mapping
> > between topic and schema subject in case they have different names,
> > e.g. defining the "subject" as part of the table definition:
> >
> > Select * from kafka.topic.subject
> > or
> > Select * from kafka.topic#subject
> >
> > If the subject is not defined, try to find a subject with the same
> > name as the topic.
> > If the subject is still not found, take the last message and try to
> > infer the schema (retrieve the schema id from the message and get the
> > latest registered schema).
> >
> > But I see one disadvantage with all of these approaches: the subject
> > name may contain symbols that are not supported in SQL.
> >
> > I am investigating how to escape the illegal symbols in the table name
> > definition.
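[Editor's note: the fallback lookup order described above could be sketched roughly as follows. All class, method, and parameter names here are hypothetical illustrations, not Flink or Confluent API; a real implementation would query the Schema Registry and Kafka instead of the in-memory collections used here.]

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

/**
 * Sketch of the proposed topic-to-subject resolution order.
 * Hypothetical helper; not part of Flink or the Schema Registry client.
 */
public class SubjectResolver {

    /**
     * @param tableRef            either "topic" or "topic#subject"
     * @param knownSubjects       subjects registered in the Schema Registry
     * @param schemaIdToSubject   mapping from schema id to subject
     * @param lastMessageSchemaId schema id extracted from the last message
     *                            on the topic, if one could be read
     */
    public static Optional<String> resolve(
            String tableRef,
            List<String> knownSubjects,
            Map<Integer, String> schemaIdToSubject,
            Optional<Integer> lastMessageSchemaId) {

        // 1. An explicitly given subject wins: "topic#subject".
        int sep = tableRef.indexOf('#');
        if (sep >= 0) {
            return Optional.of(tableRef.substring(sep + 1));
        }

        // 2. Otherwise look for a subject named exactly like the topic.
        if (knownSubjects.contains(tableRef)) {
            return Optional.of(tableRef);
        }

        // 3. Last resort: infer the subject from the schema id carried by
        //    the last message on the topic (Optional.map turns an unknown
        //    id into an empty result).
        return lastMessageSchemaId.map(schemaIdToSubject::get);
    }
}
```

On the escaping concern: Flink SQL allows quoting identifiers with backticks, which may cover at least some subject names containing characters that are illegal in plain identifiers.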
> >
> > Thanks,
> > Artsem
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-11275
> >
> > On Thu, 18 Apr 2019 at 11:54, Timo Walther <[email protected]> wrote:
> >
> >> Hi Artsem,
> >>
> >> having catalog support for the Confluent Schema Registry would be a
> >> great addition. Although the implementation of FLIP-30 is still
> >> ongoing, we merged the stable interfaces today [0]. This should
> >> unblock people from contributing new catalog implementations, so you
> >> could already start designing an implementation. The implementation
> >> could be unit tested for now, until it can also be registered in a
> >> table environment for integration/end-to-end tests.
> >>
> >> I hope we can reuse the existing SQL Kafka connector and SQL Avro
> >> format?
> >>
> >> Looking forward to a JIRA issue and a little design document on how
> >> to connect the APIs.
> >>
> >> Thanks,
> >> Timo
> >>
> >> [0] https://github.com/apache/flink/pull/8007
> >>
> >> On 18.04.19 at 07:03, Bowen Li wrote:
> >> > Hi,
> >> >
> >> > Thanks Artsem and Rong for bringing up the demand from the user
> >> > perspective. A Kafka/Confluent Schema Registry catalog would have a
> >> > good use case in Flink. We actually mentioned the potential of the
> >> > unified Catalog APIs for Kafka in our talk a couple weeks ago at
> >> > Flink Forward SF [1], and we're glad to learn you are interested in
> >> > contributing. I think creating a JIRA ticket linked in FLINK-11275
> >> > [2], and starting with discussion and design, would help to advance
> >> > the effort.
> >> >
> >> > The most interesting part of the Confluent Schema Registry, from my
> >> > point of view, is the core idea of smoothing the real production
> >> > experience and the things built around it, including versioned
> >> > schemas, schema evolution, compatibility checks, etc. Introducing a
> >> > confluent-schema-registry-backed catalog to Flink may also help our
> >> > design benefit from those ideas.
> >> >
> >> > To add to Dawid's points: I assume the MVP for this project would
> >> > be supporting Kafka as streaming tables through the new catalog.
> >> > FLIP-30 is for both streaming and batch tables, thus it won't be
> >> > blocked by the whole of FLIP-30. I think as soon as we finish the
> >> > table operation APIs, finalize properties and formats, and connect
> >> > the APIs to Calcite, this work can be unblocked. Timo and Xuefu may
> >> > have more to say.
> >> >
> >> > [1]
> >> > https://www.slideshare.net/BowenLi9/integrating-flink-with-hive-flink-forward-sf-2019/23
> >> > [2] https://issues.apache.org/jira/browse/FLINK-11275
> >> >
> >> > On Wed, Apr 17, 2019 at 6:39 PM Jark Wu <[email protected]> wrote:
> >> >
> >> >> Hi Rong,
> >> >>
> >> >> Thanks for pointing out the missing FLIPs on the FLIP main page. I
> >> >> added all the missing FLIPs (incl. FLIP-14, FLIP-22, FLIP-29,
> >> >> FLIP-30, FLIP-31) to the page.
> >> >>
> >> >> I also included @[email protected] <[email protected]>
> >> >> and @Bowen Li <[email protected]> in the thread, who are familiar
> >> >> with the latest catalog design.
> >> >>
> >> >> Thanks,
> >> >> Jark
> >> >>
> >> >> On Thu, 18 Apr 2019 at 02:39, Rong Rong <[email protected]> wrote:
> >> >>
> >> >>> Thanks Artsem for looking into this problem, and thanks Dawid for
> >> >>> bringing up the discussion on FLIP-30.
> >> >>>
> >> >>> We've observed similar scenarios where we would also like to
> >> >>> reuse the schema registry for both Kafka streams and the raw
> >> >>> ingested Kafka messages in the data lake.
> >> >>> FYI, another, more catalog-oriented document can be found here
> >> >>> [1]. I do have one question to follow up on Dawid's point (2):
> >> >>> are we suggesting that different Kafka topics (e.g.
> >> >>> test-topic-prod, test-topic-non-prod, etc.) be considered as
> >> >>> "views" of a logical table with a schema (e.g. test-topic)?
> >> >>>
> >> >>> Also, it seems a few of the FLIPs, like the FLIP-30 page, are not
> >> >>> linked in the main FLIP confluence wiki page [2] for some reason.
> >> >>> I tried to fix that, but it seems I don't have permission. Maybe
> >> >>> someone can also take a look?
> >> >>>
> >> >>> Thanks,
> >> >>> Rong
> >> >>>
> >> >>> [1]
> >> >>> https://docs.google.com/document/d/1Y9it78yaUvbv4g572ZK_lZnZaAGjqwM_EhjdOv4yJtw/edit#heading=h.xp424vn7ioei
> >> >>> [2]
> >> >>> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
> >> >>>
> >> >>> On Wed, Apr 17, 2019 at 2:30 AM Artsem Semianenka <
> >> >>> [email protected]> wrote:
> >> >>>
> >> >>>> Thank you, Dawid!
> >> >>>> This is very helpful information. I will keep a close eye on the
> >> >>>> updates to FLIP-30 and contribute whenever possible.
> >> >>>> I guess I may create a Jira ticket for my proposal in which I
> >> >>>> describe the idea and attach an intermediate pull request based
> >> >>>> on the current API (just for initial discussion). But the final
> >> >>>> pull request will definitely be based on the FLIP-30 API.
> >> >>>>
> >> >>>> Best regards,
> >> >>>> Artsem
> >> >>>>
> >> >>>> On Wed, 17 Apr 2019 at 09:36, Dawid Wysakowicz <
> >> >>>> [email protected]> wrote:
> >> >>>>
> >> >>>>> Hi Artsem,
> >> >>>>>
> >> >>>>> I think it totally makes sense to have a catalog for the Schema
> >> >>>>> Registry. It is also good to hear you want to contribute that.
> >> >>>>> There are a few important things to consider though:
> >> >>>>>
> >> >>>>> 1. The Catalog interface is currently under rework. You may
> >> >>>>> take a look at the corresponding FLIP-30 [1], and also have a
> >> >>>>> look at the first PR that introduces the basic interfaces [2].
> >> >>>>> I think it would be worth considering those changes already.
> >> >>>>> I cc Xuefu, who is participating in the efforts of Catalog
> >> >>>>> integration.
> >> >>>>>
> >> >>>>> 2. There is still an ongoing discussion about which properties
> >> >>>>> we should store for streaming tables, and how. I think this
> >> >>>>> might affect (but maybe doesn't have to) the design of the
> >> >>>>> Catalog [3]. I cc Timo, who might give more insight into
> >> >>>>> whether those should be blocking for the work around this
> >> >>>>> Catalog.
> >> >>>>>
> >> >>>>> Best,
> >> >>>>>
> >> >>>>> Dawid
> >> >>>>>
> >> >>>>> [1]
> >> >>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-30%3A+Unified+Catalog+APIs
> >> >>>>> [2] https://github.com/apache/flink/pull/8007
> >> >>>>> [3]
> >> >>>>> https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit#heading=h.egn858cgizao
> >> >>>>>
> >> >>>>> On 16/04/2019 17:35, Artsem Semianenka wrote:
> >> >>>>>> Hi guys!
> >> >>>>>>
> >> >>>>>> I'm working on an external catalog for Confluent Kafka. The
> >> >>>>>> main idea is to register the external catalog, which provides
> >> >>>>>> the list of Kafka topics, and execute SQL queries like:
> >> >>>>>> Select * from kafka.topic_name
> >> >>>>>>
> >> >>>>>> I'm going to retrieve the table schema from the Confluent
> >> >>>>>> Schema Registry. The main disadvantage is that the topic must
> >> >>>>>> have the same name (prefixes and postfixes are accepted) as
> >> >>>>>> its schema subject in the Schema Registry.
> >> >>>>>> For example:
> >> >>>>>> topic: test-topic-prod
> >> >>>>>> schema subject: test-topic
> >> >>>>>>
> >> >>>>>> I would like to contribute this solution to the main Flink
> >> >>>>>> branch and would like to discuss the pros and cons of this
> >> >>>>>> approach.
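[Editor's note: the "same name with prefix/postfix accepted" convention above could be checked with a small helper like the following. The class and method names are hypothetical; in practice the subject list would come from the Schema Registry (e.g. its documented GET /subjects REST endpoint) rather than being passed in.]

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/**
 * Sketch of matching a Kafka topic name to a Schema Registry subject,
 * tolerating extra prefixes/postfixes on the topic (e.g. topic
 * "test-topic-prod" matches subject "test-topic"). Hypothetical helper,
 * not Flink or Confluent API.
 */
public class TopicSubjectMatcher {

    public static Optional<String> match(String topic, List<String> subjects) {
        // Prefer an exact match.
        if (subjects.contains(topic)) {
            return Optional.of(topic);
        }
        // Otherwise pick the longest subject contained in the topic name,
        // so that surrounding prefixes/postfixes are accepted.
        return subjects.stream()
                .filter(topic::contains)
                .max(Comparator.comparingInt(String::length));
    }
}
```

One design question this sketch surfaces is ambiguity: if several subjects are substrings of the topic name, "longest match" is only one possible tiebreak, which is part of why an explicit topic-to-subject mapping may end up being the safer contract.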
> >> >>>>>>
> >> >>>>>> Best regards,
> >> >>>>>> Artsem
> >> >>>>>>
> >> >>>>>
> >> >>>> --
> >> >>>>
> >> >>>> Best regards,
> >> >>>> Artsem Semianenka
> >> >>>>
> >>
> >
> > --
> >
> > Best regards,
> > Artsem Semianenka
> >
>
> --
>
> Best regards,
> Artsem Semianenka
>
