Thanks, Artsem, for looking into this problem, and thanks, Dawid, for bringing up the discussion on FLIP-30.
We've observed similar scenarios where we would also like to reuse the schema registry for both the Kafka streams and the raw ingested Kafka messages in the data lake. FYI, another, more catalog-oriented document can be found here [1].

I do have one follow-up question on Dawid's point (2): are we suggesting that different Kafka topics (e.g. test-topic-prod, test-topic-non-prod, etc.) be considered "views" of a logical table with a schema (e.g. test-topic)?

Also, it seems that a few of the FLIPs, such as the FLIP-30 page, are not linked from the main FLIP Confluence wiki page [2] for some reason. I tried to fix that, but it seems I don't have permission. Maybe someone can take a look?

Thanks,
Rong

[1] https://docs.google.com/document/d/1Y9it78yaUvbv4g572ZK_lZnZaAGjqwM_EhjdOv4yJtw/edit#heading=h.xp424vn7ioei
[2] https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals

On Wed, Apr 17, 2019 at 2:30 AM Artsem Semianenka <artfulonl...@gmail.com> wrote:

> Thank you, Dawid!
> This is very helpful information. I will keep a close eye on the updates to
> FLIP-30 and contribute whenever possible.
> I guess I may create a Jira ticket for my proposal, in which I describe the
> idea and attach an intermediate pull request based on the current API (just
> for initial discussion). But the final pull request will definitely be based
> on the FLIP-30 API.
>
> Best regards,
> Artsem
>
> On Wed, 17 Apr 2019 at 09:36, Dawid Wysakowicz <dwysakow...@apache.org>
> wrote:
>
> > Hi Artsem,
> >
> > I think it totally makes sense to have a catalog for the Schema
> > Registry. It is also good to hear you want to contribute that. There are
> > a few important things to consider, though:
> >
> > 1. The Catalog interface is currently under rework. You may take a look
> > at the corresponding FLIP-30 [1], and also have a look at the first PR
> > that introduces the basic interfaces [2]. I think it would be worth
> > already considering those changes. I cc Xuefu, who is participating in
> > the Catalog integration efforts.
> >
> > 2. There is still an ongoing discussion about which properties we should
> > store for streaming tables and how. I think this might affect (but maybe
> > doesn't have to) the design of the Catalog [3]. I cc Timo, who might give
> > more insight into whether those should be blocking for the work around
> > this Catalog.
> >
> > Best,
> >
> > Dawid
> >
> > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-30%3A+Unified+Catalog+APIs
> >
> > [2] https://github.com/apache/flink/pull/8007
> >
> > [3] https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit#heading=h.egn858cgizao
> >
> > On 16/04/2019 17:35, Artsem Semianenka wrote:
> > > Hi guys!
> > >
> > > I'm working on an External Catalog for Confluent Kafka. The main idea is
> > > to register an external catalog which provides the list of Kafka topics
> > > and to execute SQL queries like:
> > > SELECT * FROM kafka.topic_name
> > >
> > > I'm going to receive the table schema from the Confluent Schema Registry.
> > > The main disadvantage is that the topic name must match the name of its
> > > schema subject in Schema Registry (a prefix and postfix are accepted).
> > > For example:
> > > topic: test-topic-prod
> > > schema subject: test-topic
> > >
> > > I would like to contribute this solution to the main Flink branch and
> > > would like to discuss the pros and cons of this approach.
> > >
> > > Best regards,
> > > Artsem
> > >
>
> --
> Best regards,
> Artsem Semianenka
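
For readers following the thread, the topic-to-subject naming convention discussed above (topic test-topic-prod, schema subject test-topic) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the proposed catalog implementation: the function names subject_for_topic and group_topics_by_subject are hypothetical, the list of subjects is assumed to have been fetched from the Schema Registry already, and "prefix and postfix are accepted" is interpreted as "the subject name appears somewhere inside the topic name", with the longest matching subject winning.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional


def subject_for_topic(topic: str, subjects: Iterable[str]) -> Optional[str]:
    """Return the schema subject whose name is contained in the topic name.

    Assumption: a topic is named <prefix><subject><postfix>, so any subject
    that occurs as a substring of the topic is a candidate. If several
    subjects match, the longest one is chosen as the most specific match.
    """
    candidates = [s for s in subjects if s and s in topic]
    return max(candidates, key=len, default=None)


def group_topics_by_subject(
    topics: Iterable[str], subjects: Iterable[str]
) -> Dict[str, List[str]]:
    """Group physical topics under the logical table (subject) they map to.

    Topics without a matching subject are skipped.
    """
    subject_list = list(subjects)
    grouped: Dict[str, List[str]] = defaultdict(list)
    for topic in topics:
        subject = subject_for_topic(topic, subject_list)
        if subject is not None:
            grouped[subject].append(topic)
    return dict(grouped)


if __name__ == "__main__":
    known_subjects = ["test-topic", "test"]
    known_topics = ["test-topic-prod", "test-topic-non-prod", "orders"]
    print(group_topics_by_subject(known_topics, known_subjects))
```

Under this reading, test-topic-prod and test-topic-non-prod both resolve to the subject test-topic, which is one way to picture the "views of a logical table" question raised above. A substring match is deliberately loose; a real catalog would probably want an explicit, configurable prefix/postfix rule to avoid accidental matches.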