Thanks a lot, Timo and Xuefu. Yes, I think we can finalize the design doc first and start the implementation w/o the unified connector API being ready, skipping some features for now.
Xuefu, I like the idea of making Flink-specific properties into generic
key-value pairs, so that it will make integration with Hive DDL (or others,
e.g. Beam DDL) easier.

I'll run a final pass over the design doc and finalize the design in the
next few days. And we can start creating tasks and collaborate on the
implementation. Thanks a lot for all the comments and inputs.

Cheers!
Shuyi

On Tue, Nov 27, 2018 at 7:02 AM Zhang, Xuefu <xuef...@alibaba-inc.com> wrote:

> Yeah! I agree with Timo that DDL can actually proceed w/o being blocked by
> connector API. We can leave the unknown out while defining the basic syntax.
>
> @Shuyi
>
> As commented in the doc, I think we can probably stick with simple syntax
> with general properties, without extending the syntax so much that it
> mimics the descriptor API.
>
> Part of our effort on Flink-Hive integration is also to make DDL syntax
> compatible with Hive's. The one in the current proposal seems to make our
> effort more challenging.
>
> We can help and collaborate. At this moment, I think we can finalize on
> the proposal and then we can divide the tasks for better collaboration.
>
> Please let me know if there are any questions or suggestions.
>
> Thanks,
> Xuefu
>
> ------------------------------------------------------------------
> Sender:Timo Walther <twal...@apache.org>
> Sent at:2018 Nov 27 (Tue) 16:21
> Recipient:dev <dev@flink.apache.org>
> Subject:Re: [DISCUSS] Flink SQL DDL Design
>
> Thanks for offering your help here, Xuefu. It would be great to move
> these efforts forward. I agree that the DDL is somehow related to the
> unified connector API design but we can also start with the basic
> functionality now and evolve the DDL during this release and next releases.
>
> For example, we could identify the MVP DDL syntax that skips defining
> key constraints and maybe even time attributes. This DDL could be used
> for batch use cases, ETL, and materializing SQL queries (no time
> operations like windows).
>
> The unified connector API is high on our priority list for the 1.8
> release. I will try to update the document by the middle of next week.
>
> Regards,
>
> Timo
>
>
> On 27.11.18 at 08:08, Shuyi Chen wrote:
> > Thanks a lot, Xuefu. I was busy with some other stuff for the last 2 weeks,
> > but we are definitely interested in moving this forward. I think once the
> > unified connector API design [1] is done, we can finalize the DDL design as
> > well and start creating concrete subtasks to collaborate on the
> > implementation with the community.
> >
> > Shuyi
> >
> > [1]
> > https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing
> >
> > On Mon, Nov 26, 2018 at 7:01 PM Zhang, Xuefu <xuef...@alibaba-inc.com>
> > wrote:
> >
> >> Hi Shuyi,
> >>
> >> I'm wondering if you folks still have the bandwidth to work on this.
> >>
> >> We have some dedicated resources and would like to move this forward. We
> >> can collaborate.
> >>
> >> Thanks,
> >>
> >> Xuefu
> >>
> >>
> >> ------------------------------------------------------------------
> >> Sender:wenlong.lwl<wenlong88....@gmail.com>
> >> Sent at:2018-11-05 11:15:35
> >> Recipient:<dev@flink.apache.org>
> >> Subject:Re: [DISCUSS] Flink SQL DDL Design
> >>
> >> Hi, Shuyi, thanks for the proposal.
> >>
> >> I have two concerns about the table DDL:
> >>
> >> 1. How about removing the source/sink mark from the DDL? It is not
> >> necessary: the framework determines whether the referenced table is a
> >> source or a sink from the context of the query using the table. That
> >> would be more convenient for users defining a table which can be both a
> >> source and a sink, and more convenient for the catalog to persist and
> >> manage the meta info.
> >>
> >> 2. How about just keeping one pure string map as parameters for the
> >> table, like
> >> create table Kafka10SourceTable (
> >>   intField INTEGER,
> >>   stringField VARCHAR(128),
> >>   longField BIGINT,
> >>   rowTimeField TIMESTAMP
> >> ) with (
> >>   connector.type = 'kafka',
> >>   connector.property-version = '1',
> >>   connector.version = '0.10',
> >>   connector.properties.topic = 'test-kafka-topic',
> >>   connector.properties.startup-mode = 'latest-offset',
> >>   connector.properties.specific-offset = 'offset',
> >>   format.type = 'json',
> >>   format.properties.version = '1',
> >>   format.derive-schema = 'true'
> >> );
> >> Because:
> >> 1. In TableFactory, what the user works with is a string map of
> >> properties, so defining parameters as a string map is the closest match
> >> to how users actually use the parameters.
> >> 2. The table descriptor can be extended by users, like what is done in
> >> Kafka and Json. That means the parameter keys in a connector or format
> >> can differ between implementations, and we cannot restrict the keys to a
> >> specified set, so we would need a map in connector scope and a map in
> >> connector.properties scope. Why not just give users a single map and let
> >> them put parameters in a format they like? That is also the simplest way
> >> to implement the DDL parser.
> >> 3. Whether we can define a format clause or not depends on the
> >> implementation of the connector. Using a separate clause in the DDL may
> >> give the misleading impression that connectors can be combined with
> >> arbitrary formats, which may not actually work.
> >>
> >> On Sun, 4 Nov 2018 at 18:25, Dominik Wosiński <wos...@gmail.com> wrote:
> >>
> >>> +1, Thanks for the proposal.
> >>>
> >>> I guess this is a long-awaited change. This can vastly increase the
> >>> functionalities of the SQL Client as it will be possible to use complex
> >>> extensions like for example those provided by Apache Bahir[1].
> >>>
> >>> Best Regards,
> >>> Dom.
> >>>
> >>> [1]
> >>> https://github.com/apache/bahir-flink
> >>>
> >>> On Sat, 3 Nov 2018 at 17:17, Rong Rong <walter...@gmail.com> wrote:
> >>>
> >>>> +1. Thanks for putting the proposal together Shuyi.
> >>>>
> >>>> DDL has been brought up a couple of times previously [1,2]. Utilizing
> >>>> DDL will definitely be a great extension to the current Flink SQL to
> >>>> systematically support some of the previously brought up features such
> >>>> as [3]. And it will also be beneficial to see the document closely
> >>>> aligned with the previous discussion for unified SQL connector API [4].
> >>>>
> >>>> I also left a few comments on the doc. Looking forward to the alignment
> >>>> with the other couple of efforts and contributing to them!
> >>>>
> >>>> Best,
> >>>> Rong
> >>>>
> >>>> [1]
> >>>> http://mail-archives.apache.org/mod_mbox/flink-dev/201805.mbox/%3CCAMZk55ZTJA7MkCK1Qu4gLPu1P9neqCfHZtTcgLfrFjfO4Xv5YQ%40mail.gmail.com%3E
> >>>> [2]
> >>>> http://mail-archives.apache.org/mod_mbox/flink-dev/201810.mbox/%3CDC070534-0782-4AFD-8A85-8A82B384B8F7%40gmail.com%3E
> >>>> [3] https://issues.apache.org/jira/browse/FLINK-8003
> >>>> [4]
> >>>> http://mail-archives.apache.org/mod_mbox/flink-dev/201810.mbox/%3c6676cb66-6f31-23e1-eff5-2e9c19f88...@apache.org%3E
> >>>>
> >>>> On Fri, Nov 2, 2018 at 10:22 AM Bowen Li <bowenl...@gmail.com> wrote:
> >>>>
> >>>>> Thanks Shuyi!
> >>>>>
> >>>>> I left some comments there. I think the design of SQL DDL and
> >>>>> Flink-Hive integration/External catalog enhancements will work closely
> >>>>> with each other. Hope we are well aligned on the directions of the two
> >>>>> designs, and I look forward to working with you guys on both!
> >>>>>
> >>>>> Bowen
> >>>>>
> >>>>>
> >>>>> On Thu, Nov 1, 2018 at 10:57 PM Shuyi Chen <suez1...@gmail.com> wrote:
> >>>>>> Hi everyone,
> >>>>>>
> >>>>>> SQL DDL support has been a long-time ask from the community. Current
> >>>>>> Flink SQL supports only DML (e.g. SELECT and INSERT statements). In
> >>>>>> its current form, Flink SQL users still need to define/create table
> >>>>>> sources and sinks programmatically in Java/Scala. Also, in SQL Client,
> >>>>>> without DDL support, the current implementation does not allow dynamic
> >>>>>> creation of tables, types or functions with SQL, which adds friction
> >>>>>> for its adoption.
> >>>>>>
> >>>>>> I drafted a design doc [1] with a few other community members that
> >>>>>> proposes the design and implementation for adding DDL support in
> >>>>>> Flink. The initial design considers DDL for table, view, type, library
> >>>>>> and function. It will be great to get feedback on the design from the
> >>>>>> community, and align with the latest effort in unified SQL connector
> >>>>>> API [2] and Flink Hive integration [3].
> >>>>>>
> >>>>>> Any feedback is highly appreciated.
> >>>>>>
> >>>>>> Thanks
> >>>>>> Shuyi Chen
> >>>>>>
> >>>>>> [1]
> >>>>>> https://docs.google.com/document/d/1TTP-GCC8wSsibJaSUyFZ_5NBAHYEB1FVmPpP7RgDGBA/edit?usp=sharing
> >>>>>> [2]
> >>>>>> https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing
> >>>>>> [3]
> >>>>>> https://docs.google.com/document/d/1SkppRD_rE3uOKSN-LuZCqn4f7dz0zW5aa6T_hBZq5_o/edit?usp=sharing
> >>>>>> --
> >>>>>> "So you have to trust that the dots will somehow connect in your
> >>>>>> future."
> >> --
> >> "So you have to trust that the dots will somehow connect in your future."
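
For readers following the "one pure string map" suggestion in wenlong's mail quoted in this thread, here is a rough sketch of how a flat map of dotted keys could be split back into connector and format scopes by a consumer. It is only an illustration under stated assumptions: the function name group_properties and the rule of grouping by the first key segment are hypothetical, and this is plain Python, not Flink's TableFactory or any actual Flink API.

```python
from collections import defaultdict


def group_properties(props):
    """Group flat dotted keys by their first segment (hypothetical helper).

    "connector.properties.topic" lands in scope "connector" under the
    remaining key "properties.topic". A key without a dot becomes its own
    scope with an empty remaining key.
    """
    grouped = defaultdict(dict)
    for key, value in props.items():
        scope, _, rest = key.partition(".")
        grouped[scope][rest] = value
    return dict(grouped)


# The WITH (...) clause from the mail, expressed as one flat string map.
table_properties = {
    "connector.type": "kafka",
    "connector.property-version": "1",
    "connector.version": "0.10",
    "connector.properties.topic": "test-kafka-topic",
    "connector.properties.startup-mode": "latest-offset",
    "format.type": "json",
    "format.derive-schema": "true",
}

scopes = group_properties(table_properties)
print(scopes["connector"]["type"])
print(sorted(scopes["format"]))
```

The point of the sketch is that the DDL parser only has to produce the flat map; any scoping into connector or format is left to whoever consumes the map, so implementation-specific keys need no parser changes.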