Thanks for the clarification Timo, that's sounds good to me. Let's continue to discuss other things.
Best, Kurt On Thu, Oct 10, 2019 at 4:14 PM Timo Walther <twal...@apache.org> wrote: > The purpose of ConnectTableDescriptor was to have a programmatic way of > expressing `CREATE TABLE` with JavaDocs and IDE support. But I agree > that it needs some rework in the future, once we have finalized the DDL > to ensure that both concepts are in sync. > > Regards, > Timo > > > On 10.10.19 16:08, Kurt Young wrote: > > Regarding to ConnectTableDescriptor, if in the end it becomes a builder > > of CatalogTable, I would be ok with that. But it doesn't look like a > builder > > now, it's more like another form of TableSource/TableSink. > > > > Best, > > Kurt > > > > > > On Thu, Oct 10, 2019 at 3:44 PM Timo Walther <twal...@apache.org> wrote: > > > >> Hi everyone, > >> > >> thanks for the great discussion with a good outcome. > >> > >> I have one last comment about: > >> > >> void createTemporaryFunction(String path, UserDefinedFunction function); > >> > >> We should encourage users to register a class instead of an instance. We > >> should enforce a default constructor in the future. If we need metadata > >> or type inference, the planner can simply instantiate the class and call > >> the corresponding getters. If people would like to parameterize a > >> function, they can use global job parameters instead - which are > >> available in the open() method. > >> > >> I suggest: > >> > >> void createTemporaryFunction(String path, Class<? extends > >> UserDefinedFunction> functionClass); > >> > >> This is also closer to the SQL DDL that is about to be implemented in: > >> https://issues.apache.org/jira/browse/FLINK-7151 > >> > >> Thanks, > >> Timo > >> > >> > >> On 09.10.19 17:07, Bowen Li wrote: > >>> Hi Dawid, > >>> > >>> +1 for proposed changes > >>> > >>> On Wed, Oct 9, 2019 at 12:15 PM Dawid Wysakowicz < > dwysakow...@apache.org > >>> > >>> wrote: > >>> > >>>> Sorry for a very delayed response. > >>>> > >>>> @Kurt Yes, this is the goal to have a function created like new > >>>> Function(...) also be wrapped into CatalogFunction. This would have to > >>>> be though a temporary function as we cannot represent that as a set of > >>>> properties. Similar to the createTemporaryView(DataStream stream). > >>>> > >>>> As for the ConnectTableDescriptor I agree this is very similar to > >>>> CatalogTable. I am not sure though if we should get rid of it. In the > >>>> end I see it as a builder for a CatalogTable, which is a slightly more > >>>> internal API, but we might revisit that some time in the future if we > >>>> find that it makes more sense. > >>>> > >>>> @All I updated the FLIP page with some more details from the outcome > of > >>>> the discussions around FLIP-57. Please take a look. I would like to > >>>> start a vote on this FLIP as soon as the vote on FLIP-57 goes through. > >>>> > >>>> Best, > >>>> > >>>> Dawid > >>>> > >>>> > >>>> On 19/09/2019 09:24, Kurt Young wrote: > >>>>> IIUC it's good to see that both serializable (tables description from > >>>> DDL) > >>>>> and unserializable (tables with DataStream underneath) tables are > >> treated > >>>>> unify with CatalogTable. > >>>>> > >>>>> Can I also assume functions that either come from a function class > >> (from > >>>>> DDL) > >>>>> or function objects (newed by user) will also treated unify with > >>>>> CatalogFunction? > >>>>> > >>>>> This will greatly simplify and unify current API level concepts and > >>>> design. > >>>>> And it seems only one thing left, how do we deal with > >>>>> ConnectTableDescriptor? > >>>>> It's actually very similar with serializable CatalogTable, both carry > >>>> some > >>>>> text > >>>>> properties which even are the same. Is there any chance we can > further > >>>> unify > >>>>> this to CatalogTable? > >>>>> > >>>>> object > >>>>> Best, > >>>>> Kurt > >>>>> > >>>>> > >>>>> On Thu, Sep 19, 2019 at 3:13 PM Jark Wu <imj...@gmail.com> wrote: > >>>>> > >>>>>> Thanks Dawid for the design doc. > >>>>>> > >>>>>> In general, I’m +1 to the FLIP. > >>>>>> > >>>>>> > >>>>>> +1 to the single-string and parse way to express object path. > >>>>>> > >>>>>> +1 to deprecate registerTableSink & registerTableSource. > >>>>>> But I would suggest to provide an easy way to register a custom > >>>>>> source/sink before we drop them (this is another story). > >>>>>> Currently, it’s not easy to implement a custom connector descriptor. > >>>>>> > >>>>>> Best, > >>>>>> Jark > >>>>>> > >>>>>> > >>>>>>> 在 2019年9月19日,11:37,Dawid Wysakowicz <wysakowicz.da...@gmail.com> > 写道: > >>>>>>> > >>>>>>> Hi JingsongLee, > >>>>>>> From my understanding they can. Underneath they will be > >> CatalogTables. > >>>>>> The > >>>>>>> difference is the lifetime of the tables. Plus some of the user > >> facing > >>>>>>> interfaces cannot be persisted e.g. datastream. Therefore we must > >> have > >>>> a > >>>>>>> separate methods for that. In the end the temporary tables are held > >> in > >>>>>>> memory as CatalogTables. > >>>>>>> Best, > >>>>>>> Dawid > >>>>>>> > >>>>>>> On Thu, 19 Sep 2019, 10:08 JingsongLee, <lzljs3620...@aliyun.com > >>>>>> .invalid> > >>>>>>> wrote: > >>>>>>> > >>>>>>>> Hi dawid: > >>>>>>>> Can temporary tables achieve the same capabilities as catalog > table? > >>>>>>>> like statistics: CatalogTableStatistics, CatalogColumnStatistics, > >>>>>>>> PartitionStatistics > >>>>>>>> like partition support: we have added some catalog equivalent > >>>> interfaces > >>>>>>>> on TableSource/TableSink: getPartitions, getPartitionFieldNames > >>>>>>>> Maybe it's not a good idea to add these interfaces to > >>>>>>>> TableSource/TableSink. What do you think? > >>>>>>>> > >>>>>>>> Best, > >>>>>>>> Jingsong Lee > >>>>>>>> > >>>>>>>> > >>>>>>>> ------------------------------------------------------------------ > >>>>>>>> From:Kurt Young <ykt...@gmail.com> > >>>>>>>> Send Time:2019年9月18日(星期三) 17:54 > >>>>>>>> To:dev <dev@flink.apache.org> > >>>>>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in > >> Table > >>>>>>>> module > >>>>>>>> > >>>>>>>> Hi all, > >>>>>>>> > >>>>>>>> Sorry to join this party late. Big +1 to this flip, especially for > >> the > >>>>>>>> dropping > >>>>>>>> "registerTableSink & registerTableSource" part. These are indeed > >>>> legacy > >>>>>>>> and we should try to unify them through CatalogTable after we > >>>> introduce > >>>>>>>> the concept of Catalog. > >>>>>>>> > >>>>>>>> From my understanding, what we can registered should all be > >> metadata, > >>>>>>>> TableSource/TableSink should only be the one who is responsible to > >> do > >>>>>>>> the real work, i.e. reading and writing data according to the > schema > >>>> and > >>>>>>>> other information like computed column, partition, .e.g. > >>>>>>>> > >>>>>>>> Best, > >>>>>>>> Kurt > >>>>>>>> > >>>>>>>> > >>>>>>>> On Wed, Sep 18, 2019 at 5:14 PM JingsongLee < > >> lzljs3620...@aliyun.com > >>>>>>>> .invalid> > >>>>>>>> wrote: > >>>>>>>> > >>>>>>>>> After some development and thinking, I have a general > >> understanding. > >>>>>>>>> +1 to registering a source/sink does not fit into the SQL world. > >>>>>>>>> I am OK to have a deprecated registerTemporarySource/Sink to > >>>> compatible > >>>>>>>>> with old ways. > >>>>>>>>> > >>>>>>>>> Best, > >>>>>>>>> Jingsong Lee > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > ------------------------------------------------------------------ > >>>>>>>>> From:Timo Walther <twal...@apache.org> > >>>>>>>>> Send Time:2019年9月17日(星期二) 08:00 > >>>>>>>>> To:dev <dev@flink.apache.org> > >>>>>>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in > >> Table > >>>>>>>>> module > >>>>>>>>> > >>>>>>>>> Hi Dawid, > >>>>>>>>> > >>>>>>>>> thanks for the design document. It fixes big concept gaps due to > >>>>>>>>> historical reasons with proper support for serializability and > >>>> catalog > >>>>>>>>> support in mind. > >>>>>>>>> > >>>>>>>>> I would not mind a registerTemporarySource/Sink, but the problem > >>>> that I > >>>>>>>>> see is that many people think that this is the recommended way of > >>>>>>>>> registering a table source/sink which is not true. We should > guide > >>>>>> users > >>>>>>>>> to either use connect() or DDL API which can be validated and > >> stored > >>>> in > >>>>>>>>> catalog. > >>>>>>>>> > >>>>>>>>> Also from a concept perspective, registering a source/sink does > not > >>>> fit > >>>>>>>>> into the SQL world. SQL does not know about source/sinks but only > >>>> about > >>>>>>>>> tables. If the responsibility of a TableSource/TableSink is just > a > >>>> pure > >>>>>>>>> physical data consumer/producer that is not connected to the > actual > >>>>>>>>> logical table schema, we would need a possibility of defining > time > >>>>>>>>> attributes and interpreting/converting a changelog. This should > be > >>>> done > >>>>>>>>> by the framework with information from the DDL/connect() and not > be > >>>>>>>>> defined in every table source. > >>>>>>>>> > >>>>>>>>> Regards, > >>>>>>>>> Timo > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> On 09.09.19 14:16, JingsongLee wrote: > >>>>>>>>>> Hi dawid: > >>>>>>>>>> > >>>>>>>>>> It is difficult to describe specific examples. > >>>>>>>>>> Sometimes users will generate some java converters through some > >>>>>>>>>> Java code, or generate some Java classes through third-party > >>>>>>>>>> libraries. Of course, these can be best done through > properties. > >>>>>>>>>> But this requires additional work from users.My suggestion is to > >>>>>>>>>> keep this Java instance class way that is user-friendly. > >>>>>>>>>> > >>>>>>>>>> Best, > >>>>>>>>>> Jingsong Lee > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > ------------------------------------------------------------------ > >>>>>>>>>> From:Dawid Wysakowicz <dwysakow...@apache.org> > >>>>>>>>>> Send Time:2019年9月6日(星期五) 16:21 > >>>>>>>>>> To:dev <dev@flink.apache.org> > >>>>>>>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in > >>>> Table > >>>>>>>>> module > >>>>>>>>>> Hi all, > >>>>>>>>>> @Jingsong Could you elaborate a bit more what do you mean by > >>>>>>>>>> "some Connectors are difficult to convert all states to > >> properties" > >>>>>>>>>> All the Flink provided connectors will definitely be expressible > >>>> with > >>>>>>>>> properties (In the end you should be able to use them from DDL). > I > >>>>>> think > >>>>>>>> if > >>>>>>>>> a TableSource is complex enough that it handles filter push down, > >>>>>>>> partition > >>>>>>>>> support etc. should rather be made available both from DDL & > >>>> java/scala > >>>>>>>>> code. I'm happy to reconsider adding > registerTemporaryTable(String > >>>>>> path, > >>>>>>>>> TableSource source) if you have some concrete examples in mind. > >>>>>>>>>> @Xuefu: We also considered the ObjectIdentifier (or actually > >>>>>>>> introducing > >>>>>>>>> a new identifier representation to differentiate between resolved > >> and > >>>>>>>>> unresolved identifiers) with the same concerns. We decided to > >> suggest > >>>>>> the > >>>>>>>>> string & parsing logic because of usability. > >>>>>>>>>> tEnv.from("cat.db.table") > >>>>>>>>>> is shorter and easier to write than > >>>>>>>>>> tEnv.from(Identifier.for("cat", "db", "name") > >>>>>>>>>> And also implicitly solves the problem what happens if a user > >> (e.g. > >>>>>>>> used > >>>>>>>>> to other systems) uses that API in a following manner: > >>>>>>>>>> tEnv.from(Identifier.for("db.name") > >>>>>>>>>> I'm happy to revisit it if the general consensus is that it's > >> better > >>>>>> to > >>>>>>>>> use the OO aproach. > >>>>>>>>>> Best, > >>>>>>>>>> Dawid > >>>>>>>>>> > >>>>>>>>>> On 06/09/2019 10:00, Xuefu Z wrote: > >>>>>>>>>> > >>>>>>>>>> Thanks to Dawid for starting the discussion and writeup. It > looks > >>>>>>>> pretty > >>>>>>>>>> good to me except that I'm a little concerned about the object > >>>>>>>> reference > >>>>>>>>>> and string parsing in the code, which seems to an anti-pattern > to > >>>> OOP. > >>>>>>>>> Have > >>>>>>>>>> we considered using ObjectIdenitifier with optional catalog and > db > >>>>>>>> parts, > >>>>>>>>>> esp. if we are worried about arguments of variable length or > >> method > >>>>>>>>>> overloading? It's quite likely that the result of string parsing > >> is > >>>> an > >>>>>>>>>> ObjectIdentifier instance any way. > >>>>>>>>>> > >>>>>>>>>> Having string parsing logic in the code is a little dangerous as > >> it > >>>>>>>>>> duplicates part of the DDL/DML parsing, and they can easily get > >> out > >>>> of > >>>>>>>>> sync. > >>>>>>>>>> Thanks, > >>>>>>>>>> Xuefu > >>>>>>>>>> > >>>>>>>>>> On Fri, Sep 6, 2019 at 1:57 PM JingsongLee < > >> lzljs3620...@aliyun.com > >>>>>>>>> .invalid> > >>>>>>>>>> wrote: > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Thanks dawid, +1 for this approach. > >>>>>>>>>> > >>>>>>>>>> One concern is the removal of registerTableSink & > >>>> registerTableSource > >>>>>>>>>> in TableEnvironment. It has two alternatives: > >>>>>>>>>> 1.the properties approach (DDL, descriptor). > >>>>>>>>>> 2.from/toDataStream. > >>>>>>>>>> > >>>>>>>>>> #1 can only be properties, not java states, and some Connectors > >>>>>>>>>> are difficult to convert all states to properties. > >>>>>>>>>> #2 can contain java state. But can't use TableSource-related > >>>> features, > >>>>>>>>>> like project & filter push down, partition support, etc.. > >>>>>>>>>> > >>>>>>>>>> Any idea about this? > >>>>>>>>>> > >>>>>>>>>> Best, > >>>>>>>>>> Jingsong Lee > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > ------------------------------------------------------------------ > >>>>>>>>>> From:Dawid Wysakowicz <dwysakow...@apache.org> > >>>>>>>>>> Send Time:2019年9月4日(星期三) 22:20 > >>>>>>>>>> To:dev <dev@flink.apache.org> > >>>>>>>>>> Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in > Table > >>>>>>>> module > >>>>>>>>>> Hi all, > >>>>>>>>>> As part of FLIP-30 a Catalog API was introduced that enables > >> storing > >>>>>>>>> table > >>>>>>>>>> meta objects permanently. At the same time the majority of > current > >>>>>> APIs > >>>>>>>>>> create temporary objects that cannot be serialized. We should > >>>> clarify > >>>>>>>> the > >>>>>>>>>> creation of meta objects (tables, views, functions) in a unified > >>>> way. > >>>>>>>>>> Another current problem in the API is that all the temporary > >> objects > >>>>>>>> are > >>>>>>>>>> stored in a special built-in catalog, which is not very > intuitive > >>>> for > >>>>>>>>> many > >>>>>>>>>> users, as they must be aware of that catalog to reference > >> temporary > >>>>>>>>> objects. > >>>>>>>>>> Lastly, different APIs have different ways of providing object > >>>> paths: > >>>>>>>>>> String path…, > >>>>>>>>>> String path, String pathContinued… > >>>>>>>>>> String name > >>>>>>>>>> We should choose one approach and unify it across all APIs. > >>>>>>>>>> I suggest a FLIP to address the above issues. > >>>>>>>>>> Looking forward to your opinions. > >>>>>>>>>> FLIP link: > >>>>>>>>>> > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module > >> > >