Hi Dawid, +1 for proposed changes
On Wed, Oct 9, 2019 at 12:15 PM Dawid Wysakowicz <dwysakow...@apache.org> wrote: > Sorry for a very delayed response. > > @Kurt Yes, this is the goal to have a function created like new > Function(...) also be wrapped into CatalogFunction. This would have to > be though a temporary function as we cannot represent that as a set of > properties. Similar to the createTemporaryView(DataStream stream). > > As for the ConnectTableDescriptor I agree this is very similar to > CatalogTable. I am not sure though if we should get rid of it. In the > end I see it as a builder for a CatalogTable, which is a slightly more > internal API, but we might revisit that some time in the future if we > find that it makes more sense. > > @All I updated the FLIP page with some more details from the outcome of > the discussions around FLIP-57. Please take a look. I would like to > start a vote on this FLIP as soon as the vote on FLIP-57 goes through. > > Best, > > Dawid > > > On 19/09/2019 09:24, Kurt Young wrote: > > IIUC it's good to see that both serializable (tables description from > DDL) > > and unserializable (tables with DataStream underneath) tables are treated > > unify with CatalogTable. > > > > Can I also assume functions that either come from a function class (from > > DDL) > > or function objects (newed by user) will also treated unify with > > CatalogFunction? > > > > This will greatly simplify and unify current API level concepts and > design. > > > > And it seems only one thing left, how do we deal with > > ConnectTableDescriptor? > > It's actually very similar with serializable CatalogTable, both carry > some > > text > > properties which even are the same. Is there any chance we can further > unify > > this to CatalogTable? > > > > object > > Best, > > Kurt > > > > > > On Thu, Sep 19, 2019 at 3:13 PM Jark Wu <imj...@gmail.com> wrote: > > > >> Thanks Dawid for the design doc. > >> > >> In general, I’m +1 to the FLIP. > >> > >> > >> +1 to the single-string and parse way to express object path. > >> > >> +1 to deprecate registerTableSink & registerTableSource. > >> But I would suggest to provide an easy way to register a custom > >> source/sink before we drop them (this is another story). > >> Currently, it’s not easy to implement a custom connector descriptor. > >> > >> Best, > >> Jark > >> > >> > >>> 在 2019年9月19日,11:37,Dawid Wysakowicz <wysakowicz.da...@gmail.com> 写道: > >>> > >>> Hi JingsongLee, > >>> From my understanding they can. Underneath they will be CatalogTables. > >> The > >>> difference is the lifetime of the tables. Plus some of the user facing > >>> interfaces cannot be persisted e.g. datastream. Therefore we must have > a > >>> separate methods for that. In the end the temporary tables are held in > >>> memory as CatalogTables. > >>> Best, > >>> Dawid > >>> > >>> On Thu, 19 Sep 2019, 10:08 JingsongLee, <lzljs3620...@aliyun.com > >> .invalid> > >>> wrote: > >>> > >>>> Hi dawid: > >>>> Can temporary tables achieve the same capabilities as catalog table? > >>>> like statistics: CatalogTableStatistics, CatalogColumnStatistics, > >>>> PartitionStatistics > >>>> like partition support: we have added some catalog equivalent > interfaces > >>>> on TableSource/TableSink: getPartitions, getPartitionFieldNames > >>>> Maybe it's not a good idea to add these interfaces to > >>>> TableSource/TableSink. What do you think? > >>>> > >>>> Best, > >>>> Jingsong Lee > >>>> > >>>> > >>>> ------------------------------------------------------------------ > >>>> From:Kurt Young <ykt...@gmail.com> > >>>> Send Time:2019年9月18日(星期三) 17:54 > >>>> To:dev <dev@flink.apache.org> > >>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table > >>>> module > >>>> > >>>> Hi all, > >>>> > >>>> Sorry to join this party late. Big +1 to this flip, especially for the > >>>> dropping > >>>> "registerTableSink & registerTableSource" part. These are indeed > legacy > >>>> and we should try to unify them through CatalogTable after we > introduce > >>>> the concept of Catalog. > >>>> > >>>> From my understanding, what we can registered should all be metadata, > >>>> TableSource/TableSink should only be the one who is responsible to do > >>>> the real work, i.e. reading and writing data according to the schema > and > >>>> other information like computed column, partition, .e.g. > >>>> > >>>> Best, > >>>> Kurt > >>>> > >>>> > >>>> On Wed, Sep 18, 2019 at 5:14 PM JingsongLee <lzljs3620...@aliyun.com > >>>> .invalid> > >>>> wrote: > >>>> > >>>>> After some development and thinking, I have a general understanding. > >>>>> +1 to registering a source/sink does not fit into the SQL world. > >>>>> I am OK to have a deprecated registerTemporarySource/Sink to > compatible > >>>>> with old ways. > >>>>> > >>>>> Best, > >>>>> Jingsong Lee > >>>>> > >>>>> > >>>>> ------------------------------------------------------------------ > >>>>> From:Timo Walther <twal...@apache.org> > >>>>> Send Time:2019年9月17日(星期二) 08:00 > >>>>> To:dev <dev@flink.apache.org> > >>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table > >>>>> module > >>>>> > >>>>> Hi Dawid, > >>>>> > >>>>> thanks for the design document. It fixes big concept gaps due to > >>>>> historical reasons with proper support for serializability and > catalog > >>>>> support in mind. > >>>>> > >>>>> I would not mind a registerTemporarySource/Sink, but the problem > that I > >>>>> see is that many people think that this is the recommended way of > >>>>> registering a table source/sink which is not true. We should guide > >> users > >>>>> to either use connect() or DDL API which can be validated and stored > in > >>>>> catalog. > >>>>> > >>>>> Also from a concept perspective, registering a source/sink does not > fit > >>>>> into the SQL world. SQL does not know about source/sinks but only > about > >>>>> tables. If the responsibility of a TableSource/TableSink is just a > pure > >>>>> physical data consumer/producer that is not connected to the actual > >>>>> logical table schema, we would need a possibility of defining time > >>>>> attributes and interpreting/converting a changelog. This should be > done > >>>>> by the framework with information from the DDL/connect() and not be > >>>>> defined in every table source. > >>>>> > >>>>> Regards, > >>>>> Timo > >>>>> > >>>>> > >>>>> On 09.09.19 14:16, JingsongLee wrote: > >>>>>> Hi dawid: > >>>>>> > >>>>>> It is difficult to describe specific examples. > >>>>>> Sometimes users will generate some java converters through some > >>>>>> Java code, or generate some Java classes through third-party > >>>>>> libraries. Of course, these can be best done through properties. > >>>>>> But this requires additional work from users.My suggestion is to > >>>>>> keep this Java instance class way that is user-friendly. > >>>>>> > >>>>>> Best, > >>>>>> Jingsong Lee > >>>>>> > >>>>>> > >>>>>> ------------------------------------------------------------------ > >>>>>> From:Dawid Wysakowicz <dwysakow...@apache.org> > >>>>>> Send Time:2019年9月6日(星期五) 16:21 > >>>>>> To:dev <dev@flink.apache.org> > >>>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in > Table > >>>>> module > >>>>>> Hi all, > >>>>>> @Jingsong Could you elaborate a bit more what do you mean by > >>>>>> "some Connectors are difficult to convert all states to properties" > >>>>>> All the Flink provided connectors will definitely be expressible > with > >>>>> properties (In the end you should be able to use them from DDL). I > >> think > >>>> if > >>>>> a TableSource is complex enough that it handles filter push down, > >>>> partition > >>>>> support etc. should rather be made available both from DDL & > java/scala > >>>>> code. I'm happy to reconsider adding registerTemporaryTable(String > >> path, > >>>>> TableSource source) if you have some concrete examples in mind. > >>>>>> > >>>>>> @Xuefu: We also considered the ObjectIdentifier (or actually > >>>> introducing > >>>>> a new identifier representation to differentiate between resolved and > >>>>> unresolved identifiers) with the same concerns. We decided to suggest > >> the > >>>>> string & parsing logic because of usability. > >>>>>> tEnv.from("cat.db.table") > >>>>>> is shorter and easier to write than > >>>>>> tEnv.from(Identifier.for("cat", "db", "name") > >>>>>> And also implicitly solves the problem what happens if a user (e.g. > >>>> used > >>>>> to other systems) uses that API in a following manner: > >>>>>> tEnv.from(Identifier.for("db.name") > >>>>>> I'm happy to revisit it if the general consensus is that it's better > >> to > >>>>> use the OO aproach. > >>>>>> Best, > >>>>>> Dawid > >>>>>> > >>>>>> On 06/09/2019 10:00, Xuefu Z wrote: > >>>>>> > >>>>>> Thanks to Dawid for starting the discussion and writeup. It looks > >>>> pretty > >>>>>> good to me except that I'm a little concerned about the object > >>>> reference > >>>>>> and string parsing in the code, which seems to an anti-pattern to > OOP. > >>>>> Have > >>>>>> we considered using ObjectIdenitifier with optional catalog and db > >>>> parts, > >>>>>> esp. if we are worried about arguments of variable length or method > >>>>>> overloading? It's quite likely that the result of string parsing is > an > >>>>>> ObjectIdentifier instance any way. > >>>>>> > >>>>>> Having string parsing logic in the code is a little dangerous as it > >>>>>> duplicates part of the DDL/DML parsing, and they can easily get out > of > >>>>> sync. > >>>>>> Thanks, > >>>>>> Xuefu > >>>>>> > >>>>>> On Fri, Sep 6, 2019 at 1:57 PM JingsongLee <lzljs3620...@aliyun.com > >>>>> .invalid> > >>>>>> wrote: > >>>>>> > >>>>>> > >>>>>> Thanks dawid, +1 for this approach. > >>>>>> > >>>>>> One concern is the removal of registerTableSink & > registerTableSource > >>>>>> in TableEnvironment. It has two alternatives: > >>>>>> 1.the properties approach (DDL, descriptor). > >>>>>> 2.from/toDataStream. > >>>>>> > >>>>>> #1 can only be properties, not java states, and some Connectors > >>>>>> are difficult to convert all states to properties. > >>>>>> #2 can contain java state. But can't use TableSource-related > features, > >>>>>> like project & filter push down, partition support, etc.. > >>>>>> > >>>>>> Any idea about this? > >>>>>> > >>>>>> Best, > >>>>>> Jingsong Lee > >>>>>> > >>>>>> > >>>>>> ------------------------------------------------------------------ > >>>>>> From:Dawid Wysakowicz <dwysakow...@apache.org> > >>>>>> Send Time:2019年9月4日(星期三) 22:20 > >>>>>> To:dev <dev@flink.apache.org> > >>>>>> Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in Table > >>>> module > >>>>>> Hi all, > >>>>>> As part of FLIP-30 a Catalog API was introduced that enables storing > >>>>> table > >>>>>> meta objects permanently. At the same time the majority of current > >> APIs > >>>>>> create temporary objects that cannot be serialized. We should > clarify > >>>> the > >>>>>> creation of meta objects (tables, views, functions) in a unified > way. > >>>>>> Another current problem in the API is that all the temporary objects > >>>> are > >>>>>> stored in a special built-in catalog, which is not very intuitive > for > >>>>> many > >>>>>> users, as they must be aware of that catalog to reference temporary > >>>>> objects. > >>>>>> Lastly, different APIs have different ways of providing object > paths: > >>>>>> > >>>>>> String path…, > >>>>>> String path, String pathContinued… > >>>>>> String name > >>>>>> We should choose one approach and unify it across all APIs. > >>>>>> I suggest a FLIP to address the above issues. > >>>>>> Looking forward to your opinions. > >>>>>> FLIP link: > >>>>>> > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module > >>>> > >> > >