Thanks for the summary Bowen! Looks good to me.
Cheers, Fabian Am Mo., 30. Sept. 2019 um 23:24 Uhr schrieb Bowen Li <bowenl...@gmail.com>: > Hi all, > > I've updated the FLIP wiki with the following changes: > > - Lifespan of temp functions are not tied to those of catalogs and > databases. Users can create temp functions even though catalogs/dbs in > their fully qualified names don't even exist. > - some new SQL commands > - "SHOW FUNCTIONS" - list names of temp and non-temp system/built-in > functions, and names of temp and catalog functions in the current catalog > and db > - "SHOW ALL FUNCTIONS" - list names of temp and non-temp system/built > functions, and fully qualified names of temp and catalog functions in all > catalogs and dbs > - "SHOW ALL TEMPORARY FUNCTIONS" - list fully qualified names of temp > functions in all catalog and db > - "SHOW ALL TEMPORARY SYSTEM FUNCTIONS" - list names of all temp system > functions > > Let me know if you have any questions. > > Seems we have resolved all concerns. If there's no more ones, I'd like to > close the vote by this time tomorrow. > > Cheers, > Bowen > > On Mon, Sep 30, 2019 at 11:59 AM Bowen Li <bowenl...@gmail.com> wrote: > > > Hi, > > > > I think above are some valid points, and we can adopt the suggestions. > > > > To elaborate a bit on the new SQL syntax, it would imply that, unlike > > "SHOW FUNCTION" which only return function names, "SHOW ALL [TEMPORARY] > > FUNCTIONS" would return functions' fully qualified names with catalog and > > db names. > > > > > > > > On Mon, Sep 30, 2019 at 6:38 AM Timo Walther <twal...@apache.org> wrote: > > > >> Hi all, > >> > >> I support Fabian's arguments. In my opinion, temporary objects should > >> just be an additional layer on top of the regular catalog/database > >> lookup logic. Thus, a temporary table or function has always highest > >> precedence and should be stable within the local session. Otherwise it > >> could magically disappear while someone else is performing modifications > >> in the catalog. > >> > >> Furthermore, this feature is very useful for prototyping as users can > >> simply express that a catalog/database is present even through they > >> might not have access to it currently. > >> > >> Regards, > >> Timo > >> > >> > >> On 30.09.19 14:57, Fabian Hueske wrote: > >> > Hi all, > >> > > >> > Sorry for the late reply. > >> > > >> > I think it might lead to confusing situations if temporary functions > (or > >> > any temporary db objects for that matter) are bound to the life cycle > >> of an > >> > (external) db/catalog. > >> > Imaging a situation where you create a temp function in a db in an > >> external > >> > catalog and use it but at some point it does not work anymore because > >> some > >> > other dropped the database from the external catalog. > >> > Shouldn't temporary objects be only controlled by the owner of a > >> session? > >> > > >> > I agree that creating temp objects in non-existing db/catalogs sounds > a > >> bit > >> > strange, but IMO the opposite (the db/catalog must exist for a temp > >> > function to be created/exist) can have significant implications like > the > >> > one I described. > >> > I think it would be quite easy for users to understand that temporary > >> > objects are solely owned by them (and their session). > >> > The problem of listing temporary objects could be solved by adding a > ALL > >> > [TEMPORARY] clause: > >> > > >> > SHOW ALL FUNCTIONS; could show all functions regardless of the > >> > catalog/database including temporary functions. > >> > SHOW ALL TEMPORARY FUNCTIONS; could show all temporary functions > >> regardless > >> > of the catalog/database. > >> > > >> > Best, > >> > Fabian > >> > > >> > Am Sa., 28. Sept. 2019 um 02:21 Uhr schrieb Bowen Li < > >> bowenl...@gmail.com>: > >> > > >> >> @Dawid, do you have any other concerns? If not, I hope we can close > the > >> >> voting. > >> >> > >> >> > >> >> On Thu, Sep 26, 2019 at 8:14 PM Rui Li <lirui.fu...@gmail.com> > wrote: > >> >> > >> >>> I'm not sure how much benefit #1 can bring us. If users just want to > >> try > >> >>> out temporary functions, they can create temporary system functions > >> which > >> >>> don't require a catalog/DB. IIUC, the main reason why we allow > >> temporary > >> >>> catalog function is to let users override permanent catalog > functions. > >> >>> Therefore a temporary function in a non-existing catalog won't serve > >> that > >> >>> purpose. Besides, each session is provided with a default catalog > and > >> DB. > >> >>> So even if users simply want to create some catalog functions they > can > >> >>> forget about after the session, wouldn't the default catalog/DB be > >> enough > >> >>> for such experiments? > >> >>> > >> >>> On Thu, Sep 26, 2019 at 4:38 AM Bowen Li <bowenl...@gmail.com> > wrote: > >> >>> > >> >>>> Re 1) As described in the FLIP, a temp function lookup will first > >> make > >> >>> sure > >> >>>> the db exists. If the db doesn't exist, a lazy drop is triggered to > >> >>> remove > >> >>>> that temp function. > >> >>>> > >> >>>> I agree Hive doesn't handle it consistently, and we are not copying > >> >> Hive. > >> >>>> IMHO, allowing registering temp functions in nonexistent catalog/db > >> is > >> >>>> hacky and problematic. For instance, "SHOW FUNCTIONS" would list > >> system > >> >>>> functions and functions in the current catalog/db, since users > cannot > >> >>>> designate a nonexistent catalog/db as current ones, how can they > list > >> >>>> functions in nonexistent catalog/db? They may end up never knowing > >> what > >> >>>> temp functions they've created unless trying out with queries or we > >> >>>> introducing some more nonstandard SQL statements. The same applies > to > >> >>> other > >> >>>> temp objects like temp tables. > >> >>>> > >> >>>> Re 2) A standalone FunctionIdentifier sounds good to me > >> >>>> > >> >>>> On Wed, Sep 25, 2019 at 4:46 AM Dawid Wysakowicz < > >> >>>> wysakowicz.da...@gmail.com> > >> >>>> wrote: > >> >>>> > >> >>>>> Ad. 1 > >> >>>>> I wouldn't say it is hacky. > >> >>>>> Moreover, how do you want ensure that the dB always exists when a > >> >>>> temporary > >> >>>>> object is used?( in this particular case function). Do you want to > >> >>> query > >> >>>>> for the database existence whenever e.g a temporary function is > >> >> used? I > >> >>>>> think important aspect here is that the database can be dropped > from > >> >>>>> external system, not just flink or a different flink session. > >> >>>>> > >> >>>>> E.g in case of hive, you cannot create a temporary table in a > >> >> database > >> >>>> that > >> >>>>> does not exist, that's true. But if you create a temporary table > in > >> a > >> >>>>> database and drop that database from a different session, you can > >> >> still > >> >>>>> query the previously created temporary table from the original > >> >> session. > >> >>>> It > >> >>>>> does not sound like a consistent behaviour to me. Why don't we > make > >> >>> this > >> >>>>> behaviour of not binding a temporary objects to the lifetime of a > >> >>>> database > >> >>>>> explicit as part of the temporary objects contract? In the end > they > >> >>> exist > >> >>>>> in different layers. Permanent objects & databases in a catalog > (in > >> >>> case > >> >>>> of > >> >>>>> hive megastore) whereas temporary objects in flink sessions. > That's > >> >>> also > >> >>>>> true for the original hive client. The temporary objects live in > the > >> >>> hive > >> >>>>> client whereas databases are created in the metastore. > >> >>>>> > >> >>>>> Ad.2 > >> >>>>> I'm open for suggestions here. The one thing I wanted to achieve > >> here > >> >>> is > >> >>>> so > >> >>>>> that we do not change the contract of ObjectIdentifier. One > >> important > >> >>>> thing > >> >>>>> to remember here is that we need the function identifier to be > part > >> >> of > >> >>>> the > >> >>>>> FunctionDefinition object and not only as the result of the > function > >> >>>>> lookup. At some point we want to be able to store QueryOperations > in > >> >>> the > >> >>>>> catalogs. They can contain function calls within which we need to > >> >> have > >> >>>> the > >> >>>>> identifier. > >> >>>>> > >> >>>>> I agree my initial suggestion is over complicated. How about we > have > >> >>> just > >> >>>>> the FunctionIdentifier as top level class without making the > >> >>>>> ObjectIdentifier extend from it? I think it's pretty much the same > >> >> what > >> >>>> you > >> >>>>> suggested. The only difference is that it would be a top level > class > >> >>>> with a > >> >>>>> more descriptive name. > >> >>>>> > >> >>>>> > >> >>>>> On Wed, 25 Sep 2019, 13:57 Bowen Li, <bowenl...@gmail.com> wrote: > >> >>>>> > >> >>>>>> Sorry, I missed some parts of the solution. The complete > >> >> alternative > >> >>> is > >> >>>>> the > >> >>>>>> following, basically having separate APIs in FunctionLookup for > >> >>>> ambiguous > >> >>>>>> and precise function lookup since planner is able to tell which > API > >> >>> to > >> >>>>> call > >> >>>>>> with parsed queries, and have a unified result: > >> >>>>>> > >> >>>>>> ``` > >> >>>>>> class FunctionLookup { > >> >>>>>> > >> >>>>>> Optional<Result> lookupAmbiguousFunction(String name); > >> >>>>>> > >> >>>>>> > >> >>>>>> Optional<Result> lookupPreciseFunction(ObjectIdentifier oi); > >> >>>>>> > >> >>>>>> > >> >>>>>> class Result { > >> >>>>>> Optional<ObjectIdentifier> getObjectIdentifier() { ... } > >> >>>>>> String getName() { ... } > >> >>>>>> // ... > >> >>>>>> } > >> >>>>>> > >> >>>>>> } > >> >>>>>> ``` > >> >>>>>> > >> >>>>>> Thanks, > >> >>>>>> Bowen > >> >>>>>> > >> >>>>>> > >> >>>>>> On Tue, Sep 24, 2019 at 9:42 PM Bowen Li <bowenl...@gmail.com> > >> >>> wrote: > >> >>>>>>> Hi Dawid, > >> >>>>>>> > >> >>>>>>> Re 1): I agree making it easy for users to run experiments is > >> >>>>> important. > >> >>>>>>> However, I'm not sure allowing users to register temp functions > >> >> in > >> >>>>>>> nonexistent catalog/db is the optimal way. It seems a bit hacky, > >> >>> and > >> >>>>>> breaks > >> >>>>>>> the current contract between Flink and users that catalog/db > must > >> >>> be > >> >>>>>> valid > >> >>>>>>> in order to operate on. > >> >>>>>>> > >> >>>>>>> How about we instead focus on making it convenient to create > >> >>>> catalogs? > >> >>>>>>> Users actually can already do it with ease via program or SQL > CLI > >> >>>> yaml > >> >>>>>> file > >> >>>>>>> for an in-memory catalog which has neither extra dependency nor > >> >>>>> external > >> >>>>>>> connections. What we can further improve is DDL for catalogs, > >> >> and I > >> >>>>>> raised > >> >>>>>>> it in discussion of [FLIP 69 - Flink SQL DDL Enhancement] driven > >> >> by > >> >>>>> Terry > >> >>>>>>> now. > >> >>>>>>> > >> >>>>>>> In that case, if users would like to experiment via SQL, they > can > >> >>>>> easily > >> >>>>>>> create an in memory catalog/database using DDL, then play with > >> >> temp > >> >>>>>>> functions. > >> >>>>>>> > >> >>>>>>> Re 2): For the assumption, IIUIC, function ObjectIdentifier has > >> >> not > >> >>>>> been > >> >>>>>>> resolved when stack call reaches > >> >> FunctionCatalog#lookupFunction(), > >> >>>> but > >> >>>>>> only > >> >>>>>>> been parsed? > >> >>>>>>> > >> >>>>>>> I agree keeping ObjectIdentifier as-is would be good. I'm ok > with > >> >>> the > >> >>>>>>> suggested classes, though making ObjectIdentifier a subclass of > >> >>>>>>> FunctionIdentifier seem a bit counter intuitive. > >> >>>>>>> > >> >>>>>>> Another potentially simpler way is: > >> >>>>>>> > >> >>>>>>> ``` > >> >>>>>>> // in class FunctionLookup > >> >>>>>>> class Result { > >> >>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() { ... } > >> >>>>>>> String getName() { ... } > >> >>>>>>> ... > >> >>>>>>> } > >> >>>>>>> ``` > >> >>>>>>> > >> >>>>>>> WDYT? > >> >>>>>>> > >> >>>>>>> > >> >>>>>>> > >> >>>>>>> On Tue, Sep 24, 2019 at 3:41 PM Dawid Wysakowicz < > >> >>>>>>> wysakowicz.da...@gmail.com> wrote: > >> >>>>>>> > >> >>>>>>>> Hi, > >> >>>>>>>> I really like the flip and think it clarifies important aspects > >> >> of > >> >>>> the > >> >>>>>>>> system. > >> >>>>>>>> > >> >>>>>>>> I have two, I hope small suggestions, which will not take much > >> >>> time > >> >>>> to > >> >>>>>>>> agree on. > >> >>>>>>>> > >> >>>>>>>> 1. Could we follow the MySQL approach in regards to the > >> >> existence > >> >>> of > >> >>>>>>>> cat/db > >> >>>>>>>> for temporary functions? That means not to check it, so e.g. > >> >> it's > >> >>>>>> possible > >> >>>>>>>> to create a temporary function in a database that does not > >> >> exist. > >> >>> I > >> >>>>>> think > >> >>>>>>>> it's really useful e.g in cases when user wants to perform > >> >>>> experiments > >> >>>>>> but > >> >>>>>>>> does not have access to the db yet or temporarily does not have > >> >>>>>> connection > >> >>>>>>>> to a catalog. > >> >>>>>>>> 2. Could we not change the ObjectIdentifier? Could we not > loosen > >> >>> the > >> >>>>>>>> requirements for all catalog objects such as tables, views, > >> >> types > >> >>>> just > >> >>>>>> for > >> >>>>>>>> the functions? It's really important later on from e.g the > >> >>>>>> serializability > >> >>>>>>>> perspective. The important aspect of the ObjectIdentifier is > >> >> that > >> >>> we > >> >>>>>> know > >> >>>>>>>> that it has been resolved. The suggested changes break that > >> >>>>> assumption. > >> >>>>>>>> What do you think about adding an interface FunctionIdentifier > { > >> >>>>>>>> > >> >>>>>>>> String getName(); > >> >>>>>>>> > >> >>>>>>>> /** > >> >>>>>>>> Return 3-part identifier. Empty in case of a built-in > >> >> function. > >> >>>>>>>> */ > >> >>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() > >> >>>>>>>> } > >> >>>>>>>> > >> >>>>>>>> class ObjectIdentifier implements FunctionIdentifier { > >> >>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() { > >> >>>>>>>> return Optional.of(this); > >> >>>>>>>> } > >> >>>>>>>> } > >> >>>>>>>> > >> >>>>>>>> class SystemFunctionIdentifier implements FunctionIdentifier > >> >> {...} > >> >>>>>>>> WDYT? > >> >>>>>>>> > >> >>>>>>>> On Wed, 25 Sep 2019, 04:50 Xuefu Z, <usxu...@gmail.com> wrote: > >> >>>>>>>> > >> >>>>>>>>> +1. LGTM > >> >>>>>>>>> > >> >>>>>>>>> On Tue, Sep 24, 2019 at 6:09 AM Terry Wang < > >> >> zjuwa...@gmail.com> > >> >>>>>> wrote: > >> >>>>>>>>>> +1 > >> >>>>>>>>>> > >> >>>>>>>>>> Best, > >> >>>>>>>>>> Terry Wang > >> >>>>>>>>>> > >> >>>>>>>>>> > >> >>>>>>>>>> > >> >>>>>>>>>>> 在 2019年9月24日,上午10:42,Kurt Young <ykt...@gmail.com> 写道: > >> >>>>>>>>>>> > >> >>>>>>>>>>> +1 > >> >>>>>>>>>>> > >> >>>>>>>>>>> Best, > >> >>>>>>>>>>> Kurt > >> >>>>>>>>>>> > >> >>>>>>>>>>> > >> >>>>>>>>>>> On Tue, Sep 24, 2019 at 2:30 AM Bowen Li < > >> >>> bowenl...@gmail.com > >> >>>>>>>> wrote: > >> >>>>>>>>>>>> Hi all, > >> >>>>>>>>>>>> > >> >>>>>>>>>>>> I'd like to start a voting thread for FLIP-57 [1], which > >> >>>> we've > >> >>>>>>>> reached > >> >>>>>>>>>>>> consensus in [2]. > >> >>>>>>>>>>>> > >> >>>>>>>>>>>> This voting will be open for minimum 3 days till 6:30pm > >> >>> UTC, > >> >>>>> Sep > >> >>>>>>>> 26. > >> >>>>>>>>>>>> Thanks, > >> >>>>>>>>>>>> Bowen > >> >>>>>>>>>>>> > >> >>>>>>>>>>>> [1] > >> >>>>>>>>>>>> > >> >>>>>>>>>>>> > >> >> > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog > >> >>>>>>>>>>>> [2] > >> >>>>>>>>>>>> > >> >>>>>>>>>>>> > >> >> > >> > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613 > >> >>>>>>>>>> > >> >>>>>>>>> -- > >> >>>>>>>>> Xuefu Zhang > >> >>>>>>>>> > >> >>>>>>>>> "In Honey We Trust!" > >> >>>>>>>>> > >> >>> > >> >>> -- > >> >>> Best regards! > >> >>> Rui Li > >> >>> > >> > >> >