Thanks for the summary Bowen!

Looks good to me.

Cheers,
Fabian

Am Mo., 30. Sept. 2019 um 23:24 Uhr schrieb Bowen Li <bowenl...@gmail.com>:

> Hi all,
>
> I've updated the FLIP wiki with the following changes:
>
> - Lifespan of temp functions are not tied to those of catalogs and
> databases. Users can create temp functions even though catalogs/dbs in
> their fully qualified names don't even exist.
> - some new SQL commands
>     - "SHOW FUNCTIONS" - list names of temp and non-temp system/built-in
> functions, and names of temp and catalog functions in the current catalog
> and db
>     - "SHOW ALL FUNCTIONS" - list names of temp and non-temp system/built
> functions, and fully qualified names of temp and catalog functions in all
> catalogs and dbs
>     - "SHOW ALL TEMPORARY FUNCTIONS" - list fully qualified names of temp
> functions in all catalog and db
>     - "SHOW ALL TEMPORARY SYSTEM FUNCTIONS" - list names of all temp system
> functions
>
> Let me know if you have any questions.
>
> Seems we have resolved all concerns. If there's no more ones, I'd like to
> close the vote by this time tomorrow.
>
> Cheers,
> Bowen
>
> On Mon, Sep 30, 2019 at 11:59 AM Bowen Li <bowenl...@gmail.com> wrote:
>
> > Hi,
> >
> > I think above are some valid points, and we can adopt the suggestions.
> >
> > To elaborate a bit on the new SQL syntax, it would imply that, unlike
> > "SHOW FUNCTION" which only return function names, "SHOW ALL [TEMPORARY]
> > FUNCTIONS" would return functions' fully qualified names with catalog and
> > db names.
> >
> >
> >
> > On Mon, Sep 30, 2019 at 6:38 AM Timo Walther <twal...@apache.org> wrote:
> >
> >> Hi all,
> >>
> >> I support Fabian's arguments. In my opinion, temporary objects should
> >> just be an additional layer on top of the regular catalog/database
> >> lookup logic. Thus, a temporary table or function has always highest
> >> precedence and should be stable within the local session. Otherwise it
> >> could magically disappear while someone else is performing modifications
> >> in the catalog.
> >>
> >> Furthermore, this feature is very useful for prototyping as users can
> >> simply express that a catalog/database is present even through they
> >> might not have access to it currently.
> >>
> >> Regards,
> >> Timo
> >>
> >>
> >> On 30.09.19 14:57, Fabian Hueske wrote:
> >> > Hi all,
> >> >
> >> > Sorry for the late reply.
> >> >
> >> > I think it might lead to confusing situations if temporary functions
> (or
> >> > any temporary db objects for that matter) are bound to the life cycle
> >> of an
> >> > (external) db/catalog.
> >> > Imaging a situation where you create a temp function in a db in an
> >> external
> >> > catalog and use it but at some point it does not work anymore because
> >> some
> >> > other dropped the database from the external catalog.
> >> > Shouldn't temporary objects be only controlled by the owner of a
> >> session?
> >> >
> >> > I agree that creating temp objects in non-existing db/catalogs sounds
> a
> >> bit
> >> > strange, but IMO the opposite (the db/catalog must exist for a temp
> >> > function to be created/exist) can have significant implications like
> the
> >> > one I described.
> >> > I think it would be quite easy for users to understand that temporary
> >> > objects are solely owned by them (and their session).
> >> > The problem of listing temporary objects could be solved by adding a
> ALL
> >> > [TEMPORARY] clause:
> >> >
> >> > SHOW ALL FUNCTIONS; could show all functions regardless of the
> >> > catalog/database including temporary functions.
> >> > SHOW ALL TEMPORARY FUNCTIONS; could show all temporary functions
> >> regardless
> >> > of the catalog/database.
> >> >
> >> > Best,
> >> > Fabian
> >> >
> >> > Am Sa., 28. Sept. 2019 um 02:21 Uhr schrieb Bowen Li <
> >> bowenl...@gmail.com>:
> >> >
> >> >> @Dawid, do you have any other concerns? If not, I hope we can close
> the
> >> >> voting.
> >> >>
> >> >>
> >> >> On Thu, Sep 26, 2019 at 8:14 PM Rui Li <lirui.fu...@gmail.com>
> wrote:
> >> >>
> >> >>> I'm not sure how much benefit #1 can bring us. If users just want to
> >> try
> >> >>> out temporary functions, they can create temporary system functions
> >> which
> >> >>> don't require a catalog/DB. IIUC, the main reason why we allow
> >> temporary
> >> >>> catalog function is to let users override permanent catalog
> functions.
> >> >>> Therefore a temporary function in a non-existing catalog won't serve
> >> that
> >> >>> purpose. Besides, each session is provided with a default catalog
> and
> >> DB.
> >> >>> So even if users simply want to create some catalog functions they
> can
> >> >>> forget about after the session, wouldn't the default catalog/DB be
> >> enough
> >> >>> for such experiments?
> >> >>>
> >> >>> On Thu, Sep 26, 2019 at 4:38 AM Bowen Li <bowenl...@gmail.com>
> wrote:
> >> >>>
> >> >>>> Re 1) As described in the FLIP, a temp function lookup will first
> >> make
> >> >>> sure
> >> >>>> the db exists. If the db doesn't exist, a lazy drop is triggered to
> >> >>> remove
> >> >>>> that temp function.
> >> >>>>
> >> >>>> I agree Hive doesn't handle it consistently, and we are not copying
> >> >> Hive.
> >> >>>> IMHO, allowing registering temp functions in nonexistent catalog/db
> >> is
> >> >>>> hacky and problematic. For instance, "SHOW FUNCTIONS" would list
> >> system
> >> >>>> functions and functions in the current catalog/db, since users
> cannot
> >> >>>> designate a nonexistent catalog/db as current ones, how can they
> list
> >> >>>> functions in nonexistent catalog/db? They may end up never knowing
> >> what
> >> >>>> temp functions they've created unless trying out with queries or we
> >> >>>> introducing some more nonstandard SQL statements. The same applies
> to
> >> >>> other
> >> >>>> temp objects like temp tables.
> >> >>>>
> >> >>>> Re 2) A standalone FunctionIdentifier sounds good to me
> >> >>>>
> >> >>>> On Wed, Sep 25, 2019 at 4:46 AM Dawid Wysakowicz <
> >> >>>> wysakowicz.da...@gmail.com>
> >> >>>> wrote:
> >> >>>>
> >> >>>>> Ad. 1
> >> >>>>> I wouldn't say it is hacky.
> >> >>>>> Moreover, how do you want ensure that the dB always exists when a
> >> >>>> temporary
> >> >>>>> object is used?( in this particular case function). Do you want to
> >> >>> query
> >> >>>>> for the database existence whenever e.g a temporary function is
> >> >> used? I
> >> >>>>> think important aspect here is that the database can be dropped
> from
> >> >>>>> external system, not just flink or a different flink session.
> >> >>>>>
> >> >>>>> E.g in case of hive, you cannot create a temporary table in a
> >> >> database
> >> >>>> that
> >> >>>>> does not exist, that's true. But if you create a temporary table
> in
> >> a
> >> >>>>> database and drop that database from a different session, you can
> >> >> still
> >> >>>>> query the previously created temporary table from the original
> >> >> session.
> >> >>>> It
> >> >>>>> does not sound like a consistent behaviour to me. Why don't we
> make
> >> >>> this
> >> >>>>> behaviour of not binding a temporary objects to the lifetime of a
> >> >>>> database
> >> >>>>> explicit as part of the temporary objects contract? In the end
> they
> >> >>> exist
> >> >>>>> in different layers. Permanent objects & databases in a catalog
> (in
> >> >>> case
> >> >>>> of
> >> >>>>> hive megastore) whereas temporary objects in flink sessions.
> That's
> >> >>> also
> >> >>>>> true for the original hive client. The temporary objects live in
> the
> >> >>> hive
> >> >>>>> client whereas databases are created in the metastore.
> >> >>>>>
> >> >>>>> Ad.2
> >> >>>>> I'm open for suggestions here. The one thing I wanted to achieve
> >> here
> >> >>> is
> >> >>>> so
> >> >>>>> that we do not change the contract of ObjectIdentifier. One
> >> important
> >> >>>> thing
> >> >>>>> to remember here is that we need the function identifier to be
> part
> >> >> of
> >> >>>> the
> >> >>>>> FunctionDefinition object and not only as the result of the
> function
> >> >>>>> lookup. At some point we want to be able to store QueryOperations
> in
> >> >>> the
> >> >>>>> catalogs. They can contain function calls within which we need to
> >> >> have
> >> >>>> the
> >> >>>>> identifier.
> >> >>>>>
> >> >>>>> I agree my initial suggestion is over complicated. How about we
> have
> >> >>> just
> >> >>>>> the FunctionIdentifier as top level class without making the
> >> >>>>> ObjectIdentifier extend from it? I think it's pretty much the same
> >> >> what
> >> >>>> you
> >> >>>>> suggested. The only difference is that it would be a top level
> class
> >> >>>> with a
> >> >>>>> more descriptive name.
> >> >>>>>
> >> >>>>>
> >> >>>>> On Wed, 25 Sep 2019, 13:57 Bowen Li, <bowenl...@gmail.com> wrote:
> >> >>>>>
> >> >>>>>> Sorry, I missed some parts of the solution. The complete
> >> >> alternative
> >> >>> is
> >> >>>>> the
> >> >>>>>> following, basically having separate APIs in FunctionLookup for
> >> >>>> ambiguous
> >> >>>>>> and precise function lookup since planner is able to tell which
> API
> >> >>> to
> >> >>>>> call
> >> >>>>>> with parsed queries, and have a unified result:
> >> >>>>>>
> >> >>>>>> ```
> >> >>>>>> class FunctionLookup {
> >> >>>>>>
> >> >>>>>> Optional<Result> lookupAmbiguousFunction(String name);
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> Optional<Result> lookupPreciseFunction(ObjectIdentifier oi);
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> class Result {
> >> >>>>>>      Optional<ObjectIdentifier>  getObjectIdentifier() { ... }
> >> >>>>>>      String getName() { ... }
> >> >>>>>>      // ...
> >> >>>>>> }
> >> >>>>>>
> >> >>>>>> }
> >> >>>>>> ```
> >> >>>>>>
> >> >>>>>> Thanks,
> >> >>>>>> Bowen
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> On Tue, Sep 24, 2019 at 9:42 PM Bowen Li <bowenl...@gmail.com>
> >> >>> wrote:
> >> >>>>>>> Hi Dawid,
> >> >>>>>>>
> >> >>>>>>> Re 1): I agree making it easy for users to run experiments is
> >> >>>>> important.
> >> >>>>>>> However, I'm not sure allowing users to register temp functions
> >> >> in
> >> >>>>>>> nonexistent catalog/db is the optimal way. It seems a bit hacky,
> >> >>> and
> >> >>>>>> breaks
> >> >>>>>>> the current contract between Flink and users that catalog/db
> must
> >> >>> be
> >> >>>>>> valid
> >> >>>>>>> in order to operate on.
> >> >>>>>>>
> >> >>>>>>> How about we instead focus on making it convenient to create
> >> >>>> catalogs?
> >> >>>>>>> Users actually can already do it with ease via program or SQL
> CLI
> >> >>>> yaml
> >> >>>>>> file
> >> >>>>>>> for an in-memory catalog which has neither extra dependency nor
> >> >>>>> external
> >> >>>>>>> connections. What we can further improve is DDL for catalogs,
> >> >> and I
> >> >>>>>> raised
> >> >>>>>>> it in discussion of [FLIP 69 - Flink SQL DDL Enhancement] driven
> >> >> by
> >> >>>>> Terry
> >> >>>>>>> now.
> >> >>>>>>>
> >> >>>>>>> In that case, if users would like to experiment via SQL, they
> can
> >> >>>>> easily
> >> >>>>>>> create an in memory catalog/database using DDL, then play with
> >> >> temp
> >> >>>>>>> functions.
> >> >>>>>>>
> >> >>>>>>> Re 2): For the assumption, IIUIC, function ObjectIdentifier has
> >> >> not
> >> >>>>> been
> >> >>>>>>> resolved when stack call reaches
> >> >> FunctionCatalog#lookupFunction(),
> >> >>>> but
> >> >>>>>> only
> >> >>>>>>> been parsed?
> >> >>>>>>>
> >> >>>>>>> I agree keeping ObjectIdentifier as-is would be good. I'm ok
> with
> >> >>> the
> >> >>>>>>> suggested classes, though making ObjectIdentifier a subclass of
> >> >>>>>>> FunctionIdentifier seem a bit counter intuitive.
> >> >>>>>>>
> >> >>>>>>> Another potentially simpler way is:
> >> >>>>>>>
> >> >>>>>>> ```
> >> >>>>>>> // in class FunctionLookup
> >> >>>>>>> class Result {
> >> >>>>>>>      Optional<ObjectIdentifier>  getObjectIdentifier() { ... }
> >> >>>>>>>      String getName() { ... }
> >> >>>>>>>      ...
> >> >>>>>>> }
> >> >>>>>>> ```
> >> >>>>>>>
> >> >>>>>>> WDYT?
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>> On Tue, Sep 24, 2019 at 3:41 PM Dawid Wysakowicz <
> >> >>>>>>> wysakowicz.da...@gmail.com> wrote:
> >> >>>>>>>
> >> >>>>>>>> Hi,
> >> >>>>>>>> I really like the flip and think it clarifies important aspects
> >> >> of
> >> >>>> the
> >> >>>>>>>> system.
> >> >>>>>>>>
> >> >>>>>>>> I have two, I hope small suggestions, which will not take much
> >> >>> time
> >> >>>> to
> >> >>>>>>>> agree on.
> >> >>>>>>>>
> >> >>>>>>>> 1. Could we follow the MySQL approach in regards to the
> >> >> existence
> >> >>> of
> >> >>>>>>>> cat/db
> >> >>>>>>>> for temporary functions? That means not to check it, so e.g.
> >> >> it's
> >> >>>>>> possible
> >> >>>>>>>> to create a temporary function in a database that does not
> >> >> exist.
> >> >>> I
> >> >>>>>> think
> >> >>>>>>>> it's really useful e.g in cases when user wants to perform
> >> >>>> experiments
> >> >>>>>> but
> >> >>>>>>>> does not have access to the db yet or temporarily does not have
> >> >>>>>> connection
> >> >>>>>>>> to a catalog.
> >> >>>>>>>> 2. Could we not change the ObjectIdentifier? Could we not
> loosen
> >> >>> the
> >> >>>>>>>> requirements for all catalog objects such as tables, views,
> >> >> types
> >> >>>> just
> >> >>>>>> for
> >> >>>>>>>> the functions? It's really important later on from e.g the
> >> >>>>>> serializability
> >> >>>>>>>> perspective. The important aspect of the ObjectIdentifier is
> >> >> that
> >> >>> we
> >> >>>>>> know
> >> >>>>>>>> that it has been resolved. The suggested changes break that
> >> >>>>> assumption.
> >> >>>>>>>> What do you think about adding an interface FunctionIdentifier
> {
> >> >>>>>>>>
> >> >>>>>>>> String getName();
> >> >>>>>>>>
> >> >>>>>>>> /**
> >> >>>>>>>>    Return 3-part identifier. Empty in case of a built-in
> >> >> function.
> >> >>>>>>>> */
> >> >>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier()
> >> >>>>>>>> }
> >> >>>>>>>>
> >> >>>>>>>> class ObjectIdentifier implements FunctionIdentifier {
> >> >>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() {
> >> >>>>>>>>   return Optional.of(this);
> >> >>>>>>>> }
> >> >>>>>>>> }
> >> >>>>>>>>
> >> >>>>>>>> class SystemFunctionIdentifier implements FunctionIdentifier
> >> >> {...}
> >> >>>>>>>> WDYT?
> >> >>>>>>>>
> >> >>>>>>>> On Wed, 25 Sep 2019, 04:50 Xuefu Z, <usxu...@gmail.com> wrote:
> >> >>>>>>>>
> >> >>>>>>>>> +1. LGTM
> >> >>>>>>>>>
> >> >>>>>>>>> On Tue, Sep 24, 2019 at 6:09 AM Terry Wang <
> >> >> zjuwa...@gmail.com>
> >> >>>>>> wrote:
> >> >>>>>>>>>> +1
> >> >>>>>>>>>>
> >> >>>>>>>>>> Best,
> >> >>>>>>>>>> Terry Wang
> >> >>>>>>>>>>
> >> >>>>>>>>>>
> >> >>>>>>>>>>
> >> >>>>>>>>>>> 在 2019年9月24日,上午10:42,Kurt Young <ykt...@gmail.com> 写道:
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> +1
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> Best,
> >> >>>>>>>>>>> Kurt
> >> >>>>>>>>>>>
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> On Tue, Sep 24, 2019 at 2:30 AM Bowen Li <
> >> >>> bowenl...@gmail.com
> >> >>>>>>>> wrote:
> >> >>>>>>>>>>>> Hi all,
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> I'd like to start a voting thread for FLIP-57 [1], which
> >> >>>> we've
> >> >>>>>>>> reached
> >> >>>>>>>>>>>> consensus in [2].
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> This voting will be open for minimum 3 days till 6:30pm
> >> >>> UTC,
> >> >>>>> Sep
> >> >>>>>>>> 26.
> >> >>>>>>>>>>>> Thanks,
> >> >>>>>>>>>>>> Bowen
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> [1]
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>>
> >> >>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog
> >> >>>>>>>>>>>> [2]
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>>
> >> >>
> >>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613
> >> >>>>>>>>>>
> >> >>>>>>>>> --
> >> >>>>>>>>> Xuefu Zhang
> >> >>>>>>>>>
> >> >>>>>>>>> "In Honey We Trust!"
> >> >>>>>>>>>
> >> >>>
> >> >>> --
> >> >>> Best regards!
> >> >>> Rui Li
> >> >>>
> >>
> >>
>

Reply via email to