Thanks Bowen: +1 for this. And +1 to Kurt's suggestion. My other points are:
1.Hive built-in functions is an intermediate solution. So we should not introduce interfaces to influence the framework. To make Flink itself more powerful, we should implement the functions we need to add. 2.Non-flink built-in functions are easy for users to change their behavior. If we support some flink built-in functions in the future but act differently from non-flink built-in, this will lead to changes in user behavior. 3.Fallback to Non-flink built-in functions is a bad choice to performance. Without flink internal codegen and data format, and bring data format conversion, the performance is not so good. We need to support more complete hive jobs now, we need to have this fallback strategy. But it's not worth adding this concept at the catalog interface level, and it's not worth encouraging other catalogs to do so. Another question is, does this fallback include all hive built-in functions? As far as I know, some hive functions have some hacky. If possible, can we start with a white list? Once we implement some functions to flink built-in, we can also update the whitelist. Best, Jingsong Lee ------------------------------------------------------------------ From:Kurt Young <ykt...@gmail.com> Send Time:2019年9月3日(星期二) 15:41 To:dev <dev@flink.apache.org> Subject:Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog Thanks Bowen for driving this. +1 for the general idea. It makes the function resolved behavior more clear and deterministic. Besides, the user can use all hive built-in functions, which is a great feature. I only have one comment, but maybe it may touch your design so I think it would make sense to reply this mail instead of comment on google doc. Regarding to the classfication of functions, you currently have 4 types of functions, which are: 1. temporary functions 2. Flink built-in functions 3. Hive built-in functions (or generalized as external built-in functions) 4. catalog functions What I want to propose is we can merge #3 and #4, make them both under "catalog" concept, by extending catalog function to make it have ability to have built-in catalog functions. Some benefits I can see from this approach: 1. We don't have to introduce new concept like external built-in functions. Actually I don't see a full story about how to treat a built-in functions, and it seems a little bit disrupt with catalog. As a result, you have to make some restriction like "hive built-in functions can only be used when current catalog is hive catalog". 2. It makes us easier to adopt another system's built-in functions to Flink, such as MySQL. If we don't treat uniformly with "external built-in functions" and "external catalog function", things like user set current catalog to hive but want to use MySQL's built-in function will happen. One more thing, follow this approach, it's clear for your question about how to support external built-in functions, which is "add a getBuiltInFunction to current Catalog API". What do you think? Best, Kurt On Fri, Aug 30, 2019 at 7:14 AM Bowen Li <bowenl...@gmail.com> wrote: > Thanks everyone for the feedback. > > I have updated the document accordingly. Here're the summary of changes: > > - clarify the concept of temporary functions, to facilitate deciding > function resolution order > - provide two options to support Hive built-in functions, with the 2nd one > being preferred > - add detailed prototype code for FunctionCatalog#lookupFunction(name) > - move the section of ”rename existing FunctionCatalog APIs in favor of > temporary functions“ out of the scope of the FLIP > - add another reasonable limitation for function resolution, to not > consider resolving overloaded functions - those with the same name but > different params. (It's still valid to have a single function with > overloaded eval() methods) > > Please take another look. > > Thanks, > Bowen > > On Tue, Aug 27, 2019 at 11:49 AM Bowen Li <bowenl...@gmail.com> wrote: > > > Hi folks, > > > > I'd like to kick off a discussion on reworking Flink's FunctionCatalog. > > It's critically helpful to improve function usability in SQL. > > > > > > > https://docs.google.com/document/d/1w3HZGj9kry4RsKVCduWp82HkW6hhgi2unnvOAUS72t8/edit?usp=sharing > > > > In short, it: > > - adds support for precise function reference with fully/partially > > qualified name > > - redefines function resolution order for ambiguous function reference > > - adds support for Hive's rich built-in functions (support for Hive user > > defined functions was already added in 1.9.0) > > - clarifies the concept of temporary functions > > > > Would love to hear your thoughts. > > > > Bowen > > >