Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

JingsongLee Tue, 03 Sep 2019 01:36:30 -0700

Thanks Bowen:

+1 for this. And +1 to Kurt's suggestion. My other points are:

1.Hive built-in functions is an intermediate solution. So we should
 not introduce interfaces to influence the framework. To make
 Flink itself more powerful, we should implement the functions
 we need to add.

2.Non-flink built-in functions are easy for users to change their 
behavior. If we support some flink built-in functions in the
 future but act differently from non-flink built-in, this will lead to
 changes in user behavior.

3.Fallback to Non-flink built-in functions is a bad choice to
 performance. Without flink internal codegen and data format,
 and bring data format conversion, the performance is not so
 good.

We need to support more complete hive jobs now, we need to
 have this fallback strategy. But it's not worth adding this
 concept at the catalog interface level, and it's not worth
 encouraging other catalogs to do so.

Another question is, does this fallback include all
 hive built-in functions? As far as I know, some hive functions
 have some hacky. If possible, can we start with a white list?
Once we implement some functions to flink built-in, we can
also update the whitelist.

Best,
Jingsong Lee

------------------------------------------------------------------
From:Kurt Young <[email protected]>
Send Time:2019年9月3日(星期二) 15:41
To:dev <[email protected]>
Subject:Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Thanks Bowen for driving this.

+1 for the general idea. It makes the function resolved behavior more
clear and deterministic. Besides, the user can use all hive built-in
functions, which is a great feature.

I only have one comment, but maybe it may touch your design so I think
it would make sense to reply this mail instead of comment on google doc.
Regarding to the classfication of functions, you currently have 4 types
of functions, which are:
1. temporary functions
2. Flink built-in functions
3. Hive built-in functions (or generalized as external built-in functions)
4. catalog functions

What I want to propose is we can merge #3 and #4, make them both under
"catalog" concept, by extending catalog function to make it have ability to
have built-in catalog functions. Some benefits I can see from this approach:
1. We don't have to introduce new concept like external built-in functions.
Actually
I don't see a full story about how to treat a built-in functions, and it
seems a little
bit disrupt with catalog. As a result, you have to make some restriction
like "hive
built-in functions can only be used when current catalog is hive catalog".

2. It makes us easier to adopt another system's built-in functions to
Flink, such as
MySQL. If we don't treat uniformly with  "external built-in functions" and
"external
catalog function", things like user set current catalog to hive but want to
use MySQL's
built-in function will happen.

One more thing, follow this approach, it's clear for your question about
how to support
external built-in functions, which is "add a  getBuiltInFunction to current
Catalog API".

What do you think?

Best,
Kurt

On Fri, Aug 30, 2019 at 7:14 AM Bowen Li <[email protected]> wrote:

> Thanks everyone for the feedback.
>
> I have updated the document accordingly. Here're the summary of changes:
>
> - clarify the concept of temporary functions, to facilitate deciding
> function resolution order
> - provide two options to support Hive built-in functions, with the 2nd one
> being preferred
> - add detailed prototype code for FunctionCatalog#lookupFunction(name)
> - move the section of ”rename existing FunctionCatalog APIs in favor of
> temporary functions“ out of the scope of the FLIP
> - add another reasonable limitation for function resolution, to not
> consider resolving overloaded functions - those with the same name but
> different params. (It's still valid to have a single function with
> overloaded eval() methods)
>
> Please take another look.
>
> Thanks,
> Bowen
>
> On Tue, Aug 27, 2019 at 11:49 AM Bowen Li <[email protected]> wrote:
>
> > Hi folks,
> >
> > I'd like to kick off a discussion on reworking Flink's FunctionCatalog.
> > It's critically helpful to improve function usability in SQL.
> >
> >
> >
> https://docs.google.com/document/d/1w3HZGj9kry4RsKVCduWp82HkW6hhgi2unnvOAUS72t8/edit?usp=sharing
> >
> > In short, it:
> > - adds support for precise function reference with fully/partially
> > qualified name
> > - redefines function resolution order for ambiguous function reference
> > - adds support for Hive's rich built-in functions (support for Hive user
> > defined functions was already added in 1.9.0)
> > - clarifies the concept of temporary functions
> >
> > Would love to hear your thoughts.
> >
> > Bowen
> >
>

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

Reply via email to