Hi everyone,

sorry, for the late replay. I give also +1 for option #2. Thus, I guess we have a clear winner.

I would also like to find a better keyword/syntax for this statement. Esp. the BUILTIN keyword can confuse people, because it could be written as BUILTIN, BUILDIN, BUILT_IN, or BUILD_IN. And we would need to introduce a new reserved keyword in the parser which affects also non-DDL queries. How about:

CREATE TEMPORARY SYSTEM FUNCTION xxx

The SYSTEM keyword is already a reserved keyword and in FLIP-66 we are discussing to prefix some of the function with a SYSTEM_ prefix like SYSTEM_WATERMARK. Also SQL defines syntax like "FOR SYSTEM_TIME AS OF".

What do you think?

Thanks,
Timo


On 20.09.19 05:45, Bowen Li wrote:
Another reason I prefer "CREATE TEMPORARY BUILTIN FUNCTION" over "ALTER
BUILTIN FUNCTION xxx TEMPORARILY" is - what if users want to drop the
temporary built-in function in the same session? With the former one, they
can run something like "DROP TEMPORARY BUILTIN FUNCTION"; With the latter
one, I'm not sure how users can "restore" the original builtin function
easily from an "altered" function without introducing further nonstandard
SQL syntax.

Also please pardon me as I realized using net may not be a good idea... I'm
trying to fit this vote into cases listed in Flink Bylaw [1].

>From the following result, the majority seems to be #2 too as it has the
most approval so far and doesn't have strong "-1".

#1:3 (+1), 1 (0), 4(-1)
#2:4(0), 3 (+1), 1(+0.5)
        * Dawid -1/0 depending on keyword
#3:2(+1), 3(-1), 3(0)

[1]
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120731026

On Thu, Sep 19, 2019 at 10:30 AM Bowen Li <bowenl...@gmail.com> wrote:

Hi,

Thanks everyone for your votes. I summarized the result as following:

#1:3 (+1), 1 (0), 4(-1)     - net: -1
#2:4(0), 2 (+1), 1(+0.5)  - net: +2.5
         Dawid -1/0 depending on keyword
#3:2(+1), 3(-1), 3(0)       - net: -1

Given the result, I'd like to change my vote for #2 from 0 to +1, to make
it a stronger case with net +3.5. So the votes so far are:

#1:3 (+1), 1 (0), 4(-1)     - net: -1
#2:4(0), 3 (+1), 1(+0.5)  - net: +3.5
         Dawid -1/0 depending on keyword
#3:2(+1), 3(-1), 3(0)       - net: -1

What do you think? Do you think we can conclude with this result? Or would
you like to take it as a formal FLIP vote with 3 days voting period?

BTW, I'd prefer "CREATE TEMPORARY BUILTIN FUNCTION" over "ALTER BUILTIN
FUNCTION xxx TEMPORARILY" because
1. the syntax is more consistent with "CREATE FUNCTION" and "CREATE
TEMPORARY FUNCTION"
2. "ALTER BUILTIN FUNCTION xxx TEMPORARILY" implies it alters a built-in
function but it actually doesn't, the logic only creates a temp function
with higher priority than that built-in function in ambiguous resolution
order; and it would behave inconsistently with "ALTER FUNCTION".



On Thu, Sep 19, 2019 at 2:58 AM Fabian Hueske <fhue...@gmail.com> wrote:

I agree, it's very similar from the implementation point of view and the
implications.

IMO, the difference is mostly on the mental model for the user.
Instead of having a special class of temporary functions that have
precedence over builtin functions it suggests to temporarily change
built-in functions.

Fabian

Am Do., 19. Sept. 2019 um 11:52 Uhr schrieb Kurt Young <ykt...@gmail.com
:
Hi Fabian,

I think it's almost the same with #2 with different keyword:

CREATE TEMPORARY BUILTIN FUNCTION xxx

Best,
Kurt


On Thu, Sep 19, 2019 at 5:50 PM Fabian Hueske <fhue...@gmail.com>
wrote:
Hi,

I thought about it a bit more and think that there is some good value
in
my
last proposal.

A lot of complexity comes from the fact that we want to allow
overriding
built-in functions which are differently addressed as other functions
(and
db objects).
We could just have "CREATE TEMPORARY FUNCTION" do exactly the same
thing
as
"CREATE FUNCTION" and treat both functions exactly the same except
that:
1) temp functions disappear at the end of the session
2) temp function are resolved before other functions

This would be Dawid's proposal from the beginning of this thread (in
case
you still remember... ;-) )

Temporarily overriding built-in functions would be supported with an
explicit command like

ALTER BUILTIN FUNCTION xxx TEMPORARILY AS ...

This would also address the concerns about accidentally changing the
semantics of built-in functions.
IMO, it can't get much more explicit than the above command.

Sorry for bringing up a new option in the middle of the discussion,
but
as
I said, I think it has a bunch of benefits and I don't see major
drawbacks
(maybe you do?).

What do you think?

Fabian

Am Do., 19. Sept. 2019 um 11:24 Uhr schrieb Fabian Hueske <
fhue...@gmail.com
:
Hi everyone,

I thought again about option #1 and something that I don't like is
that
the resolved address of xyz is different in "CREATE FUNCTION xyz"
and
"CREATE TEMPORARY FUNCTION xyz".
IMO, adding the keyword "TEMPORARY" should only change the
lifecycle of
the function, but not where it is located. This implicitly changed
location
might be confusing for users.
After all, a temp function should behave pretty much like any other
function, except for the fact that it disappears when the session is
closed.
Approach #2 with the additional keyword would make that pretty
clear,
IMO.
However, I neither like GLOBAL (for reasons mentioned by Dawid) or
BUILDIN
(we are not adding a built-in function).
So I'd be OK with #2 if we find a good keyword. In fact, approach #2
could
also be an alias for approach #3 to avoid explicit specification of
the
system catalog/db.

Approach #3 would be consistent with other db objects and the
"CREATE
FUNCTION" statement.
Adding system catalog/db seems rather complex, but then again how
often
do
we expect users to override built-in functions? If this becomes a
major
issue, we can still add option #2 as an alias.

Not sure what's the best approach from an internal point of view,
but I
certainly think that consistent behavior is important.
Hence my votes are:

-1 for #1
0 for #2
0 for #3

Btw. Did we consider a completely separate command for overriding
built-in
functions like "ALTER BUILTIN FUNCTION xxx TEMPORARILY AS ..."?

Cheers, Fabian


Am Do., 19. Sept. 2019 um 11:03 Uhr schrieb JingsongLee
<lzljs3620...@aliyun.com.invalid>:

I know Hive and Spark can shadow built-in functions by temporary
function.
Mysql, Oracle, Sql server can not shadow.
User can use full names to access functions instead of shadowing.

So I think it is a completely new thing, and the direct way to deal
with
new things is to add new grammar. So,
+1 for #2, +0 for #3, -1 for #1

Best,
Jingsong Lee


------------------------------------------------------------------
From:Kurt Young <ykt...@gmail.com>
Send Time:2019年9月19日(星期四) 16:43
To:dev <dev@flink.apache.org>
Subject:Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

And let me make my vote complete:

-1 for #1
+1 for #2 with different keyword
-0 for #3

Best,
Kurt


On Thu, Sep 19, 2019 at 4:40 PM Kurt Young <ykt...@gmail.com>
wrote:
Looks like I'm the only person who is willing to +1 to #2 for now
:-)
But I would suggest to change the keyword from GLOBAL to
something like BUILTIN.

I think #2 and #3 are almost the same proposal, just with
different
format to indicate whether it want to override built-in
functions.
My biggest reason to choose it is I want this behavior be
consistent
with temporal tables. I will give some examples to show the
behavior
and also make sure I'm not misunderstanding anything here.

For most DBs, when user create a temporary table with:

CREATE TEMPORARY TABLE t1

It's actually equivalent with:

CREATE TEMPORARY TABLE `curent_db`.t1

If user change current database, they will not be able to access
t1
without
fully qualified name, .i.e db1.t1 (assuming db1 is current
database
when
this temporary table is created).

Only #2 and #3 followed this behavior and I would vote for this
since
this
makes such behavior consistent through temporal tables and
functions.
Why I'm not voting for #3 is a special catalog and database just
looks
very
hacky to me. It gave a imply that our built-in functions saved
at a
special
catalog and database, which is actually not. Introducing a
dedicated
keyword
like CREATE TEMPORARY BUILTIN FUNCTION looks more clear and
straightforward. One can argue that we should avoid introducing
new
keyword,
but it's also very rare that a system can overwrite built-in
functions.
Since we
decided to support this, introduce a new keyword is not a big
deal
IMO.
Best,
Kurt


On Thu, Sep 19, 2019 at 3:07 PM Piotr Nowojski <
pi...@ververica.com
wrote:

Hi,

It is a quite long discussion to follow and I hope I didn’t
misunderstand
anything. From the proposals presented by Xuefu I would vote:

-1 for #1 and #2
+1 for #3

Besides #3 being IMO more general and more consistent, having
qualified
names (#3) would help/make easier for someone to use cross
databases/catalogs queries (joining multiple data sets/streams).
For
example with some functions to manipulate/clean up/convert the
stored
data
in different catalogs registered in the respective catalogs.

Piotrek

On 19 Sep 2019, at 06:35, Jark Wu <imj...@gmail.com> wrote:

I agree with Xuefu that inconsistent handling with all the
other
objects is
not a big problem.

Regarding to option#3, the special "system.system" namespace
may
confuse
users.
Users need to know the set of built-in function names to know
when
to
use
"system.system" namespace.
What will happen if user registers a non-builtin function name
under
the
"system.system" namespace?
Besides, I think it doesn't solve the "explode" problem I
mentioned
at
the
beginning of this thread.

So here is my vote:

+1 for #1
0 for #2
-1 for #3

Best,
Jark


On Thu, 19 Sep 2019 at 08:38, Xuefu Z <usxu...@gmail.com>
wrote:
@Dawid, Re: we also don't need additional referencing the
specialcatalog
anywhere.

True. But once we allow such reference, then user can do so
in
any
possible
place where a function name is expected, for which we have to
handle.
That's a big difference, I think.

Thanks,
Xuefu

On Wed, Sep 18, 2019 at 5:25 PM Dawid Wysakowicz <
wysakowicz.da...@gmail.com>
wrote:

@Bowen I am not suggesting introducing additional catalog. I
think
we
need
to get rid of the current built-in catalog.

@Xuefu in option #3 we also don't need additional
referencing
the
special
catalog anywhere else besides in the CREATE statement. The
resolution
behaviour is exactly the same in both options.

On Thu, 19 Sep 2019, 08:17 Xuefu Z, <usxu...@gmail.com>
wrote:
Hi Dawid,

"GLOBAL" is a temporary keyword that was given to the
approach.
It
can
be
changed to something else for better.

The difference between this and the #3 approach is that we
only
need
the
keyword for this create DDL. For other places (such as
function
referencing), no keyword or special namespace is needed.

Thanks,
Xuefu

On Wed, Sep 18, 2019 at 4:32 PM Dawid Wysakowicz <
wysakowicz.da...@gmail.com>
wrote:

Hi,
I think it makes sense to start voting at this point.

Option 1: Only 1-part identifiers
PROS:
- allows shadowing built-in functions
CONS:
- incosistent with all the other objects, both permanent &
temporary
- does not allow shadowing catalog functions

Option 2: Special keyword for built-in function
I think this is quite similar to the special catalog/db.
The
thing I
am
strongly against in this proposal is the GLOBAL keyword.
This
keyword
has a
meaning in rdbms systems and means a function that is
present
for a
lifetime of a session in which it was created, but
available
in
all
other
sessions. Therefore I really don't want to use this
keyword
in
a
different
context.

Option 3: Special catalog/db

PROS:
- allows shadowing built-in functions
- allows shadowing catalog functions
- consistent with other objects
CONS:
- we introduce a special namespace for built-in functions

I don't see a problem with introducing the special
namespace.
In
the
end
it
is very similar to the keyword approach. In this case the
catalog/db
combination would be the "keyword"

Therefore my votes:
Option 1: -0
Option 2: -1 (I might change to +0 if we can come up with
a
better
keyword)
Option 3: +1

Best,
Dawid


On Thu, 19 Sep 2019, 05:12 Xuefu Z, <usxu...@gmail.com>
wrote:
Hi Aljoscha,

Thanks for the summary and these are great questions to
be
answered.
The
answer to your first question is clear: there is a
general
agreement
to
override built-in functions with temp functions.

However, your second and third questions are sort of
related,
as a
function
reference can be either just function name (like "func")
or
in
the
form
or
"cat.db.func". When a reference is just function name, it
can
mean
either a
built-in function or a function defined in the current
cat/db.
If
we
support overriding a built-in function with a temp
function,
such
overriding can also cover a function in the current
cat/db.
I think what Timo referred as "overriding a catalog
function"
means a
temp
function defined as "cat.db.func" overrides a catalog
function
"func"
in
cat/db even if cat/db is not current. To support this,
temp
function
has
to
be tied to a cat/db. What's why I said above that the 2nd
and
3rd
questions
are related. The problem with such support is the
ambiguity
when
user
defines a function w/o namespace, "CREATE TEMPORARY
FUNCTION
func
...".
Here "func" can means a global temp function, or a temp
function in
current
cat/db. If we can assume the former, this creates an
inconsistency
because
"CREATE FUNCTION func" actually means a function in
current
cat/db.
If
we
assume the latter, then there is no way for user to
create a
global
temp
function.

Giving a special namespace for built-in functions may
solve
the
ambiguity
problem above, but it also introduces artificial
catalog/database
that
needs special treatment and pollutes the cleanness of
the
code. I
would
rather introduce a syntax in DDL to solve the problem,
like
"CREATE
[GLOBAL] TEMPORARY FUNCTION func".

Thus, I'd like to summarize a few candidate proposals for
voting
purposes:
1. Support only global, temporary functions without
namespace.
Such
temp
functions overrides built-in functions and catalog
functions
in
current
cat/db. The resolution order is: temp functions ->
built-in
functions
->
catalog functions. (Partially or fully qualified
functions
has
no
ambiguity!)

2. In addition to #1, support creating and referencing
temporary
functions
associated with a cat/db with "GLOBAL" qualifier in DDL
for
global
temp
functions. The resolution order is: global temp
functions ->
built-in
functions -> temp functions in current cat/db -> catalog
function.
(Resolution for partially or fully qualified function
reference
is:
temp
functions -> persistent functions.)

3. In addition to #1, support creating and referencing
temporary
functions
associated with a cat/db with a special namespace for
built-in
functions
and global temp functions. The resolution is the same as
#2,
except
that
the special namespace might be prefixed to a reference
to a
built-in
function or global temp function. (In absence of the
special
namespace,
the
resolution order is the same as in #2.)

My personal preference is #1, given the unknown use case
and
introduced
complexity for #2 and #3. However, #2 is an acceptable
alternative.
Thus,
my votes are:

+1 for #1
+0 for #2
-1 for #3

Everyone, please cast your vote (in above format
please!),
or
let
me
know
if you have more questions or other candidates.

Thanks,
Xuefu







On Wed, Sep 18, 2019 at 6:42 AM Aljoscha Krettek <
aljos...@apache.org>
wrote:

Hi,

I think this discussion and the one for FLIP-64 are very
connected.
To
resolve the differences, think we have to think about
the
basic
principles
and find consensus there. The basic questions I see are:

- Do we want to support overriding builtin functions?
- Do we want to support overriding catalog functions?
- And then later: should temporary functions be tied to
a
catalog/database?

I don’t have much to say about these, except that we
should
somewhat
stick
to what the industry does. But I also understand that
the
industry
is
already very divided on this.

Best,
Aljoscha

On 18. Sep 2019, at 11:41, Jark Wu <imj...@gmail.com>
wrote:
Hi,

+1 to strive for reaching consensus on the remaining
topics.
We
are
close to the truth. It will waste a lot of time if we
resume
the
topic
some
time later.
+1 to “1-part/override” and I’m also fine with Timo’s
“cat.db.fun”
way
to override a catalog function.
I’m not sure about “system.system.fun”, it introduces a
nonexistent
cat
& db? And we still need to do special treatment for the
dedicated
system.system cat & db?
Best,
Jark


在 2019年9月18日,06:54,Timo Walther <twal...@apache.org>
写道:
Hi everyone,

@Xuefu: I would like to avoid adding too many things
incrementally.
Users should be able to override all catalog objects
consistently
according
to FLIP-64 (Support for Temporary Objects in Table
module).
If
functions
are treated completely different, we need more code and
special
cases.
From
an implementation perspective, this topic only affects
the
lookup
logic
which is rather low implementation effort which is why I
would
like
to
clarify the remaining items. As you said, we have a
slight
consenus
on
overriding built-in functions; we should also strive for
reaching
consensus
on the remaining topics.
@Dawid: I like your idea as it ensures registering
catalog
objects
consistent and the overriding of built-in functions more
explicit.
Thanks,
Timo


On 17.09.19 11:59, kai wang wrote:
hi, everyone
I think this flip is very meaningful. it supports
functions
that
can
be
shared by different catalogs and dbs, reducing the
duplication
of
functions.
Our group based on flink's sql parser module
implements
create
function
feature, stores the parsed function metadata and
schema
into
mysql,
and
also customizes the catalog, customizes sql-client to
support
custom
schemas and functions. Loaded, but the function is
currently
global,
and is
not subdivided according to catalog and db.

In addition, I very much hope to participate in the
development
of
this
flip, I have been paying attention to the community,
but
found
it
is
more
difficult to join.
thank you.

Xuefu Z <usxu...@gmail.com> 于2019年9月17日周二 上午11:19写道:

Thanks to Tmo and Dawid for sharing thoughts.

It seems to me that there is a general consensus on
having
temp
functions
that have no namespaces and overwrite built-in
functions.
(As
a
side
note
for comparability, the current user defined
functions
are
all
temporary and
having no namespaces.)

Nevertheless, I can also see the merit of having
namespaced
temp
functions
that can overwrite functions defined in a specific
cat/db.
However,
this
idea appears orthogonal to the former and can be
added
incrementally.
How about we first implement non-namespaced temp
functions
now
and
leave
the door open for namespaced ones for later
releases as
the
requirement
might become more crystal? This also helps shorten
the
debate
and
allow us
to make some progress along this direction.

As to Dawid's idea of having a dedicated cat/db to
host
the
temporary
temp
functions that don't have namespaces, my only
concern
is
the
special
treatment for a cat/db, which makes code less
clean, as
evident
in
treating
the built-in catalog currently.

Thanks,
Xuefiu

On Mon, Sep 16, 2019 at 5:07 PM Dawid Wysakowicz <
wysakowicz.da...@gmail.com>
wrote:

Hi,
Another idea to consider on top of Timo's
suggestion.
How
about
we
have a
special namespace (catalog + database) for built-in
objects?
This
catalog
would be invisible for users as Xuefu was
suggesting.
Then users could still override built-in
functions, if
they
fully
qualify
object with the built-in namespace, but by default
the
common
logic
of
current dB & cat would be used.

CREATE TEMPORARY FUNCTION func ...
registers temporary function in current cat & dB

CREATE TEMPORARY FUNCTION cat.db.func ...
registers temporary function in cat db

CREATE TEMPORARY FUNCTION system.system.func ...
Overrides built-in function with temporary function

The built-in/system namespace would not be writable
for
permanent
objects.
WDYT?

This way I think we can have benefits of both
solutions.
Best,
Dawid


On Tue, 17 Sep 2019, 07:24 Timo Walther, <
twal...@apache.org
wrote:
Hi Bowen,

I understand the potential benefit of overriding
certain
built-in
functions. I'm open to such a feature if many
people
agree.
However, it
would be great to still support overriding catalog
functions
with
temporary functions in order to prototype a query
even
though
a
catalog/database might not be available currently
or
should
not
be
modified yet. How about we support both cases?

CREATE TEMPORARY FUNCTION abs
-> creates/overrides a built-in function and never
consideres
current
catalog and database; inconsistent with other DDL
but
acceptable
for
functions I guess.
CREATE TEMPORARY FUNCTION cat.db.fun
-> creates/overrides a catalog function

Regarding "Flink don't have any other built-in
objects
(tables,
views)
except functions", this might change in the near
future.
Take
https://issues.apache.org/jira/browse/FLINK-13900
as
an
example.
Thanks,
Timo

On 14.09.19 01:40, Bowen Li wrote:
Hi Fabian,

Yes, I agree 1-part/no-override is the least
favorable
thus I
didn't
include that as a voting option, and the
discussion
is
mainly
between
1-part/override builtin and 3-part/not override
builtin.
Re > However, it means that temp functions are
differently
treated
than
other db objects.
IMO, the treatment difference results from the
fact
that
functions
are
a
bit different from other objects - Flink don't
have
any
other
built-in
objects (tables, views) except functions.

Cheers,
Bowen

--
Xuefu Zhang

"In Honey We Trust!"


--
Xuefu Zhang

"In Honey We Trust!"


--
Xuefu Zhang

"In Honey We Trust!"


--
Xuefu Zhang

"In Honey We Trust!"



Reply via email to