[
https://issues.apache.org/jira/browse/CALCITE-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027507#comment-17027507
]
Jin Xing edited comment on CALCITE-3760 at 1/31/20 1:50 PM:
------------------------------------------------------------
Thanks a lot for your kind help [~julianhyde] ~
Actually in my production, we bypass this issue by asking our user to follow a
contract – – non-deterministic UDF/UDAF should always called independently and
not allowed to be nested in another SqlCall. We have legal and illegal cases
like below
{code:java}
-- legal
select coalesce(udf_col, 100) from
(select udf(col) udf_col from foo)
-- illegal
select coalesce(udf(col), 100) from foo{code}
{quote}So, maybe the best way is to use a Project on a Project
{quote}
I understand this comment as to add a Project for the non-deterministic
UDF/UDAF, thus to guarantee the number of times to be evaluated. I think it
will work correctly and keep the query semantics. But in current
implementation, the rewriting of SqlNode works only for SqlCall[1] during
SqlValidatorImpl#performUnconditionalRewrites. If a SqlFunction wants to
customize rewriting logic, it just defines how itself is transformed – – no
need to care where it locates. But If we want to rewrite and add the Project,
the customizing rewriting logic will need to touch the outside SqlSelect, which
I think will complicate the code.
Additionally, SqlValidatorImpl#performUnconditionalRewrites runs before nodes
are expanded and fully resolved. The rewriting might be hard. Think about below
example:
{code:java}
select *, coalesce(udf(col), 100) from foo
{code}
If we simply rewrite it as
{code:java}
select *, case when x is not null then x else 100 end
from (
select *, udf(c)
from foo)
{code}
The semantics will be changed.
I still propose to don't do rewriting when found non-deterministic, and for
deterministic operators the behavior is as before. AFAIK, the reason Calcite
rewrite functions like COALESCE and NULLIF is for better and simpler
implementation (NULLIF doesn't have a self implementation yet.). Is there other
reasons that I missed ?
Thanks again for your help !
[1][https://github.com/apache/calcite/blob/c416c31fc376868bdd672afd84ec06dc75d56575/core/src/main/java/org/apache/calcite/sql/SqlOperator.java#L316]
was (Author: [email protected]):
Thanks a lot for your kind help [~julianhyde] ~
Actually in my production, we bypass this issue by asking our user to follow a
contract – – non-deterministic UDF/UDAF should always called independently and
not allowed to be nested in another SqlCall. We have legal and illegal cases
like below
{code:java}
-- legal
select coalesce(udf_col, 100) from
(select udf(col) udf_col from foo)
-- illegal
select coalesce(udf(col), 100) from foo{code}
{quote}So, maybe the best way is to use a Project on a Project
{quote}
I understand this comment as to add a Project for the non-deterministic
UDF/UDAF, thus to guarantee the number of times to be evaluated. I think it
will work correctly and keep the query semantics. But in current
implementation, the rewriting of SqlNode works only for SqlCall[1] during
SqlValidatorImpl#performUnconditionalRewrites. If a SqlFunction wants to
customize rewriting logic, it just defines how itself is transformed – – no
need to care where it locates. But If we want to rewrite and add the Project,
the customizing rewriting logic will need to touch the outside SqlSelect, which
I think will complicate the code.
Additionally, SqlValidatorImpl#performUnconditionalRewrites runs before nodes
are expanded and fully resolved. The rewriting might be hard. Think about below
example:
{code:java}
select *, coalesce(udf(col), 100) from foo
{code}
If we simply rewrite it as
{code:java}
select *, case when x is not null then x else 100 end
from (
select *, udf(c)
from foo)
{code}
The semantics will be changed.
I still propose to don't do rewriting when found non-deterministic, and for
deterministic operators the behavior is as before. AFAIK, the reason Calcite
rewrite functions like COALESCE and NULLIF is for better and simpler
implementation (NULLIF doesn't have a self implementation yet.). Is there other
reasons that I missed ?
[1][https://github.com/apache/calcite/blob/c416c31fc376868bdd672afd84ec06dc75d56575/core/src/main/java/org/apache/calcite/sql/SqlOperator.java#L316]
> Rewriting non-deterministic function can break query semantics
> --------------------------------------------------------------
>
> Key: CALCITE-3760
> URL: https://issues.apache.org/jira/browse/CALCITE-3760
> Project: Calcite
> Issue Type: Bug
> Components: core
> Reporter: Jin Xing
> Assignee: Jin Xing
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Calcite rewrite some *SqlFunctions* during validation. But whether the
> function is deterministic is not considered. For a non-deterministic
> operator, the rewriting can break semantics. Additionally there's no
> interface for user to specify the determinism for a UDF/UDAF.
> Say I have non-deterministic UDF & UDAF and run sql like below
> {code:java}
> select coalesce(udf(col0), 100) from foo;
> select nullif(udaf(col0), 1024) from foo;{code}
> They will be rewritten as
> {code:java}
> select case when udf(col0) is not null then udf(col0) else 100 end
> from foo;
> select case when udaf(col0)=1024 then null udaf(col0)
> from foo{code}
> As we can see that non-deterministic UDF & UDAF are called multiple times
> after written. Thus the condition in WHEN clause might NOT be held all the
> time.
> We need to provide an interface for user to specify the determinism in
> UDF/UDAF and consider whether a SqlNode is deterministic when rewriting.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)