[jira] [Comment Edited] (CALCITE-3760) Rewriting non-deterministic function can break query semantics

Jin Xing (Jira) Fri, 31 Jan 2020 05:51:06 -0800


    [ 
https://issues.apache.org/jira/browse/CALCITE-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027507#comment-17027507
 ]


Jin Xing edited comment on CALCITE-3760 at 1/31/20 1:50 PM:
------------------------------------------------------------

Thanks a lot for your kind help [~julianhyde] ~

Actually in my production, we bypass this issue by asking our user to follow a 
contract – – non-deterministic UDF/UDAF should always called independently and 
not allowed to be nested in another SqlCall. We have legal and illegal cases 
like below 
{code:java}
-- legal
select coalesce(udf_col, 100) from 
  (select udf(col) udf_col from foo)

-- illegal
select coalesce(udf(col), 100) from foo{code}
{quote}So, maybe the best way is to use a Project on a Project
{quote}
I understand this comment as to add a Project for the non-deterministic 
UDF/UDAF, thus to guarantee the number of times to be evaluated. I think it 
will work correctly and keep the query semantics. But in current 
implementation, the rewriting of SqlNode works only for SqlCall[1] during 
SqlValidatorImpl#performUnconditionalRewrites. If a SqlFunction wants to 
customize rewriting logic, it just defines how itself is transformed – –  no 
need to care where it locates. But If we want to rewrite and add the Project, 
the customizing rewriting logic will need to touch the outside SqlSelect, which 
I think will complicate the code.

Additionally,  SqlValidatorImpl#performUnconditionalRewrites runs before nodes 
are expanded and fully resolved. The rewriting might be hard. Think about below 
example:
{code:java}
select *, coalesce(udf(col), 100) from foo
{code}
If we simply rewrite it as
{code:java}
select *, case when x is not null then x else 100 end
from (
  select *, udf(c)
  from foo)
{code}
The semantics will be changed.

I still propose to don't do rewriting when found non-deterministic, and for 
deterministic operators the behavior is as before. AFAIK, the reason Calcite 
rewrite functions like COALESCE and NULLIF is for better and simpler 
implementation (NULLIF doesn't have a self implementation yet.). Is there other 
reasons that I missed ?

Thanks again for your help !

 

[1][https://github.com/apache/calcite/blob/c416c31fc376868bdd672afd84ec06dc75d56575/core/src/main/java/org/apache/calcite/sql/SqlOperator.java#L316]
 

 


was (Author: [email protected]):
Thanks a lot for your kind help [~julianhyde] ~

Actually in my production, we bypass this issue by asking our user to follow a 
contract – – non-deterministic UDF/UDAF should always called independently and 
not allowed to be nested in another SqlCall. We have legal and illegal cases 
like below

 
{code:java}
-- legal
select coalesce(udf_col, 100) from 
  (select udf(col) udf_col from foo)

-- illegal
select coalesce(udf(col), 100) from foo{code}
 
{quote}So, maybe the best way is to use a Project on a Project
{quote}
I understand this comment as to add a Project for the non-deterministic 
UDF/UDAF, thus to guarantee the number of times to be evaluated. I think it 
will work correctly and keep the query semantics. But in current 
implementation, the rewriting of SqlNode works only for SqlCall[1] during 
SqlValidatorImpl#performUnconditionalRewrites. If a SqlFunction wants to 
customize rewriting logic, it just defines how itself is transformed – –  no 
need to care where it locates. But If we want to rewrite and add the Project, 
the customizing rewriting logic will need to touch the outside SqlSelect, which 
I think will complicate the code.

Additionally,  SqlValidatorImpl#performUnconditionalRewrites runs before nodes 
are expanded and fully resolved. The rewriting might be hard. Think about below 
example:
{code:java}
select *, coalesce(udf(col), 100) from foo
{code}
If we simply rewrite it as
{code:java}
select *, case when x is not null then x else 100 end
from (
  select *, udf(c)
  from foo)
{code}
The semantics will be changed.

I still propose to don't do rewriting when found non-deterministic, and for 
deterministic operators the behavior is as before. AFAIK, the reason Calcite 
rewrite functions like COALESCE and NULLIF is for better and simpler 
implementation (NULLIF doesn't have a self implementation yet.). Is there other 
reasons that I missed ?

[1][https://github.com/apache/calcite/blob/c416c31fc376868bdd672afd84ec06dc75d56575/core/src/main/java/org/apache/calcite/sql/SqlOperator.java#L316]
 

 

> Rewriting non-deterministic function can break query semantics
> --------------------------------------------------------------
>
>                 Key: CALCITE-3760
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3760
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>            Reporter: Jin Xing
>            Assignee: Jin Xing
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Calcite rewrite some *SqlFunctions* during validation. But whether the 
> function is deterministic is not considered. For a non-deterministic 
> operator, the rewriting can break semantics. Additionally there's no 
> interface for user to specify the determinism for a UDF/UDAF. 
> Say I have non-deterministic UDF & UDAF and run sql like below
> {code:java}
> select coalesce(udf(col0), 100) from foo;
> select nullif(udaf(col0), 1024) from foo;{code}
> They will be rewritten as
> {code:java}
> select case when udf(col0) is not null then udf(col0) else 100 end
> from foo;
> select case when udaf(col0)=1024 then null udaf(col0)
> from foo{code}
> As we can see that non-deterministic UDF & UDAF are called multiple times 
> after written. Thus the condition in WHEN clause might NOT be held all the 
> time.
> We need to provide an interface for user to specify the determinism in 
> UDF/UDAF and consider whether a SqlNode is deterministic when rewriting.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Comment Edited] (CALCITE-3760) Rewriting non-deterministic function can break query semantics

Reply via email to