[jira] [Commented] (CALCITE-4048) Allow NULL in IN clause to be converted to OR

Maytas Monsereenusorn (Jira) Mon, 08 Jun 2020 16:45:59 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17128719#comment-17128719
 ]


Maytas Monsereenusorn commented on CALCITE-4048:
------------------------------------------------

[~julianhyde]
I don't think this has anything to do with the InSubQueryThreshold and hence, 
setting InSubQueryThreshold property will not change anything.
The problem described in my ticket (CALCITE-4048) is due to having a NULL in 
the IN clause. For example, if the IN clause has 3 items in it such as IN ('a', 
'b', null) and the InSubQueryThreshold is set to > 3 (Max Int for example), 
then this IN clause will still be converted to a JOIN (an inline table). In 
fact, whatever InSubQueryThreshold you set does not matter and the mentioned IN 
clause will always be converted to a JOIN (an inline table).

> Allow NULL in IN clause to be converted to OR
> ---------------------------------------------
>
>                 Key: CALCITE-4048
>                 URL: https://issues.apache.org/jira/browse/CALCITE-4048
>             Project: Calcite
>          Issue Type: Improvement
>            Reporter: Maytas Monsereenusorn
>            Priority: Minor
>
> Currently, Calcite convert a value list with NULL in the IN clause into an 
> inline table. The code is in SqlToRelConverter, where we call 
> !containsNullLiteral(valueList). For example, if I have a SQL select * from 
> druid.foo where dim in ('a', null) then Calcite will not convert this to 
> ...dim='a' OR dim=null but to a join on top of an aggregate. Furthermore, 
> this can (I think?) be simplify to just dim='a' since dim=null is always 
> null. (https://issues.apache.org/jira/browse/CALCITE-373 and 
> [https://www.techrepublic.com/article/oracle-tip-understand-how-nulls-affect-in-and-exists/]
>  -> _"When you use IN, you're telling SQL to take a value and compare it 
> against every value or set of values in a list using =. If any NULL values 
> exist, a row will not be returned__-even if both values are NULL."_ and 
> [https://stackoverflow.com/questions/6362112/in-clause-with-null-or-is-null]) 
>  
>  The plan I got after converting SqlNode to RelNode for the above example:
> {code:java}
> LogicalProject(__time=[$0], cnt=[$1], dim1=[$2], dim2=[$3], dim3=[$4], 
> m1=[$5], m2=[$6], unique_dim1=[$7])
>   LogicalJoin(condition=[=($3, $8)], joinType=[inner])
>     LogicalTableScan(table=[[druid, foo]])
>     LogicalAggregate(group=[{0}])
>       LogicalValues(tuples=[[{ 'a' }, { null }]]){code}
> ^ Note that: I believe current Calcite conversion from SqlNode to RelNode (as 
> shown above) will ignore the null since it is only aggregating on 'a' and 
> ignoring the null in LogicalValues. 
> however, i expect:
> {code:java}
> LogicalProject(__time=[$0], cnt=[$1], dim1=[$2], dim2=[$3], dim3=[$4], 
> m1=[$5], m2=[$6], unique_dim1=[$7])
>   LogicalFilter(condition=[OR(=($3, 'a'), =($3, null))])
>     LogicalTableScan(table=[[druid, foo]]){code}
> or something like:
> {code:java}
> LogicalProject(__time=[$0], cnt=[$1], dim1=[$2], dim2=[$3], dim3=[$4], 
> m1=[$5], m2=[$6], unique_dim1=[$7])
>   LogicalFilter(condition=[=($3, 'a')])
>     LogicalTableScan(table=[[druid, foo]]){code}
> I believe this is not optimal on the conversion of SqlNode to RelNode by 
> Calcite. While both are logically correct, having a Join on top of 
> aggregation is not needed as we can simply use a filter as shown in the above 
> examples. As suggested by Danny, line 
> [https://github.com/apache/calcite/blob/feae6fbc328e3a7c87693951d1623f8b47ccea59/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L1102]
>  can be promoted, we can move the null literals comparison to the last of 
> composition predicates (OR/AND) instead of forbidden all the IN to OR(AND) 
> conversion if the IN value list contains nulls.
>  
> For reference: 
> [https://mail-archives.apache.org/mod_mbox/calcite-dev/202006.mbox/%3C5ff77f79-097d-454e-8f0c-74170d99825f%40Spark%3E]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (CALCITE-4048) Allow NULL in IN clause to be converted to OR

Reply via email to