[
https://issues.apache.org/jira/browse/CALCITE-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17128719#comment-17128719
]
Maytas Monsereenusorn commented on CALCITE-4048:
------------------------------------------------
[~julianhyde]
I don't think this has anything to do with the InSubQueryThreshold and hence,
setting InSubQueryThreshold property will not change anything.
The problem described in my ticket (CALCITE-4048) is due to having a NULL in
the IN clause. For example, if the IN clause has 3 items in it such as IN ('a',
'b', null) and the InSubQueryThreshold is set to > 3 (Max Int for example),
then this IN clause will still be converted to a JOIN (an inline table). In
fact, whatever InSubQueryThreshold you set does not matter and the mentioned IN
clause will always be converted to a JOIN (an inline table).
> Allow NULL in IN clause to be converted to OR
> ---------------------------------------------
>
> Key: CALCITE-4048
> URL: https://issues.apache.org/jira/browse/CALCITE-4048
> Project: Calcite
> Issue Type: Improvement
> Reporter: Maytas Monsereenusorn
> Priority: Minor
>
> Currently, Calcite convert a value list with NULL in the IN clause into an
> inline table. The code is in SqlToRelConverter, where we call
> !containsNullLiteral(valueList). For example, if I have a SQL select * from
> druid.foo where dim in ('a', null) then Calcite will not convert this to
> ...dim='a' OR dim=null but to a join on top of an aggregate. Furthermore,
> this can (I think?) be simplify to just dim='a' since dim=null is always
> null. (https://issues.apache.org/jira/browse/CALCITE-373 and
> [https://www.techrepublic.com/article/oracle-tip-understand-how-nulls-affect-in-and-exists/]
> -> _"When you use IN, you're telling SQL to take a value and compare it
> against every value or set of values in a list using =. If any NULL values
> exist, a row will not be returned__-even if both values are NULL."_ and
> [https://stackoverflow.com/questions/6362112/in-clause-with-null-or-is-null])
>
> The plan I got after converting SqlNode to RelNode for the above example:
> {code:java}
> LogicalProject(__time=[$0], cnt=[$1], dim1=[$2], dim2=[$3], dim3=[$4],
> m1=[$5], m2=[$6], unique_dim1=[$7])
> LogicalJoin(condition=[=($3, $8)], joinType=[inner])
> LogicalTableScan(table=[[druid, foo]])
> LogicalAggregate(group=[{0}])
> LogicalValues(tuples=[[{ 'a' }, { null }]]){code}
> ^ Note that: I believe current Calcite conversion from SqlNode to RelNode (as
> shown above) will ignore the null since it is only aggregating on 'a' and
> ignoring the null in LogicalValues.
> however, i expect:
> {code:java}
> LogicalProject(__time=[$0], cnt=[$1], dim1=[$2], dim2=[$3], dim3=[$4],
> m1=[$5], m2=[$6], unique_dim1=[$7])
> LogicalFilter(condition=[OR(=($3, 'a'), =($3, null))])
> LogicalTableScan(table=[[druid, foo]]){code}
> or something like:
> {code:java}
> LogicalProject(__time=[$0], cnt=[$1], dim1=[$2], dim2=[$3], dim3=[$4],
> m1=[$5], m2=[$6], unique_dim1=[$7])
> LogicalFilter(condition=[=($3, 'a')])
> LogicalTableScan(table=[[druid, foo]]){code}
> I believe this is not optimal on the conversion of SqlNode to RelNode by
> Calcite. While both are logically correct, having a Join on top of
> aggregation is not needed as we can simply use a filter as shown in the above
> examples. As suggested by Danny, line
> [https://github.com/apache/calcite/blob/feae6fbc328e3a7c87693951d1623f8b47ccea59/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L1102]
> can be promoted, we can move the null literals comparison to the last of
> composition predicates (OR/AND) instead of forbidden all the IN to OR(AND)
> conversion if the IN value list contains nulls.
>
> For reference:
> [https://mail-archives.apache.org/mod_mbox/calcite-dev/202006.mbox/%3C5ff77f79-097d-454e-8f0c-74170d99825f%40Spark%3E]
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)