[jira] [Comment Edited] (CALCITE-7232) Restore use of IN operator in RexCall

Julian Hyde (Jira) Sun, 19 Oct 2025 13:10:06 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18030992#comment-18030992
 ]


Julian Hyde edited comment on CALCITE-7232 at 10/19/25 8:09 PM:
----------------------------------------------------------------

{quote}The main point that I would like to tackle here is about the IN 
(possibly BETWEEN) and other operators being present in the RexNode tree{quote}

No, broadening the language is a big deal, so I think we need to simultaneously 
figure out how to control the places in which the broader language is used.

If we add the suggested "isSimplified" flag, the next question will be 
"simplified according to which set of simplification rules?" Is it in CNF 
(conjunctive normal form) or DNF or some other normal form? The "isSimplified" 
flag quickly turns from a boolean to a bitmap.

And if we allow RexNode trees to be mutable, we now have to worry about whether 
a node is CNF when it is created but ceases to be in CNF when its children are 
updated. Mutability is a very bad idea.

It's very easy to opine in Jira cases and email threads how convenient it would 
be to loosen this or that restriction but the consequences of a poor decision 
are subtle and play out over years. So let's be conservative here.

I'll throw out a suggestion for how we can be more rigid: let's split Rex into 
two languages, LooseRex and PlanningRex. They have an isomorphic class 
structure, but PlanningRex has a more limited set of operators. There's a 
trivial copy from one to the other (with some rewrites going from LooseRex to 
PlanningRex to deal with banned operators). In the past, SqlNode has played the 
role of LooseRex, but people for some reason (it's not actually clear) want to 
represent IN in a Rex tree. If that 'left shift' makes Rex a worse language for 
query planning then I'm against it.


was (Author: julianhyde):
{quote}The main point that I would like to tackle here is about the IN 
(possibly BETWEEN) and other operators being present in the RexNode tree{quote}

No, broadening the language is a big deal, so I think we need to simultaneously 
figure out how to control the places in which the broader language is used.

If we add the suggested "isSimplified" flag, the next question will be 
"simplified according to which set of simplification rules?" Is it in CNF 
(conjunctive normal form) or DNF or some other normal form? The "isSimplified" 
flag quickly turns from a boolean to a bitmap.

And if we allow RexNode trees to be mutable, we now have to worry about whether 
a node is CNF when it is created and ceases. Mutability is a very bad idea.

It's very easy to opine in Jira cases and email threads how convenient it would 
be to loosen this or that restriction but the consequences of a poor decision 
are subtle and play out over years. So let's be conservative here.

I'll throw out a suggestion for how we can be more rigid: let's split Rex into 
two languages, LooseRex and PlanningRex. They have an isomorphic class 
structure, but PlanningRex has a more limited set of operators. There's a 
trivial copy from one to the other (with some rewrites going from LooseRex to 
PlanningRex to deal with banned operators). In the past, SqlNode has played the 
role of LooseRex, but people for some reason (it's not actually clear) want to 
represent IN in a Rex tree. If that 'left shift' makes Rex a worse language for 
query planning then I'm against it.

> Restore use of IN operator in RexCall
> -------------------------------------
>
>                 Key: CALCITE-7232
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7232
>             Project: Calcite
>          Issue Type: Task
>            Reporter: Stamatis Zampetakis
>            Priority: Major
>
> The use of {{IN}} operator in {{RexCall}} was superseded by the introduction 
> of the {{SEARCH}} operator (CALCITE-4173) and its use is strictly forbidden 
> through 
> [assertions|https://github.com/apache/calcite/blob/6cbbf560b721cb88354c33751aa72b16a58ded23/core/src/main/java/org/apache/calcite/rex/RexCall.java#L94].
>  The {{SEARCH}} operator is more general and powerful than {{IN}} so it's a 
> perfect abstraction to use during the optimization phase.
> However, most databases don't have a {{SEARCH}} operator so the latter needs 
> to be transformed back to {{IN}} (or something else) at some point in time. 
> For instance, Apache Hive has two ways of generating an executable plan:
>  * take a {{RelNode}} and generate an AST tree
>  * take a {{RelNode}} and generate a Hive Operator tree
> both of which are eventually going to be executed.
> *If we don't allow* IN in a RexCall, then it means that we need to create 
> special code to handle SEARCH in both code paths that differ only slightly in 
> each case. (In reality the situation is more complicated for Hive because 
> there are at least two more places where we need to do a SEARCH to IN 
> transformation).
> *If we allow IN* in a RexCall, then at the end of the RelNode optimization 
> phase we can "expand" {{SEARCH}} to {{IN}} so the transformation logic only 
> appears in one place and it remains a {{RelNode}} to {{RelNode}} conversion. 
> In fact, the same transformation logic could be exploited in 
> [SqlImplementor|https://github.com/apache/calcite/blob/6cbbf560b721cb88354c33751aa72b16a58ded23/core/src/main/java/org/apache/calcite/rel/rel2sql/SqlImplementor.java#L815]
>  that does another {{RelNode}} to "something" conversion.
> The obvious downside with this proposal is that if people start mixing the IN 
> operator in various optimization rules/phases it can certainly affect the 
> quality of the plans and the planning time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Comment Edited] (CALCITE-7232) Restore use of IN operator in RexCall

Reply via email to