I think we're all in agreement then! I'm all for banning IN and BETWEEN during planning. Not sure if your position is this extreme, but if we can ban "=", ">", "<", and others, I wouldn't say no (though I wouldn't demand it either).
Gonna spitball an idea off the top of my head that might be too big to develop. But it sounds like there's a concept of internal planning operators and external database operators. While plenty of these overlap, there seems to be a good reason to separate the two concepts. Maybe it could be as simple as having certain operators implement a dummy interface that could be cast? Idk, just thinking out loud. On Thu, Oct 16, 2025 at 11:02 AM Julian Hyde <[email protected]> wrote: > My position is that during planning, IN and BETWEEN should not exist. All > such ranges should be represented by SEARCH. > > In 7232 Stamatis points out that it would be useful to have an IN RexCall > if we’re just about to generate an IN clause for some particular database. > I agree with that position, but we have to find a practical and efficient > way to ban IN from the planning process. > > Julian > > > > On Oct 16, 2025, at 8:02 AM, Stamatis Zampetakis <[email protected]> > wrote: > > > > The discussion became more about having IN in a RexCall so I logged > > CALCITE-7232 [1] sharing also some background context about challenges > > in Apache Hive. We can continue the discussion there and see if we can > > reach consensus. > > > > [1] https://issues.apache.org/jira/browse/CALCITE-7232 > > > > On Wed, Oct 15, 2025 at 5:55 PM Stephen Carlin > > <[email protected]> wrote: > >> > >> This will perhaps be a reiteration of what I said in my previous email, > but > >> here are the major highlights, I think > >> > >> - SEARCH is something that makes rules more efficient. A very nice > >> structure indeed and very handy. > >> - I'm guessing the RexUtil.expandSearch() method was probably created > >> because many databases cannot deal with SEARCH directly. So it gets > >> expanded to existing RexCall operators to standard SQL operations that > all > >> databases understand. I hope I am getting this right. A very > convenient > >> and nice method for the databases that don't implement SEARCH, I > imagine. > >> - The expandSearch method can really only have one type of expansion > >> (config parameters notwithstanding). So while many SQL functions can > map > >> into 1 SEARCH operator, only one of those creations can be chosen when > >> expanding. > >> - IN is the only cases I am aware of (are there others?) where, at > >> SqlToRelConverter time, the O(1) function concept is just thrown away. > It > >> either creates a subquery RelNode tree, losing the concept of a pure > >> function or it changes it into an inefficient O(m) function with > multiple > >> ORs. *This is actually my main issue*, not the SEARCH operator. > >> - I would imagine pretty much all databases understand a concept of an > IN > >> function as it is part of the SQL standard, I think? Yes, IN is > >> overloaded, but this specialized use of it is very common, common enough > >> whereI would imagine the non-SEARCH databases prolly implement this as a > >> function with an O(1) search time. > >> > >> All these points together point to the missing functionality for me. > While > >> CALCITE-7226 slightly addresses this issue, it really doesn't solve the > >> main problem for me in that the expandSearch generates inefficient SQL > for > >> my database. The RexUtil.expandSearch method is essentially unusable > and I > >> have to resort to writing my own expandSearch. Or I have to write my > own > >> "OR -> IN' detection and convert it into my custom IN operator. > >> > >> On Wed, Oct 15, 2025 at 6:11 AM Stephen Carlin <[email protected]> > wrote: > >> > >>> What you said makes perfect sense to me about restricting the > functions. > >>> Definitely think it was a good idea to create it and implement it. > >>> > >>> But I do think you are slightly off about where the tension lies. I > think > >>> the tension is on a very specific issue for the most part, at least > for me. > >>> > >>> If I'm understanding what you are saying about the tension correctly: I > >>> think your NVL example is there to show that you can't really guess > which > >>> way to do the expand search? And this is true. However, the > difference > >>> between NVL and a case statement is prolly not something I care much > >>> about. There are only 2 parameters in an NVL function. If you give > me a > >>> CASE when I originally had an NVL my first reaction would be, "Where > the > >>> heck did that come from?" followed by "meh, they do the same thing > and the > >>> efficiency isn't much different, the end user can figure out that > they're > >>> the same thing, I'll get around to fixing that later, if ever." > >>> > >>> But the IN case is far, far different. It is expanded as "A = 1 OR A > = 2 > >>> OR A =3 OR ...." and 100 more ORs which is a huge headache. 100 OR > >>> statements is quite the nightmare compared to 1 IN statement. I really > >>> need that expansion to an IN operator. > >>> > >>> Not sure how many operators fit the IN model as opposed to the NVL > model, > >>> but I'd imagine it's not a lot. I'm guessing the main complaints came > >>> about IN. > >>> > >>> On Tue, Oct 14, 2025 at 4:17 PM Julian Hyde <[email protected]> > >>> wrote: > >>> > >>>> SEARCH is a semi-join too. It’s just a semi-join to an infinite set > (the > >>>> points on the number line, or any totally-ordered type). :) > >>>> > >>>> I agree that it’s useful to have multiple functions, and mappings > between > >>>> them. We certainly allow multiple functions in the SQL we consume, > and the > >>>> SQL we produce (via the JDBC adapter). And we map between them by > >>>> desugaring (e.g. "a BETWEEN b AND c" ==> "a >= b AND a <= c”.) > >>>> > >>>> It is very advantageous to restrict the set of functions that can be > used > >>>> in planner rules and simplifications. Since SEARCH can represent > BETWEEN > >>>> (on a constant range) and >, the argument goes, let’s convert them > both to > >>>> SEARCH, and our rules can be simpler. > >>>> > >>>> I am a fan of this approach. It is much harder to produce smart, > >>>> consistent, robust planning rules if we let the surface area get too > large. > >>>> > >>>> This approach is at tension with people who want and expect their > >>>> operators to pass through planning unchanged. If someone wrote > "NVL(a, b)" > >>>> they are surprised if it comes out as "CASE WHEN a IS NULL THEN b > ELSE a”. > >>>> I guess we can mitigate by re-sugaring on the backend, before we > generate > >>>> SQL. > >>>> > >>>> Julian > >>>> > >>>> > >>>> > >>>> > >>>>> On Oct 14, 2025, at 1:51 PM, Stephen Carlin > >>>> <[email protected]> wrote: > >>>>> > >>>>> Heh, it's funny you say that because Stamatis filed this improvement > >>>>> request in a meeting where I was complaining about things being more > >>>>> complicated and having broken code because of SEARCH :) > >>>>> > >>>>> My main complaint is this, and perhaps it makes sense to file another > >>>> Jira > >>>>> for a potential new improvement: > >>>>> > >>>>> I totally get SEARCH and its usefulness. So I have no complaints > about > >>>> the > >>>>> existence of SEARCH, why it is needed, and why it makes sense. > >>>>> > >>>>> My problem with SEARCH is simply that it makes my code more > complex. I > >>>>> kinda think of it this way: The SEARCH operator is a superset of the > >>>>> BETWEEN operator. And we either are ok with having a representation > for > >>>>> both or we are stuck with it due to legacy reasons. I prefer the > former > >>>>> explanation, heh. So why not have an IN_FUNC operator as well? No > >>>> need to > >>>>> get rid of SEARCH! Just keep it as a superset. And then create a > >>>> RexUtil > >>>>> method that does the work for me that converts a relevant SEARCH to > an > >>>>> IN_FUNC. > >>>>> > >>>>> Perhaps the general issue with IN is this: Like SEARCH, IN also has a > >>>>> broader definition than just a function, but the broader definition > is > >>>>> different from SEARCH. In our internal discussion, we talked about > how > >>>>> "IN" is always a semi-join of some nature, sometimes of course with a > >>>>> subquery. I believe this is how Calcite treats IN. But in > practice, IN > >>>>> has a specialized case as something that fits right into what a > >>>>> RexCall/function does. I think many databases (at least the two I > work > >>>> on) > >>>>> implement this specialized case of IN as a function internally, just > >>>> like > >>>>> BETWEEN. So to have a 1:1 RexCall operator to function for IN just > >>>> seems > >>>>> to make intuitive sense for me. > >>>>> > >>>>> Perhaps we don't have a problem with the existence of a BETWEEN > operator > >>>>> stems from the fact that BETWEEN doesn't have the same type of > overload > >>>>> that IN does. > >>>>> > >>>>> So yeah though: If we don't think that an IN type operator is good > for > >>>>> Calcite, I need to develop code local to my database that does a > >>>> conversion > >>>>> to a function which differs from my normal RexCall conversion. And > my > >>>>> intuition tells me that this effort has already been done elsewhere. > >>>> Which > >>>>> is why I think it belongs in Calcite, where the community would > benefit. > >>>>> > >>>>> Anyway, having said all this: the improvement in CALCITE-7226 will > >>>>> definitely help me out because creating all those ORs before > generating > >>>> a > >>>>> SEARCH seems quite inefficient. It might even lead to problems in my > >>>> code > >>>>> as the multiple ORs generate many layers of RexNodes which I think > >>>> created > >>>>> some really slow compilations for me during one of my tests. > >>>>> > >>>>> On Tue, Oct 14, 2025 at 11:30 AM Julian Hyde <[email protected] > > > >>>> wrote: > >>>>> > >>>>>> It’s nice to see issues like > >>>>>> https://issues.apache.org/jira/browse/CALCITE-7226 "Convert IN to > >>>> SEARCH > >>>>>> in SqlToRelConverter” being logged. > >>>>>> > >>>>>> When I introduced SEARCH I got a lot of complaints about how I was > >>>> making > >>>>>> plans more complicated and generally breaking things. All I was > really > >>>>>> doing was bringing the concept of a Sarg [1] into our logical > algebra. > >>>> I’m > >>>>>> glad that people now see that SEARCH leads to more robust, efficient > >>>> plans. > >>>>>> > >>>>>> Julian > >>>>>> > >>>>>> [1] https://en.wikipedia.org/wiki/Sargable > >>>> > >>>> > >
