In my opinion, we will need both - supporting IN operation (either through an operator or an internal function) and adding support for building a balanced tree. It’s always good to be resilient and capable of handling edge cases. The IN support might require more work. Haisheng’s proposal is a practical solution to current issue.
> On Oct 8, 2019, at 11:06 AM, Haisheng Yuan <[email protected]> wrote: > > Adding IN RexNode only partially solves the problem, as it is still masking > the underlying issue. The fundamental reason for the stack overflow iies in > the left-deep binary tree. For queries that have tens of thousands of OR > condition, but not equals, which is not uncommon in our case, e.g. > (a like '...') or (b like '...') or (c like '..') > there will still be stack overflow. > > - Haisheng > > ------------------------------------------------------------------ > 发件人:Stamatis Zampetakis<[email protected]> > 日 期:2019年10月08日 15:09:01 > 收件人:<[email protected]> > 主 题:Re: Re: [EXT] SqlRexConvertlet that Replicates "IN" Conversion Logic > > It might be better to add a proper IN operator in RexCalls instead of > something internal that does more or less the same thing. > It is true that adds more paths in the code and thus requires some > additional dev and further support but I think it is worth it. > Many people so far expressed an interest to work on various cases involving > an IN operator so it might not be long before > we have full support for the IN operator. > > SqlToRelConverter can still decide to expand or not based on some criterion > or property. > > > On Tue, Oct 8, 2019 at 3:37 AM Julian Hyde <[email protected]> wrote: > >> A SqlCall to $HARD_IN will (by SqlToRelConverter) become a RexCall to >> $HARD_IN, and then (by RelToSqlConverter) become a SqlCall to >> $HARD_IN. $HARD_IN(x, v1, v2) would become (by SqlWriter) the SQL "x >> IN (v1, v2)". >> >> At any point in this lifecycle, you could intercept and and simplify. >> >> On Mon, Oct 7, 2019 at 2:34 PM Haisheng Yuan <[email protected]> >> wrote: >>> >>> Will the filter condition with “$HARD_IN” internal function be able to >> pushed down and be recognized by the source SQL system, like Peter >> mentioned? >>> >>> If not, we have to translate the internal function back to IN during >> Rel2Sql phase. Otherwise, the data read from the source table can be much >> larger. >>> >>> - Haisheng >>> >>> ------------------------------------------------------------------ >>> 发件人:Julian Hyde<[email protected]> >>> 日 期:2019年10月08日 04:53:11 >>> 收件人:dev<[email protected]> >>> 主 题:Re: [EXT] SqlRexConvertlet that Replicates "IN" Conversion Logic >>> >>> In >> https://issues.apache.org/jira/browse/CALCITE-2792?focusedCommentId=16946209&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16946209 >> < >> https://issues.apache.org/jira/browse/CALCITE-2792?focusedCommentId=16946209&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16946209> >> I floated the idea of a “$HARD_IN” internal function that has the same >> semantics as IN but is not expanded to ‘… = OR … = …’. >>> >>> I think it would be a useful tool, if used judiciously. >>> >>> Julian >>> >>> >>>> On Oct 4, 2019, at 7:08 PM, Haisheng Yuan <[email protected]> >> wrote: >>>> >>>> As a workaround, you can modify you SqlRexConverlet, create a RexCall >> with balanced binary tree, e.g. (a=1 or a=2) or (a=3 or a=4), instead of a >> flat RexCall with multiple operands, e.g. a=1 or a=2 or a=3 or a=4. >>>> Because every OR RexCall has exactly 2 operands, it won't transform >> into SqlCall with left deep tree. >>>> >>>> Let me know it works for you or not. >>>> >>>> - Haisheng >>>> >>>> ------------------------------------------------------------------ >>>> 发件人:Haisheng Yuan<[email protected]> >>>> 日 期:2019年10月05日 07:37:04 >>>> 收件人:Peter Wicks (pwicks)<[email protected]>; [email protected]< >> [email protected]> >>>> 主 题:Re: RE: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion >> Logic >>>> >>>> If you want to push the filter down to the source SQL sytem, then >> transforming to a join won't help you either. >>>> >>>> The reason of stackoverflow for large ORs is the left deep binary >> tree, we need to change it to balanced binary tree, to reduce the depth of >> the call. >>>> >>>> I will open a pull request later. >>>> >>>> - Haisheng >>>> >>>> ------------------------------------------------------------------ >>>> 发件人:Peter Wicks (pwicks)<[email protected]> >>>> 日 期:2019年10月04日 21:32:25 >>>> 收件人:[email protected]<[email protected]> >>>> 主 题:RE: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion >> Logic >>>> >>>> Zoltan, >>>> >>>> Thanks for the suggestion. I actually tried doing a UDF first, and it >> was also successful, sorry for not sharing those details earlier. >>>> The problem with the UDF is that the predicates are not pushed down to >> the source SQL system (by design), and this can result in a 100x increase >> in the amount of data returned from the database. This data will be >> correctly filtered by the UDF, but returning 100x the data makes it a lot >> slower. So I was trying to push it down to the source server instead. >>>> >>>> What do you mean by, "I guess Calcite might probably won't be able to >> do much with these ORs anyway..."? From my experiments I've seen two >> results from passing in this many OR's: >>>> >>>> - If no other predicates are included in the query, then Calcite >> succeeds! It leaves the OR's flat, (a=1 OR a=2 OR a=3 OR a=4) >>>> - If additional predicates are included, then Calcite nests the OR >> statements, leading to a stackoverflow for very large OR's, which is >> CALCITE-2792, ((((a=1) OR a=2) OR a=3) OR a=4) >>>> >>>> Thanks, >>>> Peter >>>> >>>> -----Original Message----- >>>> From: Zoltan Haindrich <[email protected]> >>>> Sent: Friday, October 4, 2019 12:38 AM >>>> To: [email protected]; Haisheng Yuan <[email protected]>; >> Peter Wicks (pwicks) <[email protected]> >>>> Subject: Re: [EXT] Re: SqlRexConvertlet that Replicates "IN" >> Conversion Logic >>>> >>>> >>>> I think you might try another approach: introduce some UDF and use >> your translation logic to call that - as the UDF will be opaque for calcite >> it will be left alone. >>>> I guess Calcite might probably won't be able to do much with these ORs >> anyway... >>>> >>>> >>>> On 10/3/19 11:26 PM, Haisheng Yuan wrote: >>>>> I don't think this can be done in SqlRexConvertlet, which converts >> SqlNode to RexNode. >>>>> You might need to massage the SqlToRelConverter to create the RelNode >> that you want. >>>>> >>>>> BTW, I still think we need RexNode for IN/ANY. >>>> I also feel that there is some need for IN nodes; but there are some >> good sides of not having it as well: like simplification handles them >> better. >>>> >>>> >>>>> >>>>> - Haisheng >>>>> >>>>> ------------------------------------------------------------------ >>>>> 发件人:Peter Wicks (pwicks)<[email protected]> >>>>> 日 期:2019年10月04日 04:03:51 >>>>> 收件人:[email protected]<[email protected]> >>>>> 主 题:RE: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion >> Logic >>>>> >>>>> Haisheng, >>>>> >>>>> Yes, that is what I would like to do. Unfortunately, I’m not sure how >> to proceed to actually do that. I was hoping for a pointer to an example >> that is similar? >>>>> >>>>> Thanks! >>>>> Peter >>>>> >>>>> From: Haisheng Yuan <[email protected]> >>>>> Sent: Thursday, October 3, 2019 1:35 PM >>>>> To: Peter Wicks (pwicks) <[email protected]>; [email protected] >>>>> Subject: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion >> Logic >>>>> >>>>> Currently Calcite doesn't have IN RexNode, only has IN SqlNode, >> unfortunately. >>>>> >>>>> You can create a Values node with these authorization data, and make >> a semi join with the table and Values you created. >>>>> >>>>> - Haisheng >>>>> >>>>> ------------------------------------------------------------------ >>>>> 发件人:Peter Wicks (pwicks)<[email protected]<mailto:[email protected]>> >>>>> 日 期:2019年10月04日 02:34:02 >>>>> 收件人:[email protected]<[email protected]<mailto: >> [email protected]%[email protected]>> >>>>> 主 题:SqlRexConvertlet that Replicates "IN" Conversion Logic >>>>> >>>>> A little detail about what I'm trying to do: >>>>> >>>>> I have an external API that contains authorization information on a >> per user basis. I want users to be able to include an operation in their >> query that will filter data based on this authorization data. >>>>> >>>>> Using Calcite v1.16 / Java 1.8 / RHEL7, I built a class that >> implements SqlRexConvertlet, and I am able to get this working. The user >> includes in their predicate statement `custom_authorize(column)`, my >> convertlet queries the API, gets the authorization rules, builds an OR >> statement, and the results come back. This works sometimes, but other times >> the OR condition becomes too large, and I run into CALCITE-2792: >> https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FCALCITE-2792&data=02%7C01%7Cpwicks%40micron.com%7C1920339cebed45b0bd5b08d748955d24%7Cf38a5ecd28134862b11bac1d563c806f%7C0%7C1%7C637057678693693555&sdata=0tuXcWcjTHXMQQr%2BmKXBjrSTAW%2BqBG%2Fp3PsK8df2tfk%3D&reserved=0, >> which causes a stackoverflow and my query dies. >>>>> >>>>> So I tried converting to an IN statement, having read that IN >> statements are automatically converted to a sub query join when the default >> limit of 20 is exceeded. The problem is that this appears only to be true >> for IN statements that are included in the initial query. IN statements >> created as the result of a convertlet do not get modified, and are sent as >> an IN statement, which results in a failure to parse the query. I looked at >> how Calcite normally does this translation from IN to exists using a join, >> but it depends on a lot of classes/instances that aren't available in the >> SqlRexContext space. Is it possible to rewrite my IN statement to a >> join/exists query like Calcite normally does? >>>>> >>>>> Also, am I doing things all wrong? Is there a better way to go about >> this? >>>>> >>>>> Code Sample below is for the OR version, the commented code can be >> swapped in to see how I was building the IN statement. >>>>> >>>>> @Override >>>>> public RexNode convertCall(SqlRexContext cx, SqlCall call) { >>>>> HashSet<String> keyList = null; >>>>> try { >>>>> keyList = new >> Manager().getAllowedIDs(getContextInformation().getQueryUser()); >>>>> } catch (SQLException e) { >>>>> e.printStackTrace(); >>>>> } >>>>> >>>>> final RexBuilder rexBuilder = cx.getRexBuilder(); >>>>> final RexNode column = cx.convertExpression(call.operand(0)); >>>>> >>>>> final List<RexNode> nodes = new ArrayList<>(); >>>>> for(String s: keyList) { >>>>> nodes.add(rexBuilder.makeCall(EQUALS, column, >> rexBuilder.makeLiteral(s))); >>>>> //nodes.add(rexBuilder.makeLiteral(s)); >>>>> } >>>>> >>>>> final RexNode in = rexBuilder.makeCall(SqlStdOperatorTable.OR, nodes); >>>>> //final RexNode in = inBuilder(rexBuilder, column, nodes.toArray(new >> RexNode[0])); >>>>> >>>>> return in; >>>>> } >>>>> >>>>> protected RexNode inBuilder(RexBuilder rexBuilder, RexNode node, >> RexNode... nodes) { >>>>> return rexBuilder.makeCall(SqlStdOperatorTable.IN, >>>>> ImmutableList.<RexNode>builder().add(node).add(nodes).build()); >>>>> } >>>>> >>>>> >>>>> Thanks, >>>>> Peter >>>>> >>>> >>>> >>> >>> >> >
