A SqlCall to $HARD_IN will (by SqlToRelConverter) become a RexCall to $HARD_IN, and then (by RelToSqlConverter) become a SqlCall to $HARD_IN. $HARD_IN(x, v1, v2) would become (by SqlWriter) the SQL "x IN (v1, v2)".
At any point in this lifecycle, you could intercept and and simplify. On Mon, Oct 7, 2019 at 2:34 PM Haisheng Yuan <[email protected]> wrote: > > Will the filter condition with “$HARD_IN” internal function be able to pushed > down and be recognized by the source SQL system, like Peter mentioned? > > If not, we have to translate the internal function back to IN during Rel2Sql > phase. Otherwise, the data read from the source table can be much larger. > > - Haisheng > > ------------------------------------------------------------------ > 发件人:Julian Hyde<[email protected]> > 日 期:2019年10月08日 04:53:11 > 收件人:dev<[email protected]> > 主 题:Re: [EXT] SqlRexConvertlet that Replicates "IN" Conversion Logic > > In > https://issues.apache.org/jira/browse/CALCITE-2792?focusedCommentId=16946209&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16946209 > > <https://issues.apache.org/jira/browse/CALCITE-2792?focusedCommentId=16946209&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16946209> > I floated the idea of a “$HARD_IN” internal function that has the same > semantics as IN but is not expanded to ‘… = OR … = …’. > > I think it would be a useful tool, if used judiciously. > > Julian > > > > On Oct 4, 2019, at 7:08 PM, Haisheng Yuan <[email protected]> wrote: > > > > As a workaround, you can modify you SqlRexConverlet, create a RexCall with > > balanced binary tree, e.g. (a=1 or a=2) or (a=3 or a=4), instead of a flat > > RexCall with multiple operands, e.g. a=1 or a=2 or a=3 or a=4. > > Because every OR RexCall has exactly 2 operands, it won't transform into > > SqlCall with left deep tree. > > > > Let me know it works for you or not. > > > > - Haisheng > > > > ------------------------------------------------------------------ > > 发件人:Haisheng Yuan<[email protected]> > > 日 期:2019年10月05日 07:37:04 > > 收件人:Peter Wicks (pwicks)<[email protected]>; > > [email protected]<[email protected]> > > 主 题:Re: RE: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion Logic > > > > If you want to push the filter down to the source SQL sytem, then > > transforming to a join won't help you either. > > > > The reason of stackoverflow for large ORs is the left deep binary tree, we > > need to change it to balanced binary tree, to reduce the depth of the call. > > > > I will open a pull request later. > > > > - Haisheng > > > > ------------------------------------------------------------------ > > 发件人:Peter Wicks (pwicks)<[email protected]> > > 日 期:2019年10月04日 21:32:25 > > 收件人:[email protected]<[email protected]> > > 主 题:RE: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion Logic > > > > Zoltan, > > > > Thanks for the suggestion. I actually tried doing a UDF first, and it was > > also successful, sorry for not sharing those details earlier. > > The problem with the UDF is that the predicates are not pushed down to the > > source SQL system (by design), and this can result in a 100x increase in > > the amount of data returned from the database. This data will be correctly > > filtered by the UDF, but returning 100x the data makes it a lot slower. So > > I was trying to push it down to the source server instead. > > > > What do you mean by, "I guess Calcite might probably won't be able to do > > much with these ORs anyway..."? From my experiments I've seen two results > > from passing in this many OR's: > > > > - If no other predicates are included in the query, then Calcite succeeds! > > It leaves the OR's flat, (a=1 OR a=2 OR a=3 OR a=4) > > - If additional predicates are included, then Calcite nests the OR > > statements, leading to a stackoverflow for very large OR's, which is > > CALCITE-2792, ((((a=1) OR a=2) OR a=3) OR a=4) > > > > Thanks, > > Peter > > > > -----Original Message----- > > From: Zoltan Haindrich <[email protected]> > > Sent: Friday, October 4, 2019 12:38 AM > > To: [email protected]; Haisheng Yuan <[email protected]>; Peter > > Wicks (pwicks) <[email protected]> > > Subject: Re: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion > > Logic > > > > > > I think you might try another approach: introduce some UDF and use your > > translation logic to call that - as the UDF will be opaque for calcite it > > will be left alone. > > I guess Calcite might probably won't be able to do much with these ORs > > anyway... > > > > > > On 10/3/19 11:26 PM, Haisheng Yuan wrote: > >> I don't think this can be done in SqlRexConvertlet, which converts SqlNode > >> to RexNode. > >> You might need to massage the SqlToRelConverter to create the RelNode that > >> you want. > >> > >> BTW, I still think we need RexNode for IN/ANY. > > I also feel that there is some need for IN nodes; but there are some good > > sides of not having it as well: like simplification handles them better. > > > > > >> > >> - Haisheng > >> > >> ------------------------------------------------------------------ > >> 发件人:Peter Wicks (pwicks)<[email protected]> > >> 日 期:2019年10月04日 04:03:51 > >> 收件人:[email protected]<[email protected]> > >> 主 题:RE: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion Logic > >> > >> Haisheng, > >> > >> Yes, that is what I would like to do. Unfortunately, I’m not sure how to > >> proceed to actually do that. I was hoping for a pointer to an example that > >> is similar? > >> > >> Thanks! > >> Peter > >> > >> From: Haisheng Yuan <[email protected]> > >> Sent: Thursday, October 3, 2019 1:35 PM > >> To: Peter Wicks (pwicks) <[email protected]>; [email protected] > >> Subject: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion Logic > >> > >> Currently Calcite doesn't have IN RexNode, only has IN SqlNode, > >> unfortunately. > >> > >> You can create a Values node with these authorization data, and make a > >> semi join with the table and Values you created. > >> > >> - Haisheng > >> > >> ------------------------------------------------------------------ > >> 发件人:Peter Wicks (pwicks)<[email protected]<mailto:[email protected]>> > >> 日 期:2019年10月04日 02:34:02 > >> 收件人:[email protected]<[email protected]<mailto:[email protected]%[email protected]>> > >> 主 题:SqlRexConvertlet that Replicates "IN" Conversion Logic > >> > >> A little detail about what I'm trying to do: > >> > >> I have an external API that contains authorization information on a per > >> user basis. I want users to be able to include an operation in their query > >> that will filter data based on this authorization data. > >> > >> Using Calcite v1.16 / Java 1.8 / RHEL7, I built a class that implements > >> SqlRexConvertlet, and I am able to get this working. The user includes in > >> their predicate statement `custom_authorize(column)`, my convertlet > >> queries the API, gets the authorization rules, builds an OR statement, and > >> the results come back. This works sometimes, but other times the OR > >> condition becomes too large, and I run into CALCITE-2792: > >> https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FCALCITE-2792&data=02%7C01%7Cpwicks%40micron.com%7C1920339cebed45b0bd5b08d748955d24%7Cf38a5ecd28134862b11bac1d563c806f%7C0%7C1%7C637057678693693555&sdata=0tuXcWcjTHXMQQr%2BmKXBjrSTAW%2BqBG%2Fp3PsK8df2tfk%3D&reserved=0, > >> which causes a stackoverflow and my query dies. > >> > >> So I tried converting to an IN statement, having read that IN statements > >> are automatically converted to a sub query join when the default limit of > >> 20 is exceeded. The problem is that this appears only to be true for IN > >> statements that are included in the initial query. IN statements created > >> as the result of a convertlet do not get modified, and are sent as an IN > >> statement, which results in a failure to parse the query. I looked at how > >> Calcite normally does this translation from IN to exists using a join, but > >> it depends on a lot of classes/instances that aren't available in the > >> SqlRexContext space. Is it possible to rewrite my IN statement to a > >> join/exists query like Calcite normally does? > >> > >> Also, am I doing things all wrong? Is there a better way to go about this? > >> > >> Code Sample below is for the OR version, the commented code can be swapped > >> in to see how I was building the IN statement. > >> > >> @Override > >> public RexNode convertCall(SqlRexContext cx, SqlCall call) { > >> HashSet<String> keyList = null; > >> try { > >> keyList = new > >> Manager().getAllowedIDs(getContextInformation().getQueryUser()); > >> } catch (SQLException e) { > >> e.printStackTrace(); > >> } > >> > >> final RexBuilder rexBuilder = cx.getRexBuilder(); > >> final RexNode column = cx.convertExpression(call.operand(0)); > >> > >> final List<RexNode> nodes = new ArrayList<>(); > >> for(String s: keyList) { > >> nodes.add(rexBuilder.makeCall(EQUALS, column, rexBuilder.makeLiteral(s))); > >> //nodes.add(rexBuilder.makeLiteral(s)); > >> } > >> > >> final RexNode in = rexBuilder.makeCall(SqlStdOperatorTable.OR, nodes); > >> //final RexNode in = inBuilder(rexBuilder, column, nodes.toArray(new > >> RexNode[0])); > >> > >> return in; > >> } > >> > >> protected RexNode inBuilder(RexBuilder rexBuilder, RexNode node, > >> RexNode... nodes) { > >> return rexBuilder.makeCall(SqlStdOperatorTable.IN, > >> ImmutableList.<RexNode>builder().add(node).add(nodes).build()); > >> } > >> > >> > >> Thanks, > >> Peter > >> > > > > > >
