Will the filter condition with “$HARD_IN” internal function be able to pushed 
down and be recognized by the source SQL system, like Peter mentioned?

If not, we have to translate the internal function back to IN during Rel2Sql 
phase. Otherwise, the data read from the source table can be much larger.

- Haisheng

------------------------------------------------------------------
发件人:Julian Hyde<[email protected]>
日 期:2019年10月08日 04:53:11
收件人:dev<[email protected]>
主 题:Re: [EXT] SqlRexConvertlet that Replicates "IN" Conversion Logic

In 
https://issues.apache.org/jira/browse/CALCITE-2792?focusedCommentId=16946209&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16946209
 
<https://issues.apache.org/jira/browse/CALCITE-2792?focusedCommentId=16946209&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16946209>
 I floated the idea of a “$HARD_IN” internal function that has the same 
semantics as IN but is not expanded to ‘… = OR … = …’.

I think it would be a useful tool, if used judiciously. 

Julian


> On Oct 4, 2019, at 7:08 PM, Haisheng Yuan <[email protected]> wrote:
> 
> As a workaround, you can modify you SqlRexConverlet, create a RexCall with 
> balanced binary tree, e.g. (a=1 or a=2) or (a=3 or a=4), instead of a flat 
> RexCall with multiple operands, e.g. a=1 or a=2 or a=3 or a=4.
> Because every OR RexCall has exactly 2 operands, it won't transform into 
> SqlCall with left deep tree.
> 
> Let me know it works for you or not.
> 
> - Haisheng
> 
> ------------------------------------------------------------------
> 发件人:Haisheng Yuan<[email protected]>
> 日 期:2019年10月05日 07:37:04
> 收件人:Peter Wicks (pwicks)<[email protected]>; 
> [email protected]<[email protected]>
> 主 题:Re: RE: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion Logic
> 
> If you want to push the filter down to the source SQL sytem, then 
> transforming to a join won't help you either.
> 
> The reason of stackoverflow for large ORs is the left deep binary tree, we 
> need to change it to balanced binary tree, to reduce the depth of the call.
> 
> I will open a pull request later.
> 
> - Haisheng
> 
> ------------------------------------------------------------------
> 发件人:Peter Wicks (pwicks)<[email protected]>
> 日 期:2019年10月04日 21:32:25
> 收件人:[email protected]<[email protected]>
> 主 题:RE: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion Logic
> 
> Zoltan,
> 
> Thanks for the suggestion. I actually tried doing a UDF first, and it was 
> also successful, sorry for not sharing those details earlier.
> The problem with the UDF is that the predicates are not pushed down to the 
> source SQL system (by design), and this can result in a 100x increase in the 
> amount of data returned from the database. This data will be correctly 
> filtered by the UDF, but returning 100x the data makes it a lot slower. So I 
> was trying to push it down to the source server instead.
> 
> What do you mean by, "I guess Calcite might probably won't be able to do much 
> with these ORs anyway..."? From my experiments I've seen two results from 
> passing in this many OR's:
> 
> - If no other predicates are included in the query, then Calcite succeeds! It 
> leaves the OR's flat, (a=1 OR a=2 OR a=3 OR a=4)
> - If additional predicates are included, then Calcite nests the OR 
> statements, leading to a stackoverflow for very large OR's, which is 
> CALCITE-2792, ((((a=1) OR a=2) OR a=3) OR a=4) 
> 
> Thanks,
> Peter
> 
> -----Original Message-----
> From: Zoltan Haindrich <[email protected]> 
> Sent: Friday, October 4, 2019 12:38 AM
> To: [email protected]; Haisheng Yuan <[email protected]>; Peter 
> Wicks (pwicks) <[email protected]>
> Subject: Re: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion Logic
> 
> 
> I think you might try another approach: introduce some UDF and use your 
> translation logic to call that - as the UDF will be opaque for calcite it 
> will be left alone.
> I guess Calcite might probably won't be able to do much with these ORs 
> anyway...
> 
> 
> On 10/3/19 11:26 PM, Haisheng Yuan wrote:
>> I don't think this can be done in SqlRexConvertlet, which converts SqlNode 
>> to RexNode.
>> You might need to massage the SqlToRelConverter to create the RelNode that 
>> you want.
>> 
>> BTW, I still think we need RexNode for IN/ANY.
> I also feel that there is some need for IN nodes; but there are some good 
> sides of not having it as well: like simplification handles them better.
> 
> 
>> 
>> - Haisheng
>> 
>> ------------------------------------------------------------------
>> 发件人:Peter Wicks (pwicks)<[email protected]>
>> 日 期:2019年10月04日 04:03:51
>> 收件人:[email protected]<[email protected]>
>> 主 题:RE: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion Logic
>> 
>> Haisheng,
>> 
>> Yes, that is what I would like to do. Unfortunately, I’m not sure how to 
>> proceed to actually do that. I was hoping for a pointer to an example that 
>> is similar?
>> 
>> Thanks!
>> Peter
>> 
>> From: Haisheng Yuan <[email protected]>
>> Sent: Thursday, October 3, 2019 1:35 PM
>> To: Peter Wicks (pwicks) <[email protected]>; [email protected]
>> Subject: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion Logic
>> 
>> Currently Calcite doesn't have IN RexNode, only has IN SqlNode, 
>> unfortunately.
>> 
>> You can create a Values node with these authorization data, and make a semi 
>> join with the table and Values you created.
>> 
>> - Haisheng
>> 
>> ------------------------------------------------------------------
>> 发件人:Peter Wicks (pwicks)<[email protected]<mailto:[email protected]>>
>> 日 期:2019年10月04日 02:34:02
>> 收件人:[email protected]<[email protected]<mailto:[email protected]%[email protected]>>
>> 主 题:SqlRexConvertlet that Replicates "IN" Conversion Logic
>> 
>> A little detail about what I'm trying to do:
>> 
>> I have an external API that contains authorization information on a per user 
>> basis. I want users to be able to include an operation in their query that 
>> will filter data based on this authorization data.
>> 
>> Using Calcite v1.16 / Java 1.8 / RHEL7, I built a class that implements 
>> SqlRexConvertlet, and I am able to get this working. The user includes in 
>> their predicate statement `custom_authorize(column)`, my convertlet queries 
>> the API, gets the authorization rules, builds an OR statement, and the 
>> results come back. This works sometimes, but other times the OR condition 
>> becomes too large, and I run into CALCITE-2792: 
>> https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FCALCITE-2792&amp;data=02%7C01%7Cpwicks%40micron.com%7C1920339cebed45b0bd5b08d748955d24%7Cf38a5ecd28134862b11bac1d563c806f%7C0%7C1%7C637057678693693555&amp;sdata=0tuXcWcjTHXMQQr%2BmKXBjrSTAW%2BqBG%2Fp3PsK8df2tfk%3D&amp;reserved=0,
>>  which causes a stackoverflow and my query dies.
>> 
>> So I tried converting to an IN statement, having read that IN statements are 
>> automatically converted to a sub query join when the default limit of 20 is 
>> exceeded. The problem is that this appears only to be true for IN statements 
>> that are included in the initial query. IN statements created as the result 
>> of a convertlet do not get modified, and are sent as an IN statement, which 
>> results in a failure to parse the query. I looked at how Calcite normally 
>> does this translation from IN to exists using a join, but it depends on a 
>> lot of classes/instances that aren't available in the SqlRexContext space. 
>> Is it possible to rewrite my IN statement to a join/exists query like 
>> Calcite normally does?
>> 
>> Also, am I doing things all wrong? Is there a better way to go about this?
>> 
>> Code Sample below is for the OR version, the commented code can be swapped 
>> in to see how I was building the IN statement.
>> 
>> @Override
>> public RexNode convertCall(SqlRexContext cx, SqlCall call) {
>> HashSet<String> keyList = null;
>> try {
>> keyList = new 
>> Manager().getAllowedIDs(getContextInformation().getQueryUser());
>> } catch (SQLException e) {
>> e.printStackTrace();
>> }
>> 
>> final RexBuilder rexBuilder = cx.getRexBuilder();
>> final RexNode column = cx.convertExpression(call.operand(0));
>> 
>> final List<RexNode> nodes = new ArrayList<>();
>> for(String s: keyList) {
>> nodes.add(rexBuilder.makeCall(EQUALS, column, rexBuilder.makeLiteral(s)));
>> //nodes.add(rexBuilder.makeLiteral(s));
>> }
>> 
>> final RexNode in = rexBuilder.makeCall(SqlStdOperatorTable.OR, nodes);
>> //final RexNode in = inBuilder(rexBuilder, column, nodes.toArray(new 
>> RexNode[0]));
>> 
>> return in;
>> }
>> 
>> protected RexNode inBuilder(RexBuilder rexBuilder, RexNode node, RexNode... 
>> nodes) {
>> return rexBuilder.makeCall(SqlStdOperatorTable.IN,
>> ImmutableList.<RexNode>builder().add(node).add(nodes).build());
>> }
>> 
>> 
>> Thanks,
>> Peter
>> 
> 
> 


Reply via email to