[ 
https://issues.apache.org/jira/browse/CALCITE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16885556#comment-16885556
 ] 

Julian Hyde commented on CALCITE-3178:
--------------------------------------

The ideal fix, in my view, would be to make the simplify algorithm O(n log n). 
But short of that, a threshold, above which we do not attempt to simplify OR.

The test would be to create a large IN clause programmatically (say 20,000 
elements), set the threshold to say 100, and assume that if the threshold stops 
working we'd notice the test running slowly. Because the test exists someone 
would be able to poke at it and try to do the O(n log n) fix in future.

> RexSimplify.simplifyOrTerms slow with large OR filters
> ------------------------------------------------------
>
>                 Key: CALCITE-3178
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3178
>             Project: Calcite
>          Issue Type: Improvement
>    Affects Versions: 1.19.0
>            Reporter: Gian Merlino
>            Priority: Major
>
> In particular, once for each subpredicate within the OR, 
> RexSimplify.simplifyOrTerms calls {{simplify.predicates.union}} and adds the 
> freshly-unioned result to {{simplify.predicates}}. The most time-consuming 
> part of this seems to be {{RexUtil.predicateConstants}}, which re-examines 
> each previously-added entry. This is O(N^2) in the number of subpredicates 
> within the OR.
> I discovered this when someone tried to run a query with a 14,000-element IN 
> filter, and planning took about 45 seconds. In Druid, we always convert INs 
> to ORs, never allowing Calcite's subquery conversion to happen. This is 
> because as far as native Druid queries are concerned, a huge OR is going to 
> be more efficient than a join against a constant subquery.
> I'm not sure what the best way is to fix this. The only thing that comes to 
> mind immediately is the "quick fix" of limiting how many OR elements 
> RexSimplify might attempt to simplify at once (and potentially AND as well? I 
> haven't looked into that one.)



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

Reply via email to