Gian Merlino created CALCITE-3178:
-------------------------------------
Summary: RexSimplify.simplifyOrTerms slow with large OR filters
Key: CALCITE-3178
URL: https://issues.apache.org/jira/browse/CALCITE-3178
Project: Calcite
Issue Type: Improvement
Affects Versions: 1.19.0
Reporter: Gian Merlino
In particular, {{RexSimplify.simplifyOrTerms}} does the following once for each
subpredicate within the OR: it calls {{simplify.predicates.union}} and adds the
freshly-unioned result to {{simplify.predicates}}. The most time-consuming part
of this seems to be {{RexUtil.predicateConstants}}, which re-examines each
previously-added entry. The total work is therefore O(N^2) in the number of
subpredicates within the OR.
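To make the growth rate concrete, here is a minimal toy model of the access pattern described above. It is not Calcite code: the class and method names are hypothetical, and the inner loop is only a stand-in for re-examining the accumulated predicates. It counts how many entries get visited when every new OR term triggers a full re-scan, compared with a single pass.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy cost model only; not Calcite code. All names here are hypothetical. */
final class OrTermScanModel {
  private OrTermScanModel() {}

  /**
   * Counts entry visits when each OR term is added to an accumulated list
   * and the whole list is then re-examined.
   */
  static long rescanEverything(int termCount) {
    List<String> accumulated = new ArrayList<>();
    long visits = 0;
    for (int i = 0; i < termCount; i++) {
      accumulated.add("x = " + i);      // stand-in for the freshly-unioned predicate
      for (String ignored : accumulated) {
        visits++;                       // stand-in for re-examining one earlier entry
      }
    }
    return visits;                      // equals N * (N + 1) / 2
  }

  /** Counts entry visits when each OR term is examined exactly once. */
  static long scanOnce(int termCount) {
    long visits = 0;
    for (int i = 0; i < termCount; i++) {
      visits++;
    }
    return visits;
  }

  public static void main(String[] args) {
    int n = 14_000;
    System.out.println("re-scan on every term: " + rescanEverything(n));
    System.out.println("single pass:           " + scanOnce(n));
  }
}
```

For N = 14,000 the re-scan variant performs 98,007,000 visits against 14,000 for the single pass. The model says nothing about the cost of each visit, only about how the count grows.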
I discovered this when someone tried to run a query with a 14,000-element IN
filter, and planning took about 45 seconds. In Druid, we always convert INs to
ORs, never allowing Calcite's subquery conversion to happen. This is because as
far as native Druid queries are concerned, a huge OR is going to be more
efficient than a join against a constant subquery.
I'm not sure what the best way is to fix this. The only thing that comes to
mind immediately is the "quick fix" of limiting how many OR elements
RexSimplify might attempt to simplify at once (and potentially AND as well? I
haven't looked into that one.)
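As a rough illustration of that quick fix, the sketch below wraps an arbitrary simplification step in a size guard. Everything in it is an assumption made for illustration: {{BoundedOrSimplifier}}, {{simplifyIfSmall}}, and the {{maxTerms}} parameter do not exist in Calcite, and the generic list stands in for the OR's operands.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

/** Hypothetical guard for illustration; not part of Calcite's API. */
final class BoundedOrSimplifier {
  private BoundedOrSimplifier() {}

  /**
   * Applies the simplifier only when the OR has at most maxTerms operands.
   * Larger ORs are returned unchanged, which bounds planning time at the
   * price of leaving big filters unsimplified.
   */
  static <T> List<T> simplifyIfSmall(
      List<T> orTerms, int maxTerms, UnaryOperator<List<T>> simplifier) {
    if (orTerms.size() > maxTerms) {
      return orTerms;                   // too many terms: skip simplification
    }
    return simplifier.apply(orTerms);
  }

  public static void main(String[] args) {
    // Stand-in simplifier: drop duplicate terms.
    UnaryOperator<List<Integer>> dedupe =
        terms -> terms.stream().distinct().collect(Collectors.toList());
    List<Integer> terms = Arrays.asList(1, 1, 2);

    System.out.println(simplifyIfSmall(terms, 10, dedupe)); // under the limit: simplified
    System.out.println(simplifyIfSmall(terms, 2, dedupe));  // over the limit: unchanged
  }
}
```

The trade-off is that any OR above the threshold loses whatever simplification it would otherwise have received, so the limit would need to be chosen with typical filter sizes in mind.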