[
https://issues.apache.org/jira/browse/CALCITE-6467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17930273#comment-17930273
]
Gonzalo Ortiz edited comment on CALCITE-6467 at 2/25/25 10:57 AM:
------------------------------------------------------------------
Oh! I didn't see that message from [~nobigo]. I'll take a look.
Edit: It looks that CALCITE-5036 reverted the inefficiency in RelMdPredicate,
but we still have the problem when SqlNodes are converted into RelNodes.
was (Author: JIRAUSER301988):
Oh! I didn't see that message from [~nobigo]. I'll take a look
> Performance of RelMdUtil.checkInputForCollationAndLimit when using `where col
> in (large literal set)`
> -----------------------------------------------------------------------------------------------------
>
> Key: CALCITE-6467
> URL: https://issues.apache.org/jira/browse/CALCITE-6467
> Project: Calcite
> Issue Type: Improvement
> Affects Versions: 1.32.0, 1.33.0, 1.34.0, 1.35.0, 1.36.0, 1.37.0, 1.38.0
> Reporter: Gonzalo Ortiz
> Priority: Major
> Attachments: MultistageEngineQuickStart_2024_07_12_111558.jfr,
> image-2024-07-12-15-58-33-504.png
>
>
> Recently we have updated Pinot to use Calcite 1.37. Previously we were using
> 1.31.
> After the upgrade, we have found issues when executing some queries that
> include large IN clauses. Queries like:
> {code:java}
> explain plan for
> SELECT DestCityName
> FROM (
> SELECT DestCityName
> FROM airlineStats
> WHERE DestCityName IN (
> 'a1', 'a2', 'a3', ... 'a300'
> )
> GROUP BY DestCityName
> ) as a
> {code}
> After some debug, we have found that the issue is in one of our custom rules (
> PinotSortExchangeCopyRule) when we call
> `RelMdUtil.checkInputForCollationAndLimit`
> ([link|https://github.com/apache/pinot/blob/dacc6d06907c44e83721454f1090e5f00c824f15/pinot-query-planner/src/main/java/org/apache/pinot/calcite/rel/rules/PinotSortExchangeCopyRule.java#L64]).
>
> Comparing different Pinot versions, I've found out that the change that is
> causing problems in our scenario is #CALCITE-5036 (see
>
> [https://github.com/apache/calcite/pull/2743)|https://github.com/apache/calcite/pull/2743/files).].
> Specifically, it looks like the problem appears when `RelMdPredicates` tries
> to simplify the expression, which at the time is basically an
> `OR(DestCityName = a1, DestCityName = a2, ...)`.
> `RexSimplify.simplifyOrTerms` negates previous terms in order to apply
> possible optimizations, but in cases like this where we have hundreds of
> literals, that is very expensive. Going more in detail, it looks like most of
> the time is invested in creating new ranges:
> !image-2024-07-12-15-58-33-504.png!
>
> You may find more insights in the attached JFR file
> [^MultistageEngineQuickStart_2024_07_12_111558.jfr][^MultistageEngineQuickStart_2024_07_12_111558.jfr]
>
> I've tried to reproduce the problem with `sqline` but I wasn't able to do so.
> As far as I can see in the code, RelMdUtil.checkInputForCollationAndLimit is
> only called in SortJoinCopyRule, SortJoinTransposeRule and
> SortUnionTransposeRule. I've tried to create a test or JMH benchmark in
> Calcite to try to reproduce the issue, but I don't know codebase well enough.
>
> I don't consider myself an expert on Apache Calcite and I know we are not
> using Calcite in the most standard way (we are slowly migrating from our own
> engine to Calcite), but I'm pretty confident this issue may also affect other
> Calcite usages. At least in the trace I cannot see anything Pinot specific.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)