gortiz opened a new issue, #13617:
URL: https://github.com/apache/pinot/issues/13617
Since we updated Calcite dependency from 1.31 to 1.37, multi stage engine
has issues executing queries that use IN expressions with a lot (lets say >50)
elements. Specifically, we spend a lot (in the order of seconds) time
optimizing the query.
It can be easily tested by running `MultistageEngineQuickStart` and
executing a query like:
```sql
explain plan for
SELECT DestCityName
FROM (
SELECT DestCityName
FROM airlineStats
WHERE DestCityName IN (
'a0',
'a1','a2','a3','a4','a5','a6','a7','a8','a9',
'a10', 'a11', 'a12', 'a13', 'a14', 'a15', 'a16',
'a17', 'a18', 'a19',
'a20', 'a21', 'a22', 'a23', 'a24', 'a25', 'a26',
'a27', 'a28', 'a29',
'a30', 'a31', 'a32', 'a33', 'a34', 'a35', 'a36',
'a37', 'a38', 'a39',
'a40', 'a41', 'a42', 'a43', 'a44', 'a45', 'a46',
'a47', 'a48', 'a49'
)
GROUP BY DestCityName
LIMIT 2147483647
) as a
```
After studying the issue for a while, it looks like the issue comes from a
newly introduced Calcite optimization when dealing with ORs. In order to
calculate that optimization, Calcite applies an algorithm whose cost is at
least quadratic in terms of sub-predicates in the OR.
These OR expressions are generated either by Calcite or by Pinot in
different parts of the code. Right now I found that:
- IN can be transformed into OR while SqlNodes are converted into RelNodes.
Calcite knows it may be a problem and by default does not convert INs that have
more than 20 elements. But we explicitly set that limit to `Integer.MAX_VALUE`
in
[PinotRuleUtils.PINOT_SQL_TO_REL_CONFIG](https://github.com/apache/pinot/blob/3db93ccc210991ad91b6a5306b3b610d8f5e0507/pinot-query-planner/src/main/java/org/apache/pinot/calcite/rel/rules/PinotRuleUtils.java#L50).
- `PinotFilterExpandSearchRule`, where we transform any remanent SEARCH node
into ORS.
We have explored a couple of alternatives to fix the problem in
https://github.com/apache/pinot/pull/13605 and
https://github.com/apache/pinot/pull/13614
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]