Re: [I] [Umbrella] Cost-based optimization for the multi-stage query engine [pinot]

via GitHub Tue, 23 Jun 2026 08:25:54 -0700


gortiz commented on issue #18740:
URL: https://github.com/apache/pinot/issues/18740#issuecomment-4780738502


   > Do you know what the potential impact of this is? What rules currently use 
cardinality? If I disable stats collection, do I revert back to the old 
behaviour or are all of my stats permanently updated?
   
   Currently, there is no rule that uses cardinality, which is what 
https://github.com/apache/pinot/pull/18741 is adding. If it were merged and the 
stat collection was disabled, the CBO wouldn't be able to infer anything, and 
it should do nothing.
   
   > What's the difference between this and join re-ordering?
   
   Reordering means how to order 3 or more joins, while _this_ means just 
swapping the order of a single join. They are different Calcite rules.
   
   > Will the column stats from phase 2 be useful in Calcite?
   
   Yes. They are an optimization on the rules we have in 
https://github.com/apache/pinot/pull/18741. We can provide better estimations 
knowing that data is split in segments and segments have their own max and min.
   
   For example, imagine we have the following segments and a query with `where 
col = 6`
   
       1. seg1: min = 0, max = 10, ndv = 10, count = X
       2. seg2: min = 5, max = 10, ndv = 5, count = 10 * X
       3. seg3: min = 7, max = 15, ndv = 7, count = 100 * X
   
   ```
     Total rows = X + 10X + 100X = 111X.
   
     
┌──────┬────────┬─────┬───────┬────────────────┬────────────────────────────────┐
     │ seg  │ range  │ ndv │ count │ can contain 6? │ est. matching rows 
(count/ndv) │
     
├──────┼────────┼─────┼───────┼────────────────┼────────────────────────────────┤
     │ seg1 │ [0,10] │ 10  │ X     │ yes            │ X/10 = 0.1X               
     │
     
├──────┼────────┼─────┼───────┼────────────────┼────────────────────────────────┤
     │ seg2 │ [5,10] │ 5   │ 10X   │ yes            │ 10X/5 = 2X                
     │
     
├──────┼────────┼─────┼───────┼────────────────┼────────────────────────────────┤
     │ seg3 │ [7,15] │ 7   │ 100X  │ no (6 < 7)     │ pruned → 0                
     │
     
└──────┴────────┴─────┴───────┴────────────────┴────────────────────────────────┘
   
   ```
   
   - Per-segment: ≈ 2.1X matching → selectivity ≈ 1.9%
   - What we have in Phase 1: merged ndv = MAX(ndv) = 10 → 1/ndv = 0.1 → 0.1 × 
111X = 11.1X → 10%
   
   
   >> Per-segment stats collected from ZooKeeper metadata the broker already 
watches (row count, size, time boundaries — effectively free),
   >
   > how accurate are these? how often are they updated?
   
   They are perfectly accurate, at least for committed realtime segments and 
offline segments. Given they are immutable, once ZK knows about the new 
segment, the broker is notified and changes its stats. The counts, max, min, 
etc are already stored per segment.  
   
   > I'd love a more detailed document describing how all the components fit 
together and more details. e.g. what will Calcite do and what will we have to 
do?
   
   With current AI usage, it is easier for me to create the code, run it and 
verify how it works (or how it could be improved) than to document everything 
from scratch. What I can do is create the PRs and then formalize a broader 
overview document for discussion.
   
   > Postgres has the extension pg_hint_plan that makes debugging / developing 
CBO much easier. It can be used to force join order and then observe the cost, 
override bad planner decisions, compare different options. I don't see any kind 
of knobs like that mentioned in the plan. Questions I might have as a user of 
join reordering:
   > a. why did the planner choose this ordering and how close are we to 
choosing a different ordering? (In PG I can answer this by trying out different 
join orderings with the Leading hint)
   > b. how can I override what the planner chose?
   
   That is a good feature and something we can add. I don't think we should aim 
for the perfect CBO with all the possible features we can imagine in the first 
version, but that is something that could probably be added. Right now in 
https://github.com/apache/pinot/pull/18741 any join hint will prevent the 
optiomization, so the order would be the same as the one we were using right 
now (basically, the one decided by the parser).
   
   Remember we can apply _a lot_ of optimizations with CBO, not just limited to 
join ordering. https://github.com/apache/pinot/pull/18741 adds join ordering 
because it is the most simple one to understand and therefore can be used as an 
example for more optimizations


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [I] [Umbrella] Cost-based optimization for the multi-stage query engine [pinot]

Reply via email to