timothy-e commented on issue #18740:
URL: https://github.com/apache/pinot/issues/18740#issuecomment-4753162999

   Thanks for sharing this! 
   
   > Note for reviewers: enabling stats collection changes the cardinality 
estimates visible to all MSE planner rules (never correctness);
   
   1. Do you know what the potential impact of this is? Which rules currently 
use cardinality? If I disable stats collection, do I revert to the old 
behaviour, or are all of my stats permanently updated?
   
   > Short term: build-side normalization — use row counts to place the smaller 
input on the hash-build side of a join. Benchmarks above show the logical 
reorderer minimizes intermediate cardinality but cannot account for the 
engine's build-side convention (the remaining 462 vs 412 ms gap in query 1); 
this is a small, well-contained improvement.
   
   2. What's the difference between this and join re-ordering? 
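
To make the question concrete, here is my current reading of build-side 
normalization as a minimal Python sketch. `JoinInput` and 
`normalize_build_side` are hypothetical names for illustration, not Pinot or 
Calcite APIs. The sketch only swaps the two sides of a single join based on 
estimated row counts; it does not change which tables are joined first, which 
is what I understand join re-ordering to do.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class JoinInput:
    """One side of a hash join with a row-count estimate (hypothetical type)."""
    name: str
    estimated_rows: int


def normalize_build_side(left: JoinInput, right: JoinInput) -> Tuple[JoinInput, JoinInput]:
    """Return (probe, build) with the smaller input on the build side.

    Only the sides of this one join are swapped. The join order across
    several joins is left untouched. On a tie the inputs keep their
    original positions (left as probe, right as build).
    """
    if left.estimated_rows < right.estimated_rows:
        # The left input is smaller, so it becomes the hash-build side.
        return right, left
    return left, right
```

If that reading is right, the two features are complementary: re-ordering picks 
the sequence of joins, and normalization then picks the build side of each one.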
   
   > Calcite already ships the machinery for cost-based decisions — what Pinot 
is missing is statistics at the broker and the integration to consume them.
   
   3. Will the column stats from phase 2 be useful in Calcite? 
   
   > Per-segment stats collected from ZooKeeper metadata the broker already 
watches (row count, size, time boundaries — effectively free), 
   
   4. How accurate are these? How often are they updated? 
   
   5. I'd love a more detailed document describing how all the components fit 
together, e.g. what will Calcite do and what will we have to do? 
   
   6. Postgres has the pg_hint_plan extension, which makes debugging and 
developing a CBO much easier. It can be used to force a join order and then 
observe the cost, override bad planner decisions, and compare different 
options. I don't see any knobs like that mentioned in the plan. Questions I 
might have as a user of join reordering:
     a. Why did the planner choose this ordering, and how close are we to 
choosing a different ordering? (In PG I can answer this by trying out different 
join orderings with the `Leading` hint.)
     b. How can I override what the planner chose? 
   



