gortiz commented on PR #18741: URL: https://github.com/apache/pinot/pull/18741#issuecomment-4780562091
> It's pretty common to have a lookup join hint, but otherwise not particularly care about join order. Does 'no hinted joins' mean that we won't reorder queries that have any hinted joins? Hints can change the behavior of the join, which means CBO needs to understand them. We can do that on a follow-up. But lookups are a good example. You don't care about join reordering on lookup joins, as there is no reason to swap them (the lookup should always be the right side). In the future we could add support for that so even if we write the join in the other way around, cbo reorders the lookup join if needed. >> hint veto > > Can you elaborate more on what this means? This is explained in pinot-query-planner/src/main/java/org/apache/pinot/query/planner/logical/JoinReorderOptimizer.java, which contains extensive javadoc. Remember this is just _a_ rule. More rules could be added, but I wanted to keep the PR as simple as possible. >> upsert/dedup row counts marked LOW confidence (physical docs over-count logical rows), ... The planner treats LOW/UNKNOWN confidence as "no stats". > > Will we improve this once we have support for number of distinct values column stats consuming segments downgrade confidence. Keeping row counts for upserts/dedups would mean reproducing the upsert/dedup logic (and their expensive data structure) on the brokers. I don't think that makes much sense, but it could be done in a follow-up. > What does downgrade mean? What confidence values do we have? For committed and offline segments we already know how many rows we have. For realtime consuming segments we cannot know that (at least the broker cannot know that). That is why we have to estimate how many rows we have. How we do that is not very important right now. Probably, we can find better heuristics in the future. > Also, I vote for moving the independent fixes out to a seperate PR. Maybe we'll want to revert this PR, and it would be nice to not have to revert the other fixes too. I think it'd be nice to split this up a little bit more, it's harder to find and understand the logic amidst all the plumbing. They can _also_ be added as independent PRs, but they I needed them here to run the quickstarts. Anyway they are not very serious issues -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
