Jackie-Jiang commented on PR #14893: URL: https://github.com/apache/pinot/pull/14893#issuecomment-2611490953
> I would need more time to review the code and, ideally, some explanation of the decisions you made here. The changes look to me more complex than I would have expected. We are deviating from the standard Calcite semantics here (i.e., with singleton + parallelism), and I'm not sure why we need to do that. > > What I would expect in this situation is that the join node uses the broadcast distribution for its right side (meaning that each incarnation of the join will see all the data). The main difference with the regular broadcast is that instead of picking one server per segment and broadcasting from them, we pick all servers _that will execute the left side_ and read from them, sending the information to its own node. Broadcast is supported in #14797, but there could still be data shuffling. With this PR, we can completely eliminate data shuffling, and right table is always served from the same server. Regarding `singleton + parallelism`, this is needed to increase parallelism for intermediate stage. If we do singleton (1-to-1 exchange), there will be same number of intermediate operators as leaf operators, which is not good enough in a lot of cases. We usually run only one leaf operator per server, but we want to run more intermediate operators to fully utilize CPU. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
