yashmayya opened a new issue, #17873: URL: https://github.com/apache/pinot/issues/17873
- For joins with only non-equi join conditions, we use a RANDOM + BROADCAST data shuffle strategy - https://github.com/apache/pinot/blob/079206a1bcb94c20ff6336bf9ab96d73f5dde1c4/pinot-query-planner/src/main/java/org/apache/pinot/calcite/rel/rules/PinotJoinExchangeNodeInsertRule.java#L86-L97 - While this works for INNER JOINs and LEFT JOINs, this leads to incorrect results for FULL OUTER JOINs and RIGHT JOINs, because we're unable to correctly determine the unmatched right rows (since each worker will only see a subset of the left side). This will result in duplicate unmatched right rows in the final join result. - For RIGHT JOINs we can probably simply use an inverted BROADCAST + RANDOM strategy instead, but for FULL OUTER JOIN we'll need to use a singleton instance for correct results. Alternatively, if we want to retain parallelism, we could potentially introduce a new mechanism where there's a stage after the join that collects the unmatched row bitmaps from each worker and combines them. However, this would result in a lot of additional complexity to optimize an extremely suboptimal query in the first place (OUTER JOINs with only non-equi join conditions is inherently going to be very expensive). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
