yashmayya opened a new issue, #17873:
URL: https://github.com/apache/pinot/issues/17873

   - For joins with only non-equi join conditions, we use a RANDOM + BROADCAST 
data shuffle strategy - 
https://github.com/apache/pinot/blob/079206a1bcb94c20ff6336bf9ab96d73f5dde1c4/pinot-query-planner/src/main/java/org/apache/pinot/calcite/rel/rules/PinotJoinExchangeNodeInsertRule.java#L86-L97
   - While this works for INNER JOINs and LEFT JOINs, this leads to incorrect 
results for FULL OUTER JOINs and RIGHT JOINs, because we're unable to correctly 
determine the unmatched right rows (since each worker will only see a subset of 
the left side). This will result in duplicate unmatched right rows in the final 
join result.
   - For RIGHT JOINs we can probably simply use an inverted BROADCAST + RANDOM 
strategy instead, but for FULL OUTER JOIN we'll need to use a singleton 
instance for correct results. Alternatively, if we want to retain parallelism, 
we could potentially introduce a new mechanism where there's a stage after the 
join that collects the unmatched row bitmaps from each worker and combines 
them. However, this would result in a lot of additional complexity to optimize 
an extremely suboptimal query in the first place (OUTER JOINs with only 
non-equi join conditions is inherently going to be very expensive).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to