gianm opened a new issue #9229: Proper handling for RIGHT OUTER and FULL OUTER 
joins
URL: https://github.com/apache/druid/issues/9229
 
 
   #8728 adds low-level support for building join cursors, but there is a 
problem with the design when it comes to RIGHT OUTER and FULL OUTER joins. The 
set of matching left-hand rows is currently computed per segment, but it should 
be computed globally. Doing it per-segment leads to two problems:
   
   1. If left-hand-side segment A contains a row that matches right-hand side 
row X, but segment B doesn't, then segment B's join cursor will emit a row for 
X with nulls on the left-hand columns. This shouldn't happen, since row X _did_ 
match a left-hand row (from segment A). It means there will be "extra" rows.
   2. If there are no segments on the left-hand side at all (an empty table) 
then there will be no rows generated at all. But we should really generate one 
row for every row on the right-hand side.
   
   This behavior is still somewhat useful — the extra rows from (1) will not 
affect the results of many believable queries [*], and the empty-left-table 
scenario from (2) is an edge case — but it is not correct, so we need to fix it.
   
   [*] Imagine a query like `SELECT countries.countryName, SUM(revenue) FROM 
sales FULL OUTER JOIN countries ON sales.countryCode = countries.countryCode 
GROUP BY countries.countryName`.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to