gianm opened a new issue #9229: Proper handling for RIGHT OUTER and FULL OUTER joins URL: https://github.com/apache/druid/issues/9229 #8728 adds low-level support for building join cursors, but there is a problem with the design when it comes to RIGHT OUTER and FULL OUTER joins. The set of matching left-hand rows is currently computed per segment, but it should be computed globally. Doing it per-segment leads to two problems: 1. If left-hand-side segment A contains a row that matches right-hand side row X, but segment B doesn't, then segment B's join cursor will emit a row for X with nulls on the left-hand columns. This shouldn't happen, since row X _did_ match a left-hand row (from segment A). It means there will be "extra" rows. 2. If there are no segments on the left-hand side at all (an empty table) then there will be no rows generated at all. But we should really generate one row for every row on the right-hand side. This behavior is still somewhat useful — the extra rows from (1) will not affect the results of many believable queries [*], and the empty-left-table scenario from (2) is an edge case — but it is not correct, so we need to fix it. [*] Imagine a query like `SELECT countries.countryName, SUM(revenue) FROM sales FULL OUTER JOIN countries ON sales.countryCode = countries.countryCode GROUP BY countries.countryName`.
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
