Whatsonyourmind commented on issue #21310:
URL: https://github.com/apache/datafusion/issues/21310#issuecomment-4185489300

   @xiedeyantu One additional edge case for the rewrite rule: if either branch 
of the UNION contains a `LIMIT` clause, the transformation is invalid. `(SELECT 
a FROM t WHERE x LIMIT 5) UNION (SELECT a FROM t WHERE y LIMIT 5)` cannot be 
rewritten as `SELECT DISTINCT a FROM t WHERE x OR y LIMIT 10` because each LIMIT 
applies to its own branch before the UNION dedup, not after it. Each branch sees 
at most 5 rows, and the two branches may return overlapping values that get 
deduped, so the original can return fewer than 10 rows in cases where the 
rewritten query returns 10.
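
   A minimal sketch of a counterexample, using SQLite through Python's 
`sqlite3` module rather than DataFusion, so it only illustrates the SQL 
semantics. The table contents and the boolean columns `x` and `y` are made up 
for illustration, and the per-branch LIMIT is written as a subquery because 
SQLite does not accept parenthesized UNION branches:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, x INTEGER, y INTEGER)")

# Ten rows satisfy x, with distinct values a = 1..10.
conn.executemany("INSERT INTO t VALUES (?, 1, 0)", [(i,) for i in range(1, 11)])
# Six rows satisfy y, all with the same value a = 1.
conn.executemany("INSERT INTO t VALUES (?, 0, 1)", [(1,)] * 6)

# Original shape: LIMIT 5 is applied inside each branch, then UNION dedups.
original = conn.execute(
    "SELECT a FROM (SELECT a FROM t WHERE x LIMIT 5) "
    "UNION "
    "SELECT a FROM (SELECT a FROM t WHERE y LIMIT 5)"
).fetchall()

# Proposed rewrite: dedup over all matching rows, then a single LIMIT 10.
rewritten = conn.execute(
    "SELECT DISTINCT a FROM t WHERE x OR y LIMIT 10"
).fetchall()

# The x branch contributes 5 values and the y branch contributes only a = 1,
# so the original has at most 6 rows whichever rows LIMIT happens to pick.
print(len(original))
# The rewrite sees all ten distinct values and returns exactly 10 rows.
print(len(rewritten))
```

   The data is chosen so the two row counts differ regardless of which rows 
each LIMIT selects, since LIMIT without ORDER BY does not guarantee a 
particular choice.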
   
   The optimizer rule should check for the absence of LIMIT, ORDER BY, and 
window functions in both branches before applying the transformation. The same 
applies to OFFSET: any row-limiting operation interacts with deduplication in 
order-dependent ways.
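
   The check described above could look roughly like the following. This is a 
toy model only: `Node`, `BLOCKING`, `branch_is_mergeable`, and 
`can_merge_union` are hypothetical names, not DataFusion's plan types or API, 
and OFFSET is assumed to be folded into the `Limit` node kind:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Toy plan node (hypothetical, not a DataFusion type)."""
    kind: str  # e.g. "Projection", "Filter", "TableScan", "Limit", "Sort", "Window"
    children: List["Node"] = field(default_factory=list)


# Node kinds that make the UNION-to-OR rewrite unsafe in this sketch.
# "Limit" is assumed to cover both LIMIT and OFFSET.
BLOCKING = {"Limit", "Sort", "Window"}


def branch_is_mergeable(node: Node) -> bool:
    """Return True if no blocking node appears anywhere in this branch."""
    if node.kind in BLOCKING:
        return False
    return all(branch_is_mergeable(child) for child in node.children)


def can_merge_union(branches: List[Node]) -> bool:
    """The rewrite is considered only when every branch is free of blockers."""
    return all(branch_is_mergeable(branch) for branch in branches)


plain = Node("Projection", [Node("Filter", [Node("TableScan")])])
limited = Node("Limit", [plain])

print(can_merge_union([plain, plain]))    # True
print(can_merge_union([plain, limited]))  # False
```

   Walking the whole branch, rather than inspecting only its root, is a 
conservative choice in this sketch: a LIMIT nested under a projection would 
block the rewrite as well.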


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

