neilconway commented on issue #22594: URL: https://github.com/apache/datafusion/issues/22594#issuecomment-4578273241
Working on this, there are some shortcomings we'll probably need to resign ourselves to in the initial implementation:

* We don't currently compute the equivalence relation that is implied by the predicates in the WHERE clause. For example, `SELECT a.x, max(b.y) FROM a, b WHERE a.y = b.y GROUP BY a.x;` _should_ be optimizable, but we don't currently realize that `b.y` is equivalent to `a.y`, and therefore that `b` doesn't contribute any columns to the parent plan.
* We don't currently identify which aggregates are duplicate-insensitive. So we would _also_ fail to optimize the query above, because we don't recognize that `max` is duplicate-insensitive.
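To make the two missing pieces concrete, here is a minimal sketch in Python. It is not DataFusion code: the `EquivalenceClasses` class, the `DUPLICATE_INSENSITIVE` set, and `is_duplicate_insensitive` are hypothetical names, and the list of aggregates is an assumption chosen for illustration. It shows one common way to derive column equivalences from equality predicates (union-find) and a simple lookup for aggregates whose result does not change when input rows are duplicated.

```python
# Illustrative sketch only; all names here are hypothetical, not DataFusion APIs.


class EquivalenceClasses:
    """Union-find over column names, built from equality predicates."""

    def __init__(self):
        self._parent = {}

    def find(self, col):
        # Register unseen columns as their own representative.
        self._parent.setdefault(col, col)
        root = col
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited node directly at the root.
        while self._parent[col] != root:
            nxt = self._parent[col]
            self._parent[col] = root
            col = nxt
        return root

    def union(self, left, right):
        # Record that `left = right` holds for every row that passes the filter.
        left_root, right_root = self.find(left), self.find(right)
        if left_root != right_root:
            self._parent[left_root] = right_root

    def equivalent(self, left, right):
        return self.find(left) == self.find(right)


# Assumed list: aggregates whose result is unchanged by duplicated input rows.
DUPLICATE_INSENSITIVE = frozenset(
    {"min", "max", "bool_and", "bool_or", "bit_and", "bit_or"}
)


def is_duplicate_insensitive(func_name, distinct=False):
    # A DISTINCT aggregate ignores duplicates by definition.
    return distinct or func_name.lower() in DUPLICATE_INSENSITIVE


# The query from the comment: WHERE a.y = b.y, aggregate max(b.y).
eq = EquivalenceClasses()
eq.union("a.y", "b.y")
can_substitute = eq.equivalent("b.y", "a.y")
max_is_safe = is_duplicate_insensitive("max")
sum_is_safe = is_duplicate_insensitive("sum")
```

With both checks in place, an optimizer could in principle see that `max(b.y)` can be expressed over `a.y` and that duplicated rows do not affect the result. Whether those two facts are sufficient for the rewrite discussed in this issue depends on details not covered in the comment.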
