alamb commented on pull request #9672: URL: https://github.com/apache/arrow/pull/9672#issuecomment-797101275
> Apply limit for UNION ALL / OUTER JOIN when supported in DataFusion - or any other case where we can show the nr of rows generated is at least the same as the plan below, or we can somehow show that it will be bigger than the upper limit. Those are the two main ones I think - those cases are also supported by the similar optimization rule in Spark. Yes, that makes sense > Add extra LIMIT nodes throughout plan -> this will probably currently have less impact as DataFusion already does stream / limit the nr of batches, but may be of use in some cases. For example: expensive expressions in projections (as you illustrated) or reducing the input size for an outer join (I think there is where it can have a profound impact). 👍 ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org