alamb commented on pull request #9672:
URL: https://github.com/apache/arrow/pull/9672#issuecomment-797101275


   > Apply limit for UNION ALL / OUTER JOIN when supported in DataFusion - or 
any other case where we can show the nr of rows generated is at least the same 
as the plan below, or we can somehow show that it will be bigger than the upper 
limit. Those are the two main ones I think - those cases are also supported by 
the similar optimization rule in Spark.
   
   Yes, that makes sense
   
   > Add extra LIMIT nodes throughout plan -> this will probably currently have 
less impact as DataFusion already does stream / limit the nr of batches, but 
may be of use in some cases. For example: expensive expressions in projections 
(as you illustrated) or reducing the input size for an outer join (I think 
there is where it can have a profound impact).
   
   
   👍 
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to