alamb opened a new pull request #9926:
URL: https://github.com/apache/arrow/pull/9926


   (note this builds on the code in https://github.com/apache/arrow/pull/9924, 
so marking it a draft until that is merged)
   
   # Rationale
   Once the number of rows needed for a limit query has been produced, any 
further work done to read values from its input is wasted.
   
   The current implementation of `LimitStream` keeps polling its input for 
the next value and returning `Poll::Ready(None)`, even once the limit has 
been reached.
   
   For queries like `select * from foo limit 10`, used for initial data 
exploration, this is very wasteful.
   
   # Changes
   
   This PR changes `LimitStream` so that it drops its input once the limit has 
been reached. This both potentially frees resources (memory, file handles, 
etc.) and avoids unnecessary computation.
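   
   A minimal sketch of the idea, not the code in this PR: the input is held in 
an `Option` and set to `None` once the limit is reached, so later polls return 
`Poll::Ready(None)` without touching the input. The `BatchStream` trait, 
`LimitSketch`, `CountingInput`, and `demo` are hypothetical names for 
illustration; the real `LimitStream` implements `futures::Stream` over record 
batches, which is simplified away here.

```rust
use std::cell::Cell;
use std::rc::Rc;
use std::task::Poll;

/// Hypothetical, simplified stand-in for an async stream of row batches.
/// (A real stream would use `futures::Stream` with `Pin` and `Context`.)
trait BatchStream {
    fn poll_next(&mut self) -> Poll<Option<Vec<i32>>>;
}

/// Illustrative input that counts polls and records when it is dropped.
struct CountingInput {
    remaining: usize,
    batch_size: usize,
    polls: Rc<Cell<usize>>,
    dropped: Rc<Cell<bool>>,
}

impl BatchStream for CountingInput {
    fn poll_next(&mut self) -> Poll<Option<Vec<i32>>> {
        self.polls.set(self.polls.get() + 1);
        if self.remaining == 0 {
            return Poll::Ready(None);
        }
        self.remaining -= 1;
        Poll::Ready(Some(vec![0; self.batch_size]))
    }
}

impl Drop for CountingInput {
    fn drop(&mut self) {
        self.dropped.set(true);
    }
}

/// Limit operator that releases its input as soon as the limit is reached.
struct LimitSketch<S: BatchStream> {
    limit: usize,
    produced: usize,
    input: Option<S>, // `None` once the input has been dropped
}

impl<S: BatchStream> LimitSketch<S> {
    fn new(input: S, limit: usize) -> Self {
        // With a limit of zero there is nothing to read, so drop the input now.
        let input = if limit > 0 { Some(input) } else { None };
        Self { limit, produced: 0, input }
    }
}

impl<S: BatchStream> BatchStream for LimitSketch<S> {
    fn poll_next(&mut self) -> Poll<Option<Vec<i32>>> {
        // Input already dropped: finish without polling it again.
        let input = match self.input.as_mut() {
            Some(input) => input,
            None => return Poll::Ready(None),
        };
        match input.poll_next() {
            Poll::Ready(Some(mut batch)) => {
                // Keep only as many rows as are still needed.
                batch.truncate(self.limit - self.produced);
                self.produced += batch.len();
                if self.produced >= self.limit {
                    self.input = None; // drops the input, freeing its resources
                }
                Poll::Ready(Some(batch))
            }
            Poll::Ready(None) => {
                self.input = None; // input exhausted before the limit
                Poll::Ready(None)
            }
            Poll::Pending => Poll::Pending,
        }
    }
}

/// Returns (rows produced, times the input was polled, input dropped?).
fn demo() -> (usize, usize, bool) {
    let polls = Rc::new(Cell::new(0));
    let dropped = Rc::new(Cell::new(false));
    let input = CountingInput {
        remaining: 5,
        batch_size: 4,
        polls: Rc::clone(&polls),
        dropped: Rc::clone(&dropped),
    };
    let mut stream = LimitSketch::new(input, 10);
    let mut rows = 0;
    while let Poll::Ready(Some(batch)) = stream.poll_next() {
        rows += batch.len();
    }
    let _ = stream.poll_next(); // an extra poll must not reach the input
    (rows, polls.get(), dropped.get())
}
```

   In `demo`, the input offers five batches of four rows and the limit is 10, 
so the input is polled for three batches only and is dropped while the limit 
stream itself is still alive.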

