ion-elgreco opened a new issue, #12911:
URL: https://github.com/apache/datafusion/issues/12911

   Is there a better way we could do this? Maybe add something upstream if 
necessary?
   
   As I think about it, I am not sure this operation is well defined. Calling 
`limit` multiple times on a large dataframe can return different rows each 
time, and I would expect multiple calls here to give different results as 
well.
   
   If we do put this in, I would suggest adding more text to the description to 
explain why this is an expensive operation: it performs a collect to 
determine the size of the dataframe.
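
   To make both concerns concrete, here is a minimal sketch in plain Python. 
It does not use the DataFusion API; `scan`, `limit`, and `count_rows` are 
hypothetical stand-ins, and the batch arrival order is passed in explicitly 
to model an unordered parallel scan. It only illustrates the reasoning above, 
not how the PR implements the operation.

```python
def scan(partitions, arrival_order):
    """Yield batches in the given order (stand-in for an unordered scan)."""
    for index in arrival_order:
        yield partitions[index]


def limit(batches, n):
    """Return the first n rows that arrive, stopping as soon as n are seen."""
    rows = []
    for batch in batches:
        for row in batch:
            if len(rows) == n:
                return rows
            rows.append(row)
    return rows


def count_rows(batches):
    """Count rows; this has to consume every batch, like a full collect."""
    total_rows = 0
    batches_read = 0
    for batch in batches:
        batches_read += 1
        total_rows += len(batch)
    return total_rows, batches_read


# Hypothetical data: three partitions of three rows each.
partitions = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# The same limit gives different rows when batches arrive in a different order.
first_call = limit(scan(partitions, [0, 1, 2]), 2)
second_call = limit(scan(partitions, [2, 0, 1]), 2)

# Knowing the size means reading all batches, whatever the order.
size, batches_read = count_rows(scan(partitions, [2, 0, 1]))
```

   In this toy model `limit` can stop after the first batch, while anything 
that needs the total size first has to read every batch. That is the cost the 
description should call out.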
   
   _Originally posted by @timsaucer in 
https://github.com/apache/datafusion-python/pull/915#discussion_r1798327215_
               


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

