paleolimbot commented on issue #38638: URL: https://github.com/apache/arrow/issues/38638#issuecomment-1875719282
> I was wondering if we can call arrange() to order by the random number and then take the top n rows

I think that will work, although I don't know whether it will be slower or faster than calling `compute()` (i.e., get me a `Table`) and subsetting with integers obtained from `sample(seq_len(x$num_rows))`. It is essentially the same thing: to draw an accurate sample, the final number of rows is needed. One can also do a streaming (but approximate) sample, which might be useful for non-statistical purposes (e.g., testing on something more realistic than the first `n` rows of data).
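To make the difference between the two approaches concrete, here is a rough base R sketch. It is not what arrow does internally. A plain `data.frame` stands in for the materialized `Table`, the chunks are simulated with `split()`, and the names `df`, `n`, `p`, `chunks`, and `approx_sample` are made up for illustration. The `n` passed to `sample()` is also an addition, since the snippet quoted above would return a full permutation.

```r
set.seed(1)

# Stand-in data; in the thread this would be a Table obtained via compute().
df <- data.frame(id = 1:1000, value = rnorm(1000))
n <- 10

# Exact sample: the total row count must be known before drawing indices.
idx <- sample(seq_len(nrow(df)), n)
exact <- df[idx, , drop = FALSE]

# Approximate streaming sample: visit one chunk at a time and keep each row
# independently with probability p. The result has roughly n rows, not
# exactly n. Choosing p this way assumes a rough size estimate is available.
p <- n / 1000
chunks <- split(df, ceiling(seq_len(nrow(df)) / 100))  # simulate 10 batches
kept <- lapply(chunks, function(chunk) {
  chunk[runif(nrow(chunk)) < p, , drop = FALSE]
})
approx_sample <- do.call(rbind, kept)
```

`exact` always has `n` rows, while `nrow(approx_sample)` varies from run to run, which is the trade-off for never needing the final row count.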
