blongworth commented on issue #38638:
URL: https://github.com/apache/arrow/issues/38638#issuecomment-2523952609

   I'm still seeing very non-random sampling with `slice_sample()` in Arrow 
17.0.0. In a 400M row dataset spanning 2023-2024, a 10k row sample consistently 
does not contain timestamps later than Jan 2024. I'm guessing this is the known 
issue described above, but if a reprex would be helpful, I can put one together.
   
   As this issue could be dangerous for someone assuming a random sample, 
should a note be added to the docs, or should `slice_sample()` be removed 
until it's fixed?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

