Neal Richardson created ARROW-17759:
---------------------------------------

             Summary: [R] Implement dplyr::slice_sample()
                 Key: ARROW-17759
                 URL: https://issues.apache.org/jira/browse/ARROW-17759
             Project: Apache Arrow
          Issue Type: Sub-task
          Components: R
            Reporter: Neal Richardson
             Fix For: 10.0.0


{code}
slice_sample(.data, ..., n, prop, weight_by = NULL, replace = FALSE)
{code}

If {{n}} is provided, compute {{nrow(.data)}} and, if that is not NA, convert 
{{n}} to a {{prop}}. (We might want to use {{prop + .01}} or similar and then 
call {{head(n)}} afterwards, i.e. sample more than needed and then take {{n}}, 
so that randomness doesn't leave us with fewer than {{n}} rows.)
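
A minimal base R sketch of the conversion described above. The helper name {{n_to_prop}} and its {{pad}} argument are hypothetical, and the 0.01 padding is just the figure floated here, not a tuned value.

```r
# Hypothetical helper: convert a requested row count into a padded
# sampling proportion. Not part of the arrow package API.
n_to_prop <- function(n, nrows, pad = 0.01) {
  if (is.na(nrows)) {
    # Row count unknown, so no proportion can be derived from n
    stop("Cannot convert n to prop: nrow(.data) is NA")
  }
  if (nrows == 0) {
    return(1)
  }
  # Oversample slightly so the random filter rarely yields fewer than n rows,
  # and cap at 1 because a proportion cannot exceed the whole table
  min(n / nrows + pad, 1)
}

n_to_prop(100, 10000)  # 0.02
n_to_prop(60, 60)      # capped at 1
```

Padding lowers the chance of a shortfall but does not eliminate it: on small tables the filtered result can still contain fewer than {{n}} rows.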

With {{prop}}, translate the call into {{filter(arrow_random() < prop)}}. See ARROW-17572.
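
As a rough illustration of the filter approach on an in-memory data frame: base R's {{runif()}} stands in for {{arrow_random()}}, whose availability is assumed here rather than confirmed, and {{sample_by_prop}} is a hypothetical name.

```r
# Hypothetical stand-in for the Arrow query: keep each row independently
# with probability `prop` (Bernoulli sampling, without replacement).
sample_by_prop <- function(df, prop) {
  stopifnot(is.numeric(prop), length(prop) == 1, prop >= 0, prop <= 1)
  # runif() plays the role of arrow_random(): one uniform draw per row
  keep <- runif(nrow(df)) < prop
  df[keep, , drop = FALSE]
}

set.seed(42)
df <- data.frame(id = 1:1000)
sampled <- sample_by_prop(df, 0.1)
nrow(sampled)  # roughly 100; the exact count depends on the random draws
```

The resulting row count is binomially distributed around {{nrow(.data) * prop}}, which is why the {{n}} case oversamples and then trims with {{head(n)}}.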

Defer {{weight_by}} to a follow-up. It should be doable but might be expensive 
(we would need to scan everything to compute the sum and to ensure that all 
values are positive).

Defer {{replace = TRUE}} as well.

Also, we can probably only do this if {{.data}} is ungrouped: I think the dplyr 
methods sample within groups.
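
One way the ungrouped restriction could be enforced, sketched against a plain data frame. {{check_ungrouped}} is a hypothetical helper, and the sketch assumes the Arrow query object answers {{group_vars()}} the same way a data frame does.

```r
library(dplyr)

# Hypothetical guard: refuse grouped input until per-group sampling is supported.
check_ungrouped <- function(.data) {
  if (length(group_vars(.data)) > 0) {
    stop("slice_sample() is not supported on grouped data; call ungroup() first")
  }
  invisible(.data)
}

check_ungrouped(mtcars)                      # passes silently
try(check_ungrouped(group_by(mtcars, cyl)))  # raises the error above
```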



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
