[ 
https://issues.apache.org/jira/browse/PIG-3221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13640029#comment-13640029
 ] 

Vicki Fu commented on PIG-3221:
-------------------------------

Thank you Gianmarco.
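The output of the sampling is k sets of resampled data. For a small dataset
in R, with a matrix as the input, this is easy to do:

A <- matrix(seq(1, 100), nrow = 10, ncol = 10)  # 10 x 10 input data
k <- 10  # number of bootstrap replicate sets
# Resample each column of A independently, with replacement, k times;
# the result is a 10 x 10 x k array with one resampled matrix per replicate.
replicate(k, apply(A, 2, sample, replace = TRUE))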
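The code above resamples each column independently. If each row of A is one observation (as a tuple would be in a Pig relation), a bootstrap replicate usually resamples whole rows with replacement instead, so values from the same observation stay together. A minimal R sketch, assuming that row-per-observation layout; `bootstrap_rows` is a hypothetical helper name, not part of Pig or base R:

```r
# Hypothetical helper: draw k bootstrap replicates of the rows of a matrix.
# Each replicate keeps every row intact and samples nrow(x) rows with replacement.
bootstrap_rows <- function(x, k) {
  lapply(seq_len(k), function(i) {
    idx <- sample(nrow(x), size = nrow(x), replace = TRUE)
    x[idx, , drop = FALSE]
  })
}

set.seed(42)
A <- matrix(seq(1, 100), nrow = 10, ncol = 10)
reps <- bootstrap_rows(A, k = 10)
length(reps)    # 10 replicate sets
dim(reps[[1]])  # 10 10, same shape as A
```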

Yes, you are right: the statistics can be collected by a UDF.
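The statistic is computed once per replicate and then summarized across replicates; in Pig that per-replicate computation would be the UDF's job. A minimal R sketch of the idea, assuming the statistic of interest is the column mean and that the spread across replicates is used as a bootstrap standard error; `bootstrap_se` is a hypothetical name:

```r
# Hypothetical sketch: bootstrap standard error of each column mean.
# Assumes rows are observations; each replicate resamples whole rows.
bootstrap_se <- function(x, k, stat = colMeans) {
  # One column of `stats` per replicate: the statistic on a resampled matrix.
  stats <- replicate(k, {
    idx <- sample(nrow(x), size = nrow(x), replace = TRUE)
    stat(x[idx, , drop = FALSE])
  })
  # Standard deviation across replicates estimates the standard error.
  apply(stats, 1, sd)
}

set.seed(1)
A <- matrix(seq(1, 100), nrow = 10, ncol = 10)
se <- bootstrap_se(A, k = 200)
print(round(se, 3))
```

Because every column of this particular A is the first column shifted by a constant, all ten standard errors come out equal here.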
My plan is to implement bootstrap, reservoir, and stratified sampling, in that
order, in this project.
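The other two planned techniques can be sketched in R as well. This is an illustration only, not Pig's implementation: reservoir sampling follows Vitter's Algorithm R (a uniform sample of fixed size from a stream in one pass), and stratified sampling draws the same fraction, assumed to satisfy 0 < fraction <= 1, from each stratum without replacement. The names `reservoir_sample` and `stratified_sample` are hypothetical.

```r
# Reservoir sampling (Algorithm R): uniform sample of size n, without
# replacement, from a stream read once, using O(n) memory.
reservoir_sample <- function(stream, n) {
  reservoir <- stream[seq_len(min(n, length(stream)))]
  if (length(stream) > n) {
    for (i in (n + 1):length(stream)) {
      j <- sample.int(i, 1)              # uniform in 1..i
      if (j <= n) reservoir[j] <- stream[i]  # keep item i with prob n/i
    }
  }
  reservoir
}

# Stratified sampling: take the same fraction from every stratum (at least
# one item each), so each group is represented in proportion to its size.
stratified_sample <- function(x, strata, fraction) {
  groups <- split(x, strata, drop = TRUE)
  unlist(lapply(groups, function(g) {
    size <- max(1, round(fraction * length(g)))
    g[sample.int(length(g), size)]
  }), use.names = FALSE)
}

set.seed(7)
s <- reservoir_sample(1:1000, 5)
length(s)  # 5
labels <- rep(c("a", "b"), times = c(80, 20))
st <- stratified_sample(seq_along(labels), labels, fraction = 0.1)
table(labels[st])  # 8 from stratum a, 2 from stratum b
```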
Please correct me if my understanding is not right.
Thanks
Vicky

                
> Bootstrap sampling
> ------------------
>
>                 Key: PIG-3221
>                 URL: https://issues.apache.org/jira/browse/PIG-3221
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Gianmarco De Francisci Morales
>              Labels: gsoc2013
>
> Implement a bootstrap sampling option ( 
> http://en.wikipedia.org/wiki/Bootstrap_(statistics) ) in Pig's SAMPLE 
> operator.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
