[ 
https://issues.apache.org/jira/browse/MAHOUT-676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021950#comment-13021950
 ] 

Lance Norskog commented on MAHOUT-676:
--------------------------------------


There's a big wide world of sampling algorithms out there. 

Time-based sampling:

[Sampling Time-Based Sliding Windows in Bounded Space|http://www.gemulla.de/rg/publications/gemulla08streamsampling.pdf]

Bernoulli sampling is not good at maintaining ratios for repeating items:

[Maintaining Bernoulli Samples over Evolving Multisets|http://www.gemulla.de/rg/publications/gemulla07multisetsampling.pdf]
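
For reference, here is a minimal sketch of plain Bernoulli sampling, where every offered item is kept independently with probability q. The class name, the seed parameter, and the method names are made up for illustration; this is not taken from Sampler.patch and it is not the maintenance scheme from the paper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical illustration of plain Bernoulli sampling over a stream. */
final class BernoulliSampler<T> {
  private final double q;                       // inclusion probability, 0 <= q <= 1
  private final Random random;                  // seeded so runs are repeatable
  private final List<T> sample = new ArrayList<>();

  BernoulliSampler(double q, long seed) {
    if (q < 0.0 || q > 1.0) {
      throw new IllegalArgumentException("q must be in [0, 1]");
    }
    this.q = q;
    this.random = new Random(seed);
  }

  /** Offers one item; it is kept with probability q, independently of all other items. */
  void add(T item) {
    // nextDouble() returns a value in [0, 1), so q = 1 keeps everything and q = 0 keeps nothing.
    if (random.nextDouble() < q) {
      sample.add(item);
    }
  }

  /** Returns the items kept so far, in arrival order. */
  List<T> getSample() {
    return sample;
  }
}
```

Each copy of a repeated item is kept or dropped on its own coin flip, so the number of copies that land in the sample is binomial. The ratio between two items in the sample therefore matches the ratio in the stream only in expectation, and can be well off in a small sample.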

And, if you really can't go to sleep:

[Rainer Gemulla's 281-page PhD thesis on sampling|http://www.gemulla.de/rg/publications/gemulla08thesis.pdf]



> Random samplers in a modular library
> ------------------------------------
>
>                 Key: MAHOUT-676
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-676
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Math
>            Reporter: Lance Norskog
>            Priority: Minor
>         Attachments: Sampler.patch
>
>
> This is a modular suite of samplers.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
