[
https://issues.apache.org/jira/browse/MAHOUT-1722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Suneel Marthi closed MAHOUT-1722.
---------------------------------
> DRM row sampling api
> --------------------
>
> Key: MAHOUT-1722
> URL: https://issues.apache.org/jira/browse/MAHOUT-1722
> Project: Mahout
> Issue Type: Improvement
> Reporter: Dmitriy Lyubimov
> Assignee: Dmitriy Lyubimov
> Fix For: 0.10.2
>
>
> We will ask engines to support two small APIs for row vector sampling.
> One API is uniform multivariate hypergeometric sampling (the k parameter is
> given), and the other samples by fraction (a simple map-only probabilistic
> filter). A Spark implementation is enclosed (Spark already has an API for both,
> although its k-sampler does not strictly guarantee the distribution and is
> only suitable for small k).
> The challenge here is that the returned rows should be ordinally renumbered.
> (Maybe I need to revisit this issue later; this was a pretty hasty API
> change and might be less than ideal in the general case.)
> PR https://github.com/apache/mahout/pull/135
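
For illustration only, the two sampling modes described above and the ordinal renumbering step could be sketched roughly as follows. This is not the Mahout or Spark API: the function names (`sample_rows_k`, `sample_rows_fraction`, `renumber`) and the in-memory list of `(key, vector)` pairs are assumptions made for the example, and a distributed engine would implement both differently.

```python
import random


def renumber(rows):
    """Replace the original row keys with ordinal keys 0..m-1."""
    return [(i, vec) for i, (_, vec) in enumerate(rows)]


def sample_rows_k(rows, k, seed=None):
    """Hypothetical k-sampler: draw exactly min(k, n) rows uniformly,
    without replacement, then renumber them ordinally."""
    rng = random.Random(seed)
    picked = rng.sample(rows, min(k, len(rows)))
    # Keep the sampled rows in their original relative order
    # (assumes row keys are comparable, e.g. ints).
    picked.sort(key=lambda row: row[0])
    return renumber(picked)


def sample_rows_fraction(rows, fraction, seed=None):
    """Hypothetical fraction sampler: keep each row independently with
    probability `fraction` (a map-only filter), then renumber."""
    rng = random.Random(seed)
    kept = [row for row in rows if rng.random() < fraction]
    return renumber(kept)


if __name__ == "__main__":
    # Toy "DRM": 20 rows with sparse original keys 0, 10, 20, ...
    drm = [(10 * i, [float(i)]) for i in range(20)]
    print(sample_rows_k(drm, 5, seed=1))
    print(sample_rows_fraction(drm, 0.25, seed=1))
```

Note that the k-sampler returns exactly k rows, while the fraction sampler returns a random number of rows whose expected count is `fraction * n`; in both cases the original keys are discarded in favor of the new ordinal ones.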