[ 
https://issues.apache.org/jira/browse/MAHOUT-1440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Schelter resolved MAHOUT-1440.
----------------------------------------

    Resolution: Fixed
      Assignee: Sebastian Schelter

Patch committed with some cosmetic changes. Thanks for the contribution.

> Add option to set the RNG seed for initial cluster generation in Kmeans/fKmeans
> -------------------------------------------------------------------------------
>
>                 Key: MAHOUT-1440
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1440
>             Project: Mahout
>          Issue Type: Improvement
>          Components: CLI, Clustering
>    Affects Versions: 1.0
>            Reporter: Andrew Palumbo
>            Assignee: Sebastian Schelter
>            Priority: Minor
>              Labels: reproducibility
>             Fix For: 1.0
>
>         Attachments: MAHOUT-1440.patch
>
>
> It was noted recently that there should be a way to set a static seed for the
> initial clusters of Kmeans. In the interests of reproducibility and 
> benchmarking, this patch adds an option to set the seed in the RNG used in 
> the RandomSeedGenerator.buildRandom() method called from the KmeansDriver and 
> FuzzyKMeansDriver.  
> I've added in a CLI option -setRandomSeed that when set to the same value 
> (with the -k option set) will produce reproducible results from kmeans and 
> fkmeans.
> This patch allows the user to set a value.  It may make more sense to just 
> have an option to set a flag to use the STANDARD_SEED from RandomWrapper.
> I am still feeling my way around the codebase so if this will be useful and 
> there need to be any changes let me know.
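
For readers unfamiliar with the idea, here is a minimal sketch of why a fixed
seed makes the initial cluster choice repeatable. It does not reproduce
Mahout's RandomSeedGenerator.buildRandom() or the patch itself: the class
SeededInitialCenters and the method chooseIndices are hypothetical names, the
RNG is plain java.util.Random, and reservoir sampling is assumed purely for
illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only; not the code from MAHOUT-1440.patch.
public class SeededInitialCenters {

    // Returns the indices of k distinct points chosen by reservoir sampling.
    // A non-null seed makes the choice repeatable; null uses an unseeded RNG.
    public static List<Integer> chooseIndices(int numPoints, int k, Long seed) {
        if (k < 1 || k > numPoints) {
            throw new IllegalArgumentException("k must be between 1 and numPoints");
        }
        Random rng = (seed != null) ? new Random(seed) : new Random();
        List<Integer> chosen = new ArrayList<>(k);
        for (int i = 0; i < numPoints; i++) {
            if (i < k) {
                // Fill the reservoir with the first k indices.
                chosen.add(i);
            } else {
                // Replace a random slot with probability k / (i + 1).
                int j = rng.nextInt(i + 1);
                if (j < k) {
                    chosen.set(j, i);
                }
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Integer> first = chooseIndices(1000, 5, 42L);
        List<Integer> second = chooseIndices(1000, 5, 42L);
        System.out.println(first.equals(second)); // prints true: same seed, same picks
    }
}
```

Passing null for the seed in this sketch corresponds to the default,
non-reproducible behaviour; passing the same value twice yields the same
initial centers, which is the property the -setRandomSeed option is described
as providing.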



--
This message was sent by Atlassian JIRA
(v6.2#6252)
