[
https://issues.apache.org/jira/browse/MAHOUT-687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029989#comment-13029989
]
Ted Dunning commented on MAHOUT-687:
------------------------------------
{quote}
A deterministic random matrix or vector needs to set the seed for each
multiply. This fix would create too much garbage. (Each MersenneTwister has
2500 bytes!) Once you say you need Commons MersenneTwister instead, because it
has a setSeed(long), the rest of the patch ticks over.
{quote}
MersenneTwister is unacceptable for this usage anyway. It takes far too long
to start up. The Commons implementation just feeds the long to a weak
generator to build the full seed, so there isn't a difference in garbage
created. Besides, this is totally ephemeral garbage that won't even survive
out of newspace.
A good implementation option is MurmurHash applied to the row, the column, and a salt.
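
To make the hashing idea concrete, here is a minimal sketch of a matrix whose
elements are recomputed on demand from (row, column, salt), so no generator
object is created or reseeded per multiply. The class HashedRandomMatrix and
its methods are hypothetical names, not Mahout API. To stay self-contained it
uses only the 64-bit finalization mix from MurmurHash3 rather than a full
MurmurHash, and the way the three inputs are combined is an assumption made
for illustration.

```java
/**
 * Hypothetical sketch: a "virtual" random matrix whose elements are derived
 * from a hash of (row, column, salt) instead of from a seeded generator.
 */
final class HashedRandomMatrix {
    private final long salt;

    HashedRandomMatrix(long salt) {
        this.salt = salt;
    }

    /** The 64-bit finalization mix (fmix64) from MurmurHash3. */
    static long fmix64(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }

    /** Hash of the (row, column, salt) triple. */
    public long hash(int row, int column) {
        // Pack the two 32-bit indices into one 64-bit word.
        long packed = ((long) row << 32) | (column & 0xffffffffL);
        // Illustrative offset: an arbitrary odd constant so that element
        // (0, 0) does not enter the first mix as zero.
        long mixed = fmix64(packed + 0x9e3779b97f4a7c15L);
        return fmix64(mixed ^ salt);
    }

    /** Element value: a double in [0, 1) built from the top 53 hash bits. */
    public double get(int row, int column) {
        return (hash(row, column) >>> 11) * 0x1.0p-53;
    }
}
```

Calling new HashedRandomMatrix(42L).get(3, 7) returns the same value every
time, and a different salt yields a different matrix. The only state is the
salt, so nothing is allocated per element or per multiply.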
> Random generator objects- slight refactor
> -----------------------------------------
>
> Key: MAHOUT-687
> URL: https://issues.apache.org/jira/browse/MAHOUT-687
> Project: Mahout
> Issue Type: Improvement
> Reporter: Lance Norskog
> Priority: Minor
> Attachments: MAHOUT-687.patch
>
>
> Problems:
> * Uncommons MersenneTwisterRNG, the default returned by
> RandomUtils.getRandom(), ignores setSeed without throwing an error.
> * The project wants to move off Uncommons anyway.
> This patch uses the org.apache.commons.math.random.RandomGenerator classes
> instead of org.uncommons.maths.random.RepeatableRNG classes.
> Testcases: All math test cases pass except for
> org.apache.mahout.math.stats.LogLikelihoodTest.
> The tests that fail in other packages are mostly those that test
> random-oriented classes; not a surprise.
> Almost all tests that use random numbers in algorithms still pass; this is a
> good sign of their stability.
>
> Still, a lot of tests have to be fiddled to make this commit.