[Replying to the ML.] Le mar. 12 mars 2019 à 13:00, Alex Herbert <[email protected]> a écrit : > > > On 11/03/2019 23:54, Gilles Sadowski wrote: > > Le ven. 8 mars 2019 à 18:12, Alex Herbert <[email protected]> a > > écrit : > >> Hi, > >> > >> I'd like to start on improving the methods in rng-simple to create the > >> RandomSource. > >> > >> There are a few PRs outstanding but I think the discussion on this list > >> was that the new XoShiRo generators are OK. So I would like to merge the > >> following PRs: > >> > >> RNG-82: Add XorShift1024StarPhi generator > > See other thread. > > > >> RNG-70: Add new XoShiRo generators > > +1 > > Ah. Most of these generators have more than one constructor (see below > on the discussion for the change to the SplitMix64 generator). These are > utility constructors given the small size of the seed. e.g. for a seed > of length 4: > > public XoShiRo256StarStar(long[] seed) > > public XoShiRo256StarStar(long seed0, long seed1, long seed2, long seed3) > > I did not see this as an issue. I added unit tests to show that the same > input data in either the array or the primitives produces the same > generator sequence. The primitive constructor does not have to do array > checking and filling when the array is not the correct length. The input > is directly used to set the state. It is a cleaner constructor, but it > does not fit the model required by the ProviderBuilder which would like > an object seed. > > I am going to rewrite the ProviderBuilder internals to allow the size of > the seed to be known and to remove the use of reflection to create a > source. In this instance the primitive constructor would be better. That > is a motivating reason for the change to the SplitMix64 too.
I did not look again at the code yet; but if you can work around the issue (as I recalled it) without impacting the public API, then fine I guess. :-) > > > > >> Then use master to set a baseline for the construction speed. This can > >> be used to compare against the latest code to see how much the changes > >> have improved construction speed. Given the work will have a big effect > >> on generators with small state it would be good to have the new > >> generators in the baseline. > > IIUC, "big effect" only when constructing and throwing quickly very > > many objects. > > The doc should make it clear that it's not the "nominal" use-case. > > Yes. There is not a document for this at the moment anyway (i.e. the > results of the construction benchmark). Should a section be added to the > user guide? Benchmarks can certainly go in the "Performance" section. > This can outline that a generator should be picked for its > intended purpose: output datatype required (int, double, etc) and number > of samples (period). For short lived generators with low samples then > construction speed is a factor. All good as long as the entry point for application developers to create RNGs is "RandomSource". > > This is a feature that will probably most benefit the > ThreadLocalRandomSource where short-lifetime generators are likely to be > required in a multi-threaded environment. ??? IIUC, "ThreadLocalRandomSource" will store the longest-living RNG instances! > > However the benchmark shows that once you are generating 1,000,000 > samples then the construction cost of even the TwoCmres is much less > than the sampling time. > > > > >> I note that the only tags in git are for releases. So the baseline can > >> be just a commit Id and I'll add it to the benchmark Jira RNG-72 for > >> reference. I'm still wondering about applications that require short-lived generators... It doesn't seem to ever be a good idea to do that. There was an issue caused by that in [Digester], fixed only in the latest release... > >> > >> There are also some other PRs to be discussed: > >> > >> RNG-81: NumberFactory to sample all rationals between 0 and 1. > >> > >> This one changes to the implementation in Open JDK for float and double > >> generation. I think this is OK to merge. > > +1 > > [No promise of reproducibility for "derived" types.] > OK. I'll put that in. > > > >> It won't effect the baseline > >> but is good to get it into the next release. > >> > >> > >> RNG-78: Added a ThreadLocalRandomSource. > >> > >> This works as a proof of concept. But it is all new files and so I'm > >> happy to leave that until it is decided exactly if and where to put it. > > Would there be a better place than in package "o.a.c.rng.simple"? > It was either in simple or examples. But I would like to see it as a > feature in use so I would leave it in simple Sure, it's a useful feature to help users avoid bad usage. > >> RNG-76: Added primitive long constructor to SplitMix64 > >> > >> This is something that will effect the benchmark so I'll merge it after. > >> It is a trivial change with a noticeable performance gain by avoiding > >> auto-boxing of the long seed. > > IIRC, there is some developer doc that refer to having a single > > constructor. But I do not recall the reason. > > I did not know. I cannot find that explained in the project docs. I > looked in site/xdoc/developers.xml and a poor regex search through the rest. I think I meant this[1]: The "provider" must specify one way for setting the seed. For a given seed, the generated sequence must always be the same. > Maybe the reason was to prevent ambiguous construction, e.g. with a > non-parameterised constructor. > > An exception was made for TwoCmres since it needs more arguments. That > is understandable. > > Could this rule be relaxed if the same data passed to different > constructors results in the same generated sequence? I guess so. IIRC, the limitation was due to using commons code in order to instantiate the appropriate RNG (core) class. > E.g. the unit tests > I added for the new XoShiRo generators test this. Thanks for all this work, Gilles [1] http://commons.apache.org/proper/commons-rng/commons-rng-core/apidocs/org/apache/commons/rng/core/source32/package-summary.html --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
