Re: [RNG] About RNG-6 (Was: [VOTE][RC1] Release Commons Rng 1.0)

Emmanuel Bourg Fri, 16 Sep 2016 16:38:36 -0700

Le 15/09/2016 à 16:37, Gilles a écrit :

> I don't understand "motivations for the API".


I just mean a short introduction to the component, something explaining
why it exists, its usefulness (performance, randomness quality), the
implementations supported, etc.


> I won't do that.
> It was already a pain to create those tables as they are (see the
> doc source).

I understand you don't want to do it, editing apt files isn't fun. But
are you opposed to the idea?


>> - The internal packages should be hidden from the Javadoc,
> 
> I do not know how do that.

Packages can be excluded from the Javadoc by configuring the
maven-javadoc-plugin with the <excludePackageNames> element:

https://maven.apache.org/plugins/maven-javadoc-plugin/examples/exclude-package-names.html


> And I don't see the point, since from the "binary" POV, they
> are "public".

That's purely from the documentation point of view here. If we don't
want users to use these packages, let's not mention them in the Javadoc.
That's an extra security with the "internal" package name.


> Users can select a generator without knowing the gory details.
> If they want to know more, they go to the internals.
> It they really want to know, they go to the references.
> 
> Internal should not be hidden: contributors who'd fancy to provide
> additional generators do not have to go through any hoops.
> 
> Or do you suggest that this component should supply two JARs:
>   * commons-rng-api
>      Contents of "o.a.c.rng"
>   * commons-rng-core
>      Contents of "o.a.c.rng.internal"
> ?
> 
> And two sets of Javadoc?

No thanks, too complicated. Contributors will just checkout the code and
quickly figure out how it's organized and how it's meant to be extended,
I don't worry about that. But I worry about mere users getting confused
about the API.


>> - Why isn't the UniformRandomProvider interface simply named
>> RandomProvider?
> 
> Maybe because we want to be transparent on what the code does (?).
> 
> With this name, there is no ambiguity: no more, no less than what
> the name implies.

I don't feel there is an ambiguity here, probably because I'm not an
expert in random number generators. I just see an excessively long class
name that could be shortened. CM4/Hipparchus just use 'RandomGenerator'
for this interface, that's in line with the name of the component and a
better choice in my opinion.


>> - UniformRandomProvider mimics the java.util.Random methods which is
>> nice. Is there any value in also implementing the nextGaussian() method
>> such that our API is a perfect drop-in replacement for the JDK class?
> 
> No. [IMHO.]
> This was discussed on this list (and cf. above point).

That would be convenient for migrating code using java.util.Random to
commons-rng though. CM has it and the implementation doesn't look so
complicated, I'm not sure to understand why you think it should go
elsewhere. Even if more elaborated implementations could be latter
imagined there is no harm providing a default one as a convenience for
migrating legacy code.


>> - For code relying on the java.util.Random type, it might be useful to
>> provide an adapter turning an UniformRandomProvider into a
>> java.util.Random implementation (Hipparchus has one).
> 
> This was discussed on this list.
> 
> And see Commons Math's classes "o.a.c.math4.random.RandomUtils" and
> "o.a.c.math4.random.RngAdaptor" classes.
> 
> In the latter (being deprecated as of its creation), I indicated why
> it is a bad idea.
> 
> Again, if we really want to support the use-case you proposed, it's
> a utility that should go in another component.

The component already converts a java.util.Random into our
RandomProvider, I tend to think it's obvious to support the reverse
operation in the same component.


>> - The choice of an enum for the factory class RandomSource is a bit
>> unusual. It is probably ok for simple RNG with a simple seed, but with
>> more complex implementations I find it less intuitive to use.
> 
> Less intuitive than what?

Less intuitive than other designs, such as the one I proposed.


>> When a user types this in its IDE, there is no type information nor
>> contextual documentation for the parameters expected. The user has to
>> know the type of the seed and the meaning of the parameters expected,
>> the IDE won't be able to help here (with CTRL+P and CTRL+Q in IntelliJ).
> 
> Honestly, I don't care.
> To put it less bluntly, the choice of RNG to instantiate is not a
> matter of IDE.

I don't think you understand my point. I don't expect the IDE to pick
the right RNG for my use case of course, that's impossible. But I expect
the IDE to help me using the API by hinting the type expected and
providing contextual documentation about the method used. The current
design makes this impossible. The users are forced to open the Javadoc
in their browser when they want to instantiate a generator they don't
know or remember about. I don't think it's a good idea.


> Practically, the lack of type information (which I don't like either)
> was a trade-off to have a dead-simple API.

I'd say it's certainly a concise API, but its simplicity is debatable.


> This design came after 9 months of sustained work.
> Nothing to be proud of in terms of LOC/day...
> But what you propose to drop is the only non trivial part of
> this work.
> 
> As I said above, this is a trade-off, and it looks like a fault
> only because Java is strongly typed.
> 
> The automatic seed conversion is a huge advantage for casual
> users; they can still seed them with a single "long" (and under
> the hood get a good "native" seed).
> But the doc is there to warn them that RNGs usually have a larger
> state.

What's the use case for accepting a long[] if the API expects a int[]?
That's a bit odd.

At least, instead of using an Object in the create method, we could use
a dedicated Seed class containing factory methods and the conversion
logic. Its API could look like this:

  Seed.of(int...)
  Seed.of(long...)
  Seed.toInt()
  Seed.toLong()
  Seed.toIntArray()
  Seed.toLongArray()
  Seed.createIntArray(n)
  Seed.createLongArray(n)


> No, no, no.
> Avoiding the "Serializable" clutter was a _requirement_.
> 
> [...]
> 
> It's also a feature that this component will never have to deal
> with CVE related to exploitation of something "Serializable" that
> should not have been...

The random generators just hold a few primitive variables, there is no
risk here.


>> - Why is the byte[] state encapsulated into the RandomSource.State
>> class? Why not using a byte array directly?
> 
> Because encapsulation is the hallmark of good OO programming?

True but the RandomSource.State class isn't encapsulating the actual
state of the generators here. It encapsulates an image of the generator
state. Even if the content of the byte array was changed it would not
impact the state of the generator. This extra layer is really unnecessary.


> If some application only needs to save/restore without actual
> persistence, it can do without dealing with the "ugliness" of
> "byte[]".

byte[] is ugly???

> And it will continue to work even if the internal representation
> is changed.

Why would we change that?


> There were a lot of discussions here about what "beta release"
> means and how to implement it in practice (package naming,
> versioning scheme).
> 
> They never led anywhere.

Did you forget my suggestion to use a different package name and
artifactId for the beta releases?


> I seriously doubt that having code tagged "beta" would be more
> successful at getting users to provide feedback.

A beta release will get more exposure than a snapshot, because it'll be
announced on the mailing lists, on Twitter and available on Maven Central.

> It will more likely turn them away, thinking that the generators
> might be buggy.  Which is not the case.

Not if we explain that the code is inherited from CM and very mature but
moved to a new component with a reworked API.

>> but I'm -1 for releasing the current code as 1.0.
> 
> Any technical reasons?

Simply the reasons I mentioned that can't be addressed in a later
release without breaking the binary compatibility:
- UniformRandomProvider is too long, rename back to RandomGenerator
- Add nextGaussian() to the UniformRandomProvider interface
- RandomSource hides type information and parameters' semantic
- RandomSource.State isn't necessary

Emmanuel Bourg

(My apologies for the late feedback, I wish I had more time to do it
earlier)


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [RNG] About RNG-6 (Was: [VOTE][RC1] Release Commons Rng 1.0)

Reply via email to