Re: [math] RandomDataImpl refactoring

Gilles Sadowski Sun, 06 Nov 2011 09:21:29 -0800

On Sun, Nov 06, 2011 at 09:27:44AM -0700, Phil Steitz wrote:
> On 11/6/11 5:44 AM, Gilles Sadowski wrote:
> > On Sat, Nov 05, 2011 at 11:03:40PM -0700, Phil Steitz wrote:
> >> On 11/5/11 6:38 PM, Gilles Sadowski wrote:
> >>> On Sat, Nov 05, 2011 at 08:47:18AM -0700, Phil Steitz wrote:
> >>>> On 11/5/11 7:04 AM, Luc Maisonobe wrote:
> >>>>> Le 05/11/2011 08:29, Phil Steitz a écrit :
> >>>>>> The comments in MATH-701 included a couple of suggestions for
> >>>>>> refactoring RandomDataImpl.
> >>>>>>
> >>>>>> 1) Eliminate the lazy initialization of the non-secure and secure
> >>>>>> generators.  Have the constructor initialize the generators
> >>>>>> instead.  I am fine with this for the non-secure generator, but
> >>>>>> initializing a secure random generator can be quite slow, so that is
> >>>>>> not a good idea.
> >>>>> I agree.
> >>>>>
> >>>>>> 2) Split out the secure stuff into a separate class, possibly a
> >>>>>> subclass.  I am ambivalent on this one, as I see RandomDataImpl as a
> >>>>>> utility class and it is convenient to have the most commonly used
> >>>>>> data generation utilities bundled together.  The "secure" methods
> >>>>>> only generate hex strings, ints and longs.  I have never had the
> >>>>>> need or heard of the need for "secure" gaussians or the other
> >>>>>> non-secure deviates.  Has anyone else?  Any comments either way on 
> >>>>>> this?
> >>>>> The only needs for secure random generators I know of is for creating
> >>>>> crypto keys. I feal this is out of scope of our library. For sure,
> >>>>> crypto is a mathematical thing, but we don't provide anything except
> >>>>> these random generation, which in fact does not do anything fancy but
> >>>>> only requires an underlying secure generator.
> >>>>>
> >>>>> So I would be happy to completely remove this stuff.
> >>> I agree that this seems like a utility that does not nicely fit into 
> >>> Commons
> >>> Math.
> >>>
> >>>> What I have used these things for is generating session IDs and
> >>>> integer object IDs that I did not want to be easily predictable.  I
> >>>> would prefer to keep the methods that are there.
> >>> IIUC this example, it indeed has nothing to do with a "mathematical"
> >>> application; the feature thus rather belongs in a toolkit for handling
> >>> collections of things (like unique IDs).
> >> [math] is used by lots of different kinds of applications.  Some in
> >> physical science, some in finance, some web applications.  The
> >> random data generation methods are, like the optimization and other
> >> features in [math], broadly useful. 
> > The RNG feature is much more broadly useful (in a scientific context) than
> > the secure RNG feature.
> > CM is not going to provide e.g. web services boiler-plate code just because
> > web services could make use of math algorithms; that's backward reasoning.
> >
> > The "...Secure..." methods exist. I don't say "Drop them. Period." Just make
> > it clear that thare are not on the same footing as all the other "next..."
> > methods.
> >
> >>> In "RandomDataImpl", what differentiates "secure" from "non-secure" RNGs 
> >>> is
> >>> that for the latter, one can use one of the many _internal_ 
> >>> implementations
> >>> of a RNG, while for the former one can only use external ones.
> >> I don't get this.  It is useful to be able to generate random longs
> >> and hex strings that are not easily predictable.  That is what the
> >> "nextSecureXxx" methods are for.  Again, what users do with these
> >> capabilities is up to them.  I personally use them in applications
> >> and see no reason to drop them.
> > Not everything that is useful must be provided by Commons Math.
> > The above was stressing that CM provides many implementations of a RNG, but
> > none of them are secure.
> > The "secure" methods in "RandomDataImpl" seems to be just syntactic sugar to
> > call the standard "SecureRandom" class. I think that this difference in
> > status should be reflected somehow, i.e. by moving the maethods in another
> > class.
> >
> >>> As I mentioned on the JIRA page, not all the "next..." functions can be
> >>> "secure". This is inconsistent with respect to the "SecureRandom" class
> >>> inheriting from "Random" (thus allowing to call any "next..." methods,
> >>> albeit benefiting from the "security").
> >> So what?  The useful ones are ints, longs and hex strings.
> > I see no reason to forbid something that comes for free by inheritance!
> >
> >>> Finally, the additional layer imposed by the "security" management would
> >>> seem to also point to having this functionality in another class.
> >>>
> >>> I'd also suggest that the "RandomData" interface be merged with its unique
> >>> implementation (a.k.a. "RandomDataImpl").
> >> +1 - I was planning to suggest this.
> >>> If keeping the "secure" utilities, they would go in a new
> >>> "SecureRandomData", which can contain only some of the "next..." as you
> >>> suggest that not all of them are really needed or useful (in which case I
> >>> wonder why the standard "SecureRandom" inherits from "Random"...).
> >> I see no reason to pull out 3 methods into a new class.
> > I gave the reasons here and in the other post.
> > There is no minimum-number-of-methods rule for class creation.
> 
> Sorry, I am still -1 for this change, for the reasons stated.  The
> "secure" methods have been then since 1.0 and are useful.  The
> inheritance approach is overly complicated and unnecessary.


Overly complicated? Certainly not:
---
  SecureRandomData s = new SecureRandomData();
---
Unless you count the additional typing of the "Secure" string as
"complicated"!

Polymorphism will make some things even simpler; and you can then add more
of the specific "secure" things (like "Provider"), all of those
unnecessarily pollute a "RandomData" class that doesn't need to be aware of
cryptographic constraints.

>  Unless
> some users chime in with the view that the class will be easier to
> use split out, there is no reason to change this.

1. Luc "was happy removing all the stuff".
2. Sebb was inclined to make the field final.
3. I agree with both.
That's three to one, if I count correctly.

I don't have a big problem with keeping the functionality, if you insist on
that point.
I just suggested that it should stand out for what it is: syntactic sugar.
IMHO, this alone makes the thing easier to use because you don't have to
have such a long explanation as currently exists in "RandomDataImpl" to make
the distinction between secure and non-secure.
With the refactoring, you can just state: "Application that require secure
RNG should use the {@link SecureRandomData} class". And in that class, you
give example of situations where the stuff could be useful (like session
IDs) without polluting a class like "RandomData".
I must add that you are yourself making a point that the utilities are
different, when stating that "secure" RNGs are "slow" to initialize.
And there are other properties (non-deterministic) that might make them
unsuitable for scientific usage.
Separation would go a good deal towards self-documenting code, thus _better_
documented code. [Say, people who wouldn't know CM, would readily spot a
class named "SecureRandomData". Being in line with the design of the Java
standard hierarchy for this stuff makes it also easier.]


Gilles

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [math] RandomDataImpl refactoring

Reply via email to