Re: [math] Design review pre 1.0

Mark R. Diggory Tue, 25 May 2004 10:17:34 -0700

I want to review this list and add in my comments...

Stephen Colebourne wrote:

I have performed a lightweight review of [math] from a code/style POV (not
technically as I'm not mathematical...)

1) Remember your public API that you must maintain includes both public and
protected classes/methods/fields. Are you confident that every method/field
(especially fields) are correctly and suitably named for you to want to
keep?

Yes, this could stand a code review ( 1 hour with report to list? )

2) Wherever there is implements Serializable you should have a
serialVersionUID static field. And are you confident that the
implementations are well enough established to allow for serialization? What
about transient fields? Serialization is definitely not about adding the
implements clause to everything.

See serialization thread concerning resolving any stray serialization issues.

3) Javadocs are sometimes thin. On occasions, they are written as paragraphs
visually but without the HTML <p> tag. (eg. UnivariateRealSolver) or
missing, eg StandardDeviation

Yes, it would be good to maintain acceptable html in javadoc. Yet, I'd like to point out that javadoc isn't java code. while we would like to maintain lots of it to help our users understand it, the library works just fine without it.

4) You might want to consider separating interfaces from implementations.
Perhaps all interfaces in the base package, or impl subpackages. Then again
the current layout may be best.

I'm not convinced we are at the point where this is possible. Though its open to discussion. Some areas this may be easier than others (distributions, analysis). Some packages are rather small and sparse, this would make them even sparser (Complex). I've put considerable effort into separating the the UnivariateStatistics interfaces from the actuall implementations. Just as a side note, at this time separation into implementations and interfaces should not mean two different jars as it is general Commons practice to try to maintain a project as one distributable jar.

5) Is double suitable for these calculations? Should the strictfp flag be
used? (I have no idea as to the answer, but I have to ask)

Neither do I. Can anyone enlighten us?

6) ComplexFormat doesn't extend the JDK Format class

7) ComplexMath is a utils class, which would be named ComplexMathUtils in
lang or collections (although the JDK Math class sets an alternate
precedent) The constructor is private, which I would recommend is changed to
public (aids tools like Velocity). Also Beta/Erf/Gamma.

And throughout the Math package we use this naming "StatUtils", "MathUtils". Maybe it should just be "ComplexUtils".

I'm unsure about instantiating StaticUtils, does velocity actually call staic methods on instantiated objects? Doesn't this seem a bad behavior to promote?

8) @author tags are often missing

Development Team Decision, we do not want @author tags, developers and contributors are acnowledged in Maven project.xml and site docs.

9) Factorys use the method newInstance() to obtain an instance -
getInstance() seems more typical of commons, and allows you to return a
cached value.

There is a symantic difference here:

"getInstance" is ideal for singleton instances

"newInstance" is ideal for instantiation of new instances (See w3c DOM, SAX)

Somewhere along the line we introduced "createFooBar", for cases where a Factory creates multiple types of objects other than itself.

I can accept these naming conventions. They are not uncommon.

10) SummaryStatistics/DescriptiveStatistics are factories but aren't named
as such. They also use Class.forName() which is dubious in a class loader
environment.

True, its a toss up, there are alot of service discovery approaches. do we want to be dependent on the Discovery API, or just use the Java services mechanism and jar each implementation such that it can be "discovered"?

11) Ensure there are no TODO comments in the Javadoc (OK in code) For
example GammaDistributionImpl

12) Should EmpiricalDistribution really have methods taking File and String
filePath - is File not enough? The implementation actually seems to do
something different (beware file encodings)

I would argue not that alternate method signatures are unneccessary, they are assistive and offer options for coding, but that the methods should be organized to use only one implementation of a loading strategy. ie

public void load(URl url);

public void load(File file);

public void load(InputStream);

here InputStream is the method that actually implements the loading behavior and the other two signatures simply aquire an InputStream and pass it to the InputStream signature. Unfortunately, the way the implementation is designed such that it reads each url/file twice makes doing this not possible.

13) Ensure every class has a constructor - never rely on the auto generated
one, for example BivariateRegression

Good Point

14) Some import statements are rather screwed visually, eg Product

These are formated using Eclipse configured to meet Sun formating standards. I assume this formating is caused when the package name gets long. I've maintained it for consistency.

15) Dependencies!!!!
commons-discovery can and should be an optional dependency. In fact it may
already be, as each call to DiscoveryClass is in an Exception catch, which
ought to trap ClassNotFoundException. However, I don't think that the
default implementation behaviour is actually working which needs fixing.

We need a good, standard discover mechanism. it sucks to have to copy and reimplement when other commons developers are already creating such excellent code. I know we want to keep dependencies to a minimum, but code reuse is also important, then point of Commons is to reduce code replication.

commons-lang is only used for NestableException. This is a dubious
dependency - the relevant code can be copied, and [lang]s ExceptionUtils is
perfectly able to handle extracting cause parameters by reflection. (Don't
copy NestableException - copy the code within)

I actually agree here because this class was created just to provide Nestability to older JVM's. Its really just a "patch" until time renders those JVM's obsolete. The code would eventually be removed from the Math API.

commons-logging is not used directly, just via beanutils and discovery (and
is a classic example of commons chained dependencies that should be avoided)

Yes, I had worked to make it a test and compile time dependency only.

commons-beanutils doesn't seem to be used, so shouldn't be in project xml

Its a test, compile time dependency for BeanListUnivariate, not a runtime dependency.

commons-collections TreeBag is used by Frequency - it might be worth
considering removing this dependency, but it is the hardest.
(CollectionUtils.NATURAL_COMPARATOR is a simple class to copy)

I'm not adverse to maintaing it, this is a Complex Datastructure, I'd rather reley on those who thought it out supporting it, Math shouldn't be about Collections.

Anyway, I hope this focuses attention on what needs doing (boring-wise)
before 1.0.

Stephen

I'd like us not to get too bound up in the dependency issue. If they can esily be removed, great, if they are to access complex datstructures and behavior, lets let those experts maintain such tools and we promote thier reuse. I would really classify Math as a secondary level Commons package. It does some complex things for which the Collections dependencies can be deemed acceptable and appropriate.

-Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: [math] Design review pre 1.0

Reply via email to