I want to review this list and add in my comments...
Stephen Colebourne wrote:
I have performed a lightweight review of [math] from a code/style POV (not technically as I'm not mathematical...)
1) Remember your public API that you must maintain includes both public and protected classes/methods/fields. Are you confident that every method/field (especially fields) are correctly and suitably named for you to want to keep?
Yes, this could stand a code review ( 1 hour with report to list? )
See serialization thread concerning resolving any stray serialization issues.2) Wherever there is implements Serializable you should have a serialVersionUID static field. And are you confident that the implementations are well enough established to allow for serialization? What about transient fields? Serialization is definitely not about adding the implements clause to everything.
Yes, it would be good to maintain acceptable html in javadoc. Yet, I'd like to point out that javadoc isn't java code. while we would like to maintain lots of it to help our users understand it, the library works just fine without it.3) Javadocs are sometimes thin. On occasions, they are written as paragraphs visually but without the HTML <p> tag. (eg. UnivariateRealSolver) or missing, eg StandardDeviation
I'm not convinced we are at the point where this is possible. Though its open to discussion. Some areas this may be easier than others (distributions, analysis). Some packages are rather small and sparse, this would make them even sparser (Complex). I've put considerable effort into separating the the UnivariateStatistics interfaces from the actuall implementations. Just as a side note, at this time separation into implementations and interfaces should not mean two different jars as it is general Commons practice to try to maintain a project as one distributable jar.4) You might want to consider separating interfaces from implementations. Perhaps all interfaces in the base package, or impl subpackages. Then again the current layout may be best.
5) Is double suitable for these calculations? Should the strictfp flag be used? (I have no idea as to the answer, but I have to ask)
Neither do I. Can anyone enlighten us?
And throughout the Math package we use this naming "StatUtils", "MathUtils". Maybe it should just be "ComplexUtils".6) ComplexFormat doesn't extend the JDK Format class
7) ComplexMath is a utils class, which would be named ComplexMathUtils in lang or collections (although the JDK Math class sets an alternate precedent) The constructor is private, which I would recommend is changed to public (aids tools like Velocity). Also Beta/Erf/Gamma.
I'm unsure about instantiating StaticUtils, does velocity actually call staic methods on instantiated objects? Doesn't this seem a bad behavior to promote?
Development Team Decision, we do not want @author tags, developers and contributors are acnowledged in Maven project.xml and site docs.8) @author tags are often missing
9) Factorys use the method newInstance() to obtain an instance - getInstance() seems more typical of commons, and allows you to return a cached value.
There is a symantic difference here:
"getInstance" is ideal for singleton instances
"newInstance" is ideal for instantiation of new instances (See w3c DOM, SAX)
Somewhere along the line we introduced "createFooBar", for cases where a Factory creates multiple types of objects other than itself.
I can accept these naming conventions. They are not uncommon.
True, its a toss up, there are alot of service discovery approaches. do we want to be dependent on the Discovery API, or just use the Java services mechanism and jar each implementation such that it can be "discovered"?10) SummaryStatistics/DescriptiveStatistics are factories but aren't named as such. They also use Class.forName() which is dubious in a class loader environment.
I would argue not that alternate method signatures are unneccessary, they are assistive and offer options for coding, but that the methods should be organized to use only one implementation of a loading strategy. ie11) Ensure there are no TODO comments in the Javadoc (OK in code) For example GammaDistributionImpl
12) Should EmpiricalDistribution really have methods taking File and String filePath - is File not enough? The implementation actually seems to do something different (beware file encodings)
public void load(URl url);
public void load(File file);
public void load(InputStream);
here InputStream is the method that actually implements the loading behavior and the other two signatures simply aquire an InputStream and pass it to the InputStream signature. Unfortunately, the way the implementation is designed such that it reads each url/file twice makes doing this not possible.
13) Ensure every class has a constructor - never rely on the auto generated one, for example BivariateRegression
Good Point
These are formated using Eclipse configured to meet Sun formating standards. I assume this formating is caused when the package name gets long. I've maintained it for consistency.14) Some import statements are rather screwed visually, eg Product
We need a good, standard discover mechanism. it sucks to have to copy and reimplement when other commons developers are already creating such excellent code. I know we want to keep dependencies to a minimum, but code reuse is also important, then point of Commons is to reduce code replication.15) Dependencies!!!! commons-discovery can and should be an optional dependency. In fact it may already be, as each call to DiscoveryClass is in an Exception catch, which ought to trap ClassNotFoundException. However, I don't think that the default implementation behaviour is actually working which needs fixing.
I actually agree here because this class was created just to provide Nestability to older JVM's. Its really just a "patch" until time renders those JVM's obsolete. The code would eventually be removed from the Math API.commons-lang is only used for NestableException. This is a dubious dependency - the relevant code can be copied, and [lang]s ExceptionUtils is perfectly able to handle extracting cause parameters by reflection. (Don't copy NestableException - copy the code within)
commons-logging is not used directly, just via beanutils and discovery (and is a classic example of commons chained dependencies that should be avoided)
Yes, I had worked to make it a test and compile time dependency only.
Its a test, compile time dependency for BeanListUnivariate, not a runtime dependency.commons-beanutils doesn't seem to be used, so shouldn't be in project xml
I'm not adverse to maintaing it, this is a Complex Datastructure, I'd rather reley on those who thought it out supporting it, Math shouldn't be about Collections.commons-collections TreeBag is used by Frequency - it might be worth considering removing this dependency, but it is the hardest. (CollectionUtils.NATURAL_COMPARATOR is a simple class to copy)
I'd like us not to get too bound up in the dependency issue. If they can esily be removed, great, if they are to access complex datstructures and behavior, lets let those experts maintain such tools and we promote thier reuse. I would really classify Math as a secondary level Commons package. It does some complex things for which the Collections dependencies can be deemed acceptable and appropriate.Anyway, I hope this focuses attention on what needs doing (boring-wise) before 1.0.
Stephen
-Mark
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
