Re: [MATH][GENETICS] Review of PR #197

Gilles Sadowski Fri, 29 Oct 2021 18:41:22 -0700

Le ven. 29 oct. 2021 à 17:00, Avijit Basak <avijit.ba...@gmail.com> a écrit :
>
> Hi All
>
>         I have fixed most of the review comments. The changes have been
> committed to PR#199.
>
> (A)
> Please "rebase" on "master".
> Please "squash" intermediate commits: For a new feature, a single commit
> should exist (that corresponds to the JIRA report describing it).
> --Will be done once all changes are finalized and committed.


What is the rationale for not doing it right now?
The PR should always be "rebased" on the latest "master".

>
> (B)
> I'm confused by your defining "legacy" packages in new modules...
> What kind of comparisons are you considering?
> It is fine to depend (with scope "test") on CM v3.6.1 to perhaps (?)
> regression testing; but please note that when your proposal is
> merged, it will imply that the "legacy" codes *must* be removed.
> [We don't want to keep duplicate functionality.]
> -- The new implementation has improved the quality of optimization over the
> legacy model.

"Improved" in what sense?
If you mean enhanced performance, such checks should be done
using JMH (producing data to be published on the web site).

> I have added the legacy packages to demonstrate the same.
> Once we remove the genetics packages in the legacy module, the same will be
> deleted from examples.

I'm probably missing what exactly those "legacy" examples aim to
demonstrate...
In passing, what's the purpose of
  Thread.sleep(5000)
(at line 55 in file "TSPOptimizerLegacy")?

>
> (C)
> File
> "commons-math-examples/examples-ga/src/main/resources/spotbugs/spotbugs-exclude-filter.xml"
> does not belong there.
> --Removed
>
> (D) General design
> Class "ConsoleLogger"
> -> We should not reinvent the wheel.  We should consider whether logging
> is necessary, and in the affirmative, depend on the de facto standard:
> "slf4j".
> -- Introduced the slf4j with simple implementation and removed the
> consolelogger class.

Could you please post a separate message highlighting the
new dependency?
Also (if no objections are made):
1. the declaration should go in the main POM (in the <dependencyManagement>
    section).
2. the library (CM) must not depend on a specific logger implementation.

>
> Class "Constants"
> -> Any data should be declared where its purpose is obvious.
> --Put the constants in relevant classes.
>
> * Class "RandomGenerator"
> 1. Duplicates functionality (storage of thread-local instances)
> readily available in "Commons RNG".
> 2. (IMHO) Thread-local instances should not be used for "heavy" usage
> (like in GA).
> --Modified

You still use "ThreadLocalRandomSource".
I think that if some functionality requires a RNG instance it should be
passed to its constructor (or created by it).

I noticed that the interface "ChromosomeRepresentationUtils"
is actually a utility class (as the name indicates): It only contains
"static" functions.  This will probably require refactoring (utility
classes are frowned upon nowadays).
Some functionality in that class (e.g. sampling from a range of
integers) exists in the "sampling" module of "Commons RNG".

>
> * Class "ValidationUtils"
> -> should not be public (or should be defined in an "internal" package).
> --Changed

The class actually provides no added value.

> >
> > * Class "AbstractListChromosome" (and subclasses)
> > Didn't we conclude that this was a very wasteful implementation of the
> > "chromosome" concept?
> > -- I have some concerns regarding this. I am not much aware of any
> > discussion regarding this conclusion.
>
> >Please search the ML archive; I seem to recall a detailed discussion
> >where Alex gave hints on how a binary chromosome should be
> >implemented.
> --There is a separate Jira for the same. I shall do the implementation.
>
> > Chromosomes are always conceptualized as collections of allele/genes. So
> We
> > need a collection of the *genotypes* anyway. Here List has been used as a
> > collection.
> > We need an abstraction for representing the collection of Genotype. All
> > crossover and mutation operators are based on this abstraction. This
> > enabled reuse of crossover and mutation operators for all chromosome types
> > which extend the abstraction. I am not sure how to achieve this
> reusability
> > without an abstraction.

The abstraction is quite useful, but the point is indeed that the abstraction
is somewhat broken by exposing the List<T> as the representation.
I understand that the advantage is to be able to define several functionalities
(crossover, ...) in a generic way; but is it worth the obvious drawback is that
object instances must be used to represent genes?

> > Any domain specific new chromosome implementation extending the
> > AbstractListChromosome class can reuse all crossover and mutation
> operators.
> > For our proposed improvement of BinaryChromosome we should be able to
> > extend the AbstractChromosome (*not* AbstractListChromosome) for the new
> > class and provide the dedicated crossover and mutation operators for the
> > corresponding Genotype. Without an *explicit* abstraction, management of
> > crossover and mutation operators would be difficult.

For sure, the design would be different. But it is not obvious to me that
the current one is necessarily better.

---CUT---
public interface CrossoverPolicy<P> {
   ChromosomePair<P> crossover(Chromosome<P> first, Chromosome<P>
second, double crossoverRate);
}
---CUT---
vs
---CUT---
public interface Chromosome {
   Chromosome crossover(Chromosome other, double crossoverRate);
}
---CUT---

> > Please share further thoughts regarding this.
>
> Whether we should have a data-based (subclass of "List<T>") or
> behaviour-based  access to individual genes is a fundamental design
> decision.
> The "legacy" implementation chose the former but it might be worth
> considering an alternative (lest we'll need to support several APIs if
> we later come to the conclusion that we need a more compact
> representation for some use cases.
> --How are you thinking the design to be behaviour-based. It will be helpful
> if you share some examples.
> The legacy was only based on data types. But the current model introduces
> the concept of phenotype along with genotype. The genotype  (List<T>)
> represents the actual chromosome implementation and the phenotype (<P>)
> represents the problem domain. A Decoder is also introduced to convert
> genotype to phenotype. In this way this design would be capable of using a
> single genotype for a wide variety of problem domains(phenotype) including
> both direct and indirect encoding. Do you see any challenge to this current
> implementation?
> Kindly share your thoughts.

I admit that I did not fully think about it (but I thought that we had
agreed about the List<T> being a problem)...

>
> * Package "convergencecond"
> -> Should probably be renamed "convergence".
> --Done
>
> * Class "GeneticException"
> 1. Should not be public (or should be defined in an "internal" package"[1]).
> 2. If various types (that map to different JDK subclasses of
> RuntimeException,
> e.g. "IllegalArgumentException" vs "NullPointerException") are necessary,
> there could be a factory for creating the appropriate instance.
> However, for "null" checks, please use the JDK utilities[2].
> --Moved to an internal package. Null checks have been modified too.
>
> >
> > * Class "ConvergenceListenerRegistry"
> > Shouldn't it be thread-safe?
> > -- Yes. We need this to be thread-safe for parallel multi-population
> > parallel genetic algorithms.
> --No change for the time being.
>
> (E) Unit tests
> * src/test
> 1. New tests should use Junit 5.
> 2. "Example usages" probably belong in the "examples-ga" module.
> --JUnit version is inherited from commons-math module.

Not sure what you mean.
It's possible to use Junit5 in new modules/test classes even if
other classes use older Junit.

> --I could not understand what is meant by "Example usages" here. Which
> component is being referred to here.

I'm referring to a comment such as
---CUT---
        // to test a stochastic algorithm is hard, so this will rather
be an usage
        // example
---CUT---
(at line 72 in "GeneticAlgorithmTestBinary.java").

>
> (F) Code readability
> * Please write one argument per line.
> * Write one condition check per line.
> * Avoid comments with no added value (like "constructor" for a constructor).
> * Avoid "ASCII art" (see e.g. "OnePointCrossover"); a link[2] is often
> preferable.
> * Do no duplicate documentation (see e.g. "OnePointCrossover").
> --I have formatted the method declaration to have one parameter in one line.
> --Most of the if conditions are having a single condition except very few
> pre existing ones. I could not see any way to format the if statement in
> eclipse like the suggestion. I cannot introduce any formatting rule which
> cannot be handled in eclipse as that will be very hard to manage.

?
[I can't imagine that Eclipse won't let you add a newline.]

> --ASCII art and other crossover classes are untouched for this release.

What do you mean by "this release"?
In this instance, it is easy to make the docs clearer by using a link
rather than ASCII figures.  If you want to argue that the latter should
be kept, please start a new thread in order to collect other opinions.

Regards,
Gilles

>
> (G)
> Some files contain "tab" characters (e.g. "pom.xml").
> --Removed tab characters.
>
> Thanks & Regards
> --Avijit Basak
>
>>> [...]

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Re: [MATH][GENETICS] Review of PR #197

Reply via email to