Re: [MATH][GENETICS] Review of PR #197

Avijit Basak Fri, 29 Oct 2021 07:59:59 -0700

Hi All

        I have fixed most of the review comments. The changes have been
committed to PR#199.


(A)
Please "rebase" on "master".
Please "squash" intermediate commits: For a new feature, a single commit
should exist (that corresponds to the JIRA report describing it).
--Will be done once all changes are finalized and committed.

(B)
I'm confused by your defining "legacy" packages in new modules...
What kind of comparisons are you considering?
It is fine to depend (with scope "test") on CM v3.6.1 to perhaps (?)
regression testing; but please note that when your proposal is
merged, it will imply that the "legacy" codes *must* be removed.
[We don't want to keep duplicate functionality.]
-- The new implementation has improved the quality of optimization over the
legacy model. I have added the legacy packages to demonstrate the same.
Once we remove the genetics packages in the legacy module, the same will be
deleted from examples.

(C)
File
"commons-math-examples/examples-ga/src/main/resources/spotbugs/spotbugs-exclude-filter.xml"
does not belong there.
--Removed

(D) General design
Class "ConsoleLogger"
-> We should not reinvent the wheel.  We should consider whether logging
is necessary, and in the affirmative, depend on the de facto standard:
"slf4j".
-- Introduced the slf4j with simple implementation and removed the
consolelogger class.

Class "Constants"
-> Any data should be declared where its purpose is obvious.
--Put the constants in relevant classes.

* Class "RandomGenerator"
1. Duplicates functionality (storage of thread-local instances)
readily available in "Commons RNG".
2. (IMHO) Thread-local instances should not be used for "heavy" usage
(like in GA).
--Modified

* Class "ValidationUtils"
-> should not be public (or should be defined in an "internal" package).
--Changed

>
> * Class "AbstractListChromosome" (and subclasses)
> Didn't we conclude that this was a very wasteful implementation of the
> "chromosome" concept?
> -- I have some concerns regarding this. I am not much aware of any
> discussion regarding this conclusion.

>Please search the ML archive; I seem to recall a detailed discussion
>where Alex gave hints on how a binary chromosome should be
>implemented.
--There is a separate Jira for the same. I shall do the implementation.

> Chromosomes are always conceptualized as collections of allele/genes. So
We
> need a collection of the *genotypes* anyway. Here List has been used as a
> collection.
> We need an abstraction for representing the collection of Genotype. All
> crossover and mutation operators are based on this abstraction. This
> enabled reuse of crossover and mutation operators for all chromosome types
> which extend the abstraction. I am not sure how to achieve this
reusability
> without an abstraction.
> Any domain specific new chromosome implementation extending the
> AbstractListChromosome class can reuse all crossover and mutation
operators.
> For our proposed improvement of BinaryChromosome we should be able to
> extend the AbstractChromosome (*not* AbstractListChromosome) for the new
> class and provide the dedicated crossover and mutation operators for the
> corresponding Genotype. Without an *explicit* abstraction, management of
> crossover and mutation operators would be difficult.
> Please share further thoughts regarding this.

Whether we should have a data-based (subclass of "List<T>") or
behaviour-based  access to individual genes is a fundamental design
decision.
The "legacy" implementation chose the former but it might be worth
considering an alternative (lest we'll need to support several APIs if
we later come to the conclusion that we need a more compact
representation for some use cases.
--How are you thinking the design to be behaviour-based. It will be helpful
if you share some examples.
The legacy was only based on data types. But the current model introduces
the concept of phenotype along with genotype. The genotype  (List<T>)
represents the actual chromosome implementation and the phenotype (<P>)
represents the problem domain. A Decoder is also introduced to convert
genotype to phenotype. In this way this design would be capable of using a
single genotype for a wide variety of problem domains(phenotype) including
both direct and indirect encoding. Do you see any challenge to this current
implementation?
Kindly share your thoughts.

* Package "convergencecond"
-> Should probably be renamed "convergence".
--Done

* Class "GeneticException"
1. Should not be public (or should be defined in an "internal" package"[1]).
2. If various types (that map to different JDK subclasses of
RuntimeException,
e.g. "IllegalArgumentException" vs "NullPointerException") are necessary,
there could be a factory for creating the appropriate instance.
However, for "null" checks, please use the JDK utilities[2].
--Moved to an internal package. Null checks have been modified too.

>
> * Class "ConvergenceListenerRegistry"
> Shouldn't it be thread-safe?
> -- Yes. We need this to be thread-safe for parallel multi-population
> parallel genetic algorithms.
--No change for the time being.

(E) Unit tests
* src/test
1. New tests should use Junit 5.
2. "Example usages" probably belong in the "examples-ga" module.
--JUnit version is inherited from commons-math module.
--I could not understand what is meant by "Example usages" here. Which
component is being referred to here.


(F) Code readability
* Please write one argument per line.
* Write one condition check per line.
* Avoid comments with no added value (like "constructor" for a constructor).
* Avoid "ASCII art" (see e.g. "OnePointCrossover"); a link[2] is often
preferable.
* Do no duplicate documentation (see e.g. "OnePointCrossover").
--I have formatted the method declaration to have one parameter in one line.
--Most of the if conditions are having a single condition except very few
pre existing ones. I could not see any way to format the if statement in
eclipse like the suggestion. I cannot introduce any formatting rule which
cannot be handled in eclipse as that will be very hard to manage.
--ASCII art and other crossover classes are untouched for this release.

(G)
Some files contain "tab" characters (e.g. "pom.xml").
--Removed tab characters.

Thanks & Regards
--Avijit Basak

On Thu, 21 Oct 2021 at 21:32, Gilles Sadowski <gillese...@gmail.com> wrote:

> Le mer. 20 oct. 2021 à 08:47, Avijit Basak <avijit.ba...@gmail.com> a
> écrit :
> >
> > Hi
> >
> >          Thanks for the review comments. I have started making the
> changes.
> > However, I have some queries regarding some of comments as noted below:
>
> Some (partial) answers below.
>
> >
> > (B)
> > I'm confused by your defining "legacy" packages in new modules...
> > --This is kept for comparison purposes between the legacy and the new
> > implementation of GA.
>
> What kind of comparisons are you considering?
> It is fine to depend (with scope "test") on CM v3.6.1 to perhaps (?)
> regression testing; but please note that when your proposal is
> merged, it will imply that the "legacy" codes *must* be removed.
> [We don't want to keep duplicate functionality.]
>
> > (D) General design
> > Class "ConsoleLogger"
> > -> We should not reinvent the wheel.  We should consider whether logging
> > is necessary, and in the affirmative, depend on the de facto standard:
> > "slf4j".
> > -- I don't see any use of a logging framework in the math library.
>
> There is a long history of not wanting any kind of dependency.
> But this ship has sailed.
>
> > That is
> > the reason I introduced ConsoleLogger. If we introduce a logging
> framework
> > we won't need this class at all. I think we should include the logger in
> > the root(commons-math\) pom.xml file so that all modules should be able
> to
> > use this.
>
> Starting with the upcoming release, we can decide on a per-module
> basis.  Please make the case (in a new ML thread) for introducing
> such a dependency in the GA module.
>
> >
> > Class "Constants"
> > -> Any data should be declared where its purpose is obvious.
> > -- We can declare the constants where it belongs but this might introduce
> > duplicate constants across different classes and hence reduce
> reusability.
>
> The class does not mention where the data is used, nor why it is
> necessary that it be "public".
> By default, the leaner API (i.e. no unnecessary "public" components),
> the better (even if sometimes that would entail duplicating "private"
> data).
> [TBD on a case-by-case basis.]
>
> >
> > * Class "AbstractListChromosome" (and subclasses)
> > Didn't we conclude that this was a very wasteful implementation of the
> > "chromosome" concept?
> > -- I have some concerns regarding this. I am not much aware of any
> > discussion regarding this conclusion.
>
> Please search the ML archive; I seem to recall a detailed discussion
> where Alex gave hints on how a binary chromosome should be
> implemented.
>
> > Chromosomes are always conceptualized as collections of allele/genes. So
> We
> > need a collection of the *genotypes* anyway. Here List has been used as a
> > collection.
> > We need an abstraction for representing the collection of Genotype. All
> > crossover and mutation operators are based on this abstraction. This
> > enabled reuse of crossover and mutation operators for all chromosome
> types
> > which extend the abstraction. I am not sure how to achieve this
> reusability
> > without an abstraction.
> > Any domain specific new chromosome implementation extending the
> > AbstractListChromosome class can reuse all crossover and mutation
> operators.
> > For our proposed improvement of BinaryChromosome we should be able to
> > extend the AbstractChromosome (*not* AbstractListChromosome) for the new
> > class and provide the dedicated crossover and mutation operators for the
> > corresponding Genotype. Without an *explicit* abstraction, management of
> > crossover and mutation operators would be difficult.
> > Please share further thoughts regarding this.
>
> Whether we should have a data-based (subclass of "List<T>") or
> behaviour-based  access to individual genes is a fundamental design
> decision.
> The "legacy" implementation chose the former but it might be worth
> considering an alternative (lest we'll need to support several APIs if
> we later come to the conclusion that we need a more compact
> representation for some use cases.
>
> > * Class "GeneticException"
> > 1. Should not be public (or should be defined in an "internal"
> package"[1]).
> > 2. If various types (that map to different JDK subclasses of
> > RuntimeException,
> > e.g. "IllegalArgumentException" vs "NullPointerException") are necessary,
> > there could be a factory for creating the appropriate instance.
> > However, for "null" checks, please use the JDK utilities[2].
> > -- As of now we are managing all exception types by single
> GeneticException
> > class. So there is no factory.
>
> The issue is that a NPE should not be an IAE if we want to be consistent
> with the JDK's exceptions (as was decided some years ago).
> [The "legacy" CM exceptions to not follow this convention (for practical
> reasons in order to implement features that were never used), which is
> why they are "legacy"...]
>
> > -- Using JDK utilities for NullPointer would repeat this code in all
> > places. Is it fine?
> > Objects.requireNonNull(object,
> > Message.format(GeneticException.NULL_ARGUMENT, args));
>
> Calling "requireNonNull(T obj)" is self-documenting; there is no
> added value in a "null argument" message.
>
> >
> > * Class "ConvergenceListenerRegistry"
> > Shouldn't it be thread-safe?
> > -- Yes. We need this to be thread-safe for parallel multi-population
> > parallel genetic algorithms.
>
> In the current code, it is not (IIUC).
>
> Regards,
> Gilles
>
> >> [...]
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
>
>

-- 
Avijit Basak

Re: [MATH][GENETICS] Review of PR #197

Reply via email to