Re: [Math] Review of "genetic algorithm" module

Gilles Sadowski Sat, 16 Jul 2022 05:36:54 -0700

Hello.

Le dim. 3 juil. 2022 à 08:39, Avijit Basak <avijit.ba...@gmail.com> a écrit :
>
> Hi All
>
>             Please find my responses. Sorry for the delay.
>
> > Comments related to new model:
> >>
> >> 1) Hierarchy of GeneticOperator: In the proposed model the genetic
> >> operators are designed hierarchically implementing a common interface and
> >> operators are accepted as a list.
> >> This enables interchangeability of operators. Library users would be able
> >> to use crossover and mutation operators interchangeably.
> >> However, in genetic algorithms or other population based search
> algorithms
> >> the operators are used for broadly two purposes, exploration and
> >> exploitation.
> >> In GA crossover is used for exploitation and mutation is used for
> >> exploration. Keeping them in a common hierarchy and allowing
> >> interchangeable operation violates that purpose.
> >
> >I'm not sure that the semantics of "exploitation" and "exploration"
> >should drive the API design.
> >Saying it differently: We don't need to know how various operators
> >will be used in order to implement them; hence IMO, there is no
> >need to discriminate at the API level.
>
> -- Even if we want to use a common abstraction, algorithm implementation
> needs to use them in different ways.
> The new proposed implementation does not consider different operators
> separately. I have mentioned this in the summary section at the end.


Please provide a concrete example of what the proposed implementation
is unable to handle in principle.  In practice, this implementation indeed
provides a "GeneticOperator" abstraction; but I do not understand how it
limits a user's possibility to select any concrete class that provides any
form of either "crossover" or "mutation".

> >>
> >> 2) Chromosome Fitness: In the new design the chromosome fitness is
> >> maintained as a hashmap where the key is the chromosome itself.
> >> Are we going to retain the fitness value in chromosome too otherwise
> >> implementation of Comparable won't be possible?
> >
> >Sorry, I don't follow.
> >
> >> Assuming the chromosome representation would be used to calculate
> hashcode,
> >> it would be a very time consuming process depending on the length of
> >> chromosome.
> >
> >Is this assumption correct?
> >For what purpose do we need to compute a custom hash code?
>
> -- A standard practice is to provide a custom implementation of hashCode
> and equals method whenever an object is used as key to any hash based data
> structure like HashMap we are using here.
> If you think it is good to rely on the JVM specific implementation, I have
> no concerns.

You are right that this should be examined carefully, to avoid unintended
behaviour.
A good start would perhaps be to check that "Population" behaves
correctly, with a minimal API.

> >
> >> E.g. in binary chromosomes we have provision to allow chromosome length
> >> upto (2^31 - 1).
> >
> >That's a lot. ;-)
> >Do you have use cases where such long representations can be handled?
>
> -- One such use case could be tuning of neural networks. Once we introduce
> genetic programming there would be a lot more use cases.

A discussion in itself, primarily focused on ensuring that the implementation
does not put undue limitations on genotype representation IIUC.
Is it the case with my proposal?

> >
> >> Along with that the chromosome implements Comparable
> >> interface.
> >> Equality by Comparable interface would be decided by fitness
> >
> >Sure.
> >
> >> however
> >> equality by equals() method would depend on representation.
> >
> >Do we need a custom "equals"?
>
> -- Mentioned above.

Yes, but without indicating whether there is a problem in my proposal.
I don't say that there isn't, but this should be confirmed with e.g. unit
tests (or "integration" tests?).

We don't want to support any unnecessary API or behaviours.
In this particular case, do we need to keep track of whether 2 genotypes
share the exact same representation?

> >
> >> As chromosomes with different representations may also have the same
> >> fitness, this would produce a conflict.
> >
> >Please provide an example.
>
> -- Think of a problem where we need to find out the minimum value of a
> function using GA where the function is a square of sinusoid like mentioned
> below.
> y = Math.pow(sin(x), 2)
>          where -180 deg <= x <= 180 deg
> We may have the same value of y for different values of x due to the
> replicative nature of the function.
> Similar situations can arrive in real life optimization problems too.
> Even the MathFunctionOptimization example we have used belongs to this
> category of problem.

I fail to see the "conflict" being referred to above (I forgot about the context
of that remark).

> >>
> >> 3) The model does not consider anything related to adaptive approaches.
> >> Would it be a separate factory?
> >
> >What I've proposed is an alternative "skeleton" for the API.  Of course,
> >more classes will provide specific functionality.
> >An adaptive operator "is-a" specialized genetic operator (perhaps the
> >notion of "Adaptive" will be defined through an interface?).
>
> -- In "commons-math4-ga2" module separate ApplicationRate has been
> introduced to handle the adaptive nature of the algorithm.
> But this model always uses sorting as the default option which is
> unnecessary for simple genetic algorithms. Isn't it a performance
> bottleneck?

It could be, and could be checked with benchmarks.
My current impression, based on the one example that uses the new
implementation, is that a much more important time consumer is the
logger!
Moreover, isn't it the case that sorting of the population is almost always
needed, so that better fitted chromosomes can be selected with higher
probability?

> >
> >>
> >> 4) Comparison of chromosomes: In the current model two chromosomes are
> >> compared after decoding the genotype to phenotype using the internal
> >> decoder.
> >> How can we address this in the new model?
> >
> >As mentioned above: Do we really need to compare representations?
>
> -- This is a minor scenario and can be ignored as of now.

Well, as I suggested from the outset, we should not implement anything
not strictly required for a minimally useful GA functionality.
However, all "scenarios" are worth mentioning (e.g. in a JIRA report); they
help picturing a design that is as simple as possible without blocking future
functionality.  [We also explicitly discussed early on that we did not intend
to create a framework for research *about* GA.]

> [...]

> >> >> >(5)
> >> >> >Why does a "Chromosome" need an "identifier"?
> >> >> >Method "getId()" is only used in "PopulationStatisticalSummaryImpl"
> >> >> >that is an internal class, where it seems that the chromosome itself
> >> >> >(rather than its "id") could serve as the map's key.
> >> >> -- How can we generate a map's key from the chromosome itself? Please
> >> >> suggest.
> >> >
> >> >A class to be used as a key only needs to implement "equals" and
> >> >"hashCode".
> >> -- The current chromosome class implements Comparable interface which
> uses
> >> chromosome fitness for comparison. Use of both Comparable and equals()
> >> might introduce inconsistencies.
> >
> >An example?
>
> -- Inconsistency can appear in case we provide a custom implementation of
> equals and hashcode following the representation of chromosome or use the
> default implementation.
> Since Comparable uses the fitness value to compare and as described above
> two chromosomes with separate representations can have the same fitness
> value this might result in inconsistency.
> But in the new module chromosome does not implement Comparable, so there is
> no possibility of the same.

IIUC, this is another example of why we should avoid any functionality
(such as "Comparable" in this case) not strictly necessary within the
intended scope.

> [...]
> >>
> >> We still need to agree on my proposal...
> >> Would you try to follow up on that basis, i.e. add a minimal set of GA
> >> operators, in order to show that some of the "examples" can be run?
> >
> >I've added the "MathFunctionOptimizer2" in the "Standalone" example
> >application.  Please check that it produces the expected output.
> >[I had to slightly change the function being optimized, and your decoding
> >function. Why are you assuming that "Coordinates" cannot be negative?]
>
> -- The fitness function uses the square of each dimensional value of the
> coordinate. So negative signs won't impact the fitness value.

No, but the example function could be just slightly more general (i.e. allow
for an optimum with a negative coordinate) and in that case, the encoding
prevents finding it...

>>>> [...]

> >
> >Hoping we can make significant progress so as to include the new
> >GA functionality in the upcoming release of "Commons Math"...
>
>
> I would like to summarize a few points which I think could be a problem in
> the new model.
>
> 1) Separation of crossover and mutation operator at the API level and the
> corresponding usage of the GeneticOperator in the algorithm implementation:
> -- As mentioned earlier we would need a separation of GeneticOperators.
> The current implementation of the algorithm does not distinguish the
> crossover and mutation operator and treats them uniformly.

There could be public classes that embody the intrinsic difference between
a crossover and a mutation; the current implementation shows they are
not necessary.
What would they bring from a developer's POV?
>From a user's POV, is there any loss in not having those public classes?

> This violates the basic concept of the algorithm itself and needs to be
> fixed.

The "Operators" (in package "gene/binary") utility class would define
factory methods for all the available operators (whose name univocally
relates to the kind of operation that it performs).

>
> 2) Use of adaptive probability:
> -- Currently the Mutation class keeps a constant probability value as well
> as a separate ApplicationRate was passed for each genetic operator.
> Since we are going to use the operator probability in an adaptive way we
> would need to pass the generated probability as an argument to the apply
> method of GeneticOperator.

Maybe I missed all the implications of the "adaptive" terminology; I've
implemented what I grokked from the "function" example.
You'd need to explicitly (re)state
 * how the "rate" is computed,
 * how it is used,
 * where the design fails to accommodate the functionality.

>From your remark above, I seem to understand that the *same* adaptive
rate value (for a given generation) would determine two things:
 (1) the probability of whether the operator will be applied
 (2) how some specific operator will behave *internally*
I intentionally did not implement (2) since that would seem to imply that the
operator could "internally" decide to *not* do what it is meant to do (i.e. a
mutate any gene with some fixed probability).  In particular, I understood
"adaptivity" as changing the probability of how often the operator is applied,
not changing the effect of the operator.
Since the user can add as many operator instances as he wishes

>
> 3) Separation of logic for adaptive and Simple GA:
> -- We might want to separate two different strategies of GA i.e. simple GA
> and adaptive GA.
> As for simple GA with constant probability we can avoid computation like
> rank calculation.

Isn't rank also (and primarily) used to select (elitistic) parents?
[Performance of the current code can indeed be slightly improved for the
case where "elitism" is strictly zero.]

Otherwise, ranking is performed only when the "Population" contents is
provided to the user (using the "GenerationCallback"), in which case it is
probably expected that the list of chromosomes will need to be sorted
sooner or later (e.g. to display it in the most obviously sensible way).

For example, ranking is *not* performed for the "Tournament" selection.

>
> 4) OffspringGenerator following Strategy pattern:
> -- Can we have an implementation of OffspringGenerator following strategy
> pattern. We should have default implementation following your first
> proposal.
> But there can be an optional parameter in the API method, where library
> users would be able to provide their custom implementation.

Where do you see a reference to "OffspringGenerator"?

>
> If you need a separate mail chain for each of the review comments mentioned
> above kindly let me know.

Yes, better discuss each topic in its own thread.
Also, as noted above, some can be moved to JIRA.

Thanks,
Gilles

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Re: [Math] Review of "genetic algorithm" module

Reply via email to