Re: [Math] Review of "genetic algorithm" module

Avijit Basak Fri, 18 Feb 2022 02:32:31 -0800

Hi All

        Please see my comments below.


>>         Please find my comments.
>>
>> >> (2)
>> >> >The "GeneticException" class seems to mostly deal with "illegal"
>> >> >arguments; hence it should be a subclass of the JDK's standard
>> >> >"IllegalArgumentException" (and be renamed accordingly).
>[...]
>> >
>> -- I shall make the changes. What could be the name for the Exception.
>> GeneticIllegalArgumentException would be fine?
>
>I guess so.
>
>The fool-proof, preferred approach, is to have package-private
>exception classes.
>
>Another suggestion would be to create a container class (within
>a package duly marked as "internal, not part of the public API"):
>---CUT---
>public GeneticException {
>
>    public static class IllegalArgument extends IllegalArgumentException {
>        // ...
>    }
>
>    // etc.
>}
>---CUT---
>
>The important thing is that the "@throws" Javadoc tags only
>document the base classes (i.e. the JDK's standard exception).
>

--I have modified the exception class name to
GeneticIllegalArgumentException.


>> >>
>> >> >[Exception messages need review for spelling and formatting.]
>> >> -- It will be really helpful if you can point out some specific
examples.
>> >
>> >We can fix this when the PR has reached some stability.
>> >
>> >>
>> >> (3)
>> >> >IMO Javadoc should avoid redundant phrases like "This class" as
>> >> >the first words of a class description.
>> >> --Refractored the javadoc comments. Please review and mention if you
need
>> >> any further changes.
>> >
>> >I've not looked yet, but thanks for taking it into account.
>> >Similarly to the previous point, these clean-ups can happen later.
>> >
>> >> >A similar remark holds for fields in "GeneticException" class: Since
>> >> >the name of the field is self-documenting, duplication in the Javadoc
>> >> >is visual noise ("Message template" is concise and clear enough).
>> >> --Removal of the javadoc comments produces a checkstyle error.
>> >
>> >I did not suggest to remove any Javadoc, only to rephrase it as:
>> >---CUT---
>> >/** "Message template". */
>> >---CUT---
>> >
>> -- Just for confirmation. Do you need the comment to be changed from
>> ---CUT---
>> /** Error message for "out of range" condition. */
>> ---CUT---
>> *to*
>> ---CUT---
>> /** "out of range" */
>> ---CUT---
>
>No, sorry, I just meant that e.g.
>---CUT---
>/** Message template. */
>private static final String OUT_OF_RANGE = "Out of range: min={0} max={1}";
>---CUT---
>is sufficient. Since the field is private, the comment will mostly ever
>be read while looking at the source code.

--I think we have a misunderstanding here. The field is public, not
private. Will a hard coded "Message Template" comment be sufficient?


>>       JDK(*ProviderBuilder.RandomSourceInternal.JDK*),
>> ...
>> }
>> ---CUT---
>> I am not sure how to achieve this for our algorithm classes.
>
>I'm a bit lost here; I don't get the relationship of this with my
>remark above (constant vs adaptive)...  Maybe you could file
>JIRA report to clarify and further discussion?

--Let me try to clarify my original understanding. I think the proposal is
to introduce an Enum and declare each type of algorithm classes as
instances of enum. So it might follow this template:
--CUT--
enum GeneticAlgorithms {
    SGA([args...]),//Simple genetic algorithm equivalent to
GeneticAlgorithm class
AGA([args...]) //Adaptive genetic algorithm.
}
--CUT--
Where [args...] are selection, crossover, mutation operators and
probalities.

As the AdaptiveGeneticAlgorithm is a generalization of GeneticAlgorithm it
would be possible to represent the later by former using constant crossover
and mutation rate generators.
But the problem lies in a different area.
In enumeration the required arguments ([args...]) need to be specified
during the declaration using constant values.
But for our implementation those arguments need to be provided by the
client programs. I don't know how to achieve that using enum.
Let me know if you have any different views regarding this.

>
>> >
>> >>
>> >> (7)
>> >> >The currently available GA implementations are sequential.
>> >> >IIUC, the "nextGeneration" methods should provide an option
>> >> >that processes the population using multiple threads.
>> >> --This needs to be done. However,  I would like to address this along
>> with
>> >> parallel GA i.e. convergence of multiple populations together.
>> >
>> >The two features (multi-thread vs multiple populations) should
>> >be implemented independently:  Users that only need the "basic"
>> >GA should also be able to take advantage of their machine's
>> >multiple CPUs.
>> >[This is related to the design issue which I mentioned previously.]
>> >
>> -- I am thinking to leverage user's multiple CPUs for doing
>> multi-population GA.
>
>OK (sort-of, since "the devil is in the details", and I'm not sure
>that we mean the same thing by "multi", see below).
>
>> It would a global approach where same thread pool
>> would be used for both purposes. Another class would be introduced for
>> executing parallel genetic algorithm which would accept multiple
instances
>> of AbstractGeneticAlgorithm class and converge them in parallel. Users
who
>> does not care for robustness would go for current implementations of the
>> algorithm with single population. For a better optimization quality users
>> would chose the new class.
>
>As hinted by my comment is the previous message, I've still to
>clarify my own expectations; but I vaguely sense some lost
>opportunity for simpler usage simpler and increased performance
>through the caller just needs to specify the number of "worker
>threads".
-- We should have both options. Users can execute the algorithm by
specifying only the number of worker threads with a single population as
well as optimize multiple populations in a parallel fashion.
For the multi-population option the common thread pool would be reused for
all populations.
>
>Do we at least agree that
>1. Adding/retrieving a "Chromosome" to/from a "Population"
>must be thread-safe (and is not trivial)
>2. Fitness computation is where most time is usually spent
>(so that multi-threading must be achieved at that granularity)
>?
--The way I am thinking of designing a task is that it should accept the
current population and return an instance of ChromosomePair.
The chromosomes within this pair would be added to the population by the
caller thread. Population won't be updated by multiple threads.
The code snippet below shows the body of the method which will be executed
inside the task.
--CUT--

//selection
ChromosomePair pair = getSelectionPolicy().select(current);

// crossover
if (randGen.nextDouble() < getCrossoverRate()) {
// apply crossover policy to create two offspring
pair = getCrossoverPolicy().crossover(pair.getFirst(), pair.getSecond());
}

// mutation
if (randGen.nextDouble() < getMutationRate()) {
// apply mutation policy to the chromosomes
pair = new ChromosomePair(
getMutationPolicy().mutate(pair.getFirst()),
getMutationPolicy().mutate(pair.getSecond()));
}

return pair;

--CUT--

>
>I'd surmise that "multiple instances of AbstractGeneticAlgorithm"
>is an application concern; unless I'm missing something, it's
>not what I've in mind when talking about multi-threading.
>Actually, I was wondering whether we could implement the
>analog of what is in the "commons-math-neuralnet" module,
>where
>* "Neuron" is the counterpart "Chromosome"
>* "Network" is the counterpart of "Population".
--"multiple instance of AbstractGeneticAlgorithm" is related to parallel GA
with multiple populations not multi-threading.
Users can also implement parallel GA in a synchronous manner although that
won't be a recommended way.
Multi-threading is only a way to improve performance using a user's multi
core CPU.
The threads in the thread pool would only be used to execute the task as
mentioned in the previous comment.

I think we have some misunderstanding over here. It is better to do an
implementation first and start the discussion.
It would be more productive.


Thanks & Regards
--Avijit Basak

On Thu, 17 Feb 2022 at 01:09, Gilles Sadowski <[email protected]> wrote:

> Hello.
>
> Le mer. 16 févr. 2022 à 17:31, Avijit Basak <[email protected]> a
> écrit :
> >
> > Hi All
> >
> >         Please find my comments.
> >
> > >> (2)
> > >> >The "GeneticException" class seems to mostly deal with "illegal"
> > >> >arguments; hence it should be a subclass of the JDK's standard
> > >> >"IllegalArgumentException" (and be renamed accordingly).
> > >> >If other condition types are needed, then another internal class
> > >> >should be defined with the corresponding standard semantics.
> > >> --IMHO if we think of a single exception class we should extend it
> only
> > >> from RuntimeException.
> >
> > >"single exception class" is not a requirement (it cannot be since
> > >we agreed some time ago that it was better to align with the JDK's
> > >delineation of error conditions (IAE, NPE, ILSE, AE, ...).
> >
> > >> If we think of multiple exception classes in one
> > >> module we may need to think of a base exception class. Other classes
> > would
> > >> extend the same.
> > >
> > >Please no.  We'd taken that approach in "Commons Math" (cf.
> > >base class now in module "commons-math-legacy-exception"),
> > >as I've mentioned already IIRC: It was a failed experiment IMO.
> > >[For more details, please refer to the archive of the "dev" ML.]
> > >
> > >> The approach mentioned above would mix up these two.
> > >> Please share your opinion regarding this.
> > >
> > >Eventually, all new components ([RNG], [Number], [Geometry], ...)
> > >adopted the simple approach of non-public API (ideally private
> > >or package-private) exception classes only for the developer's
> > >use (and the purpose of which is limited to avoiding duplication).
> > >
> > -- I shall make the changes. What could be the name for the Exception.
> > GeneticIllegalArgumentException would be fine?
>
> I guess so.
>
> The fool-proof, preferred approach, is to have package-private
> exception classes.
>
> Another suggestion would be to create a container class (within
> a package duly marked as "internal, not part of the public API"):
> ---CUT---
> public GeneticException {
>
>     public static class IllegalArgument extends IllegalArgumentException {
>         // ...
>     }
>
>     // etc.
> }
> ---CUT---
>
> The important thing is that the "@throws" Javadoc tags only
> document the base classes (i.e. the JDK's standard exception).
>
> > >>
> > >> >[Exception messages need review for spelling and formatting.]
> > >> -- It will be really helpful if you can point out some specific
> examples.
> > >
> > >We can fix this when the PR has reached some stability.
> > >
> > >>
> > >> (3)
> > >> >IMO Javadoc should avoid redundant phrases like "This class" as
> > >> >the first words of a class description.
> > >> --Refractored the javadoc comments. Please review and mention if you
> need
> > >> any further changes.
> > >
> > >I've not looked yet, but thanks for taking it into account.
> > >Similarly to the previous point, these clean-ups can happen later.
> > >
> > >> >A similar remark holds for fields in "GeneticException" class: Since
> > >> >the name of the field is self-documenting, duplication in the Javadoc
> > >> >is visual noise ("Message template" is concise and clear enough).
> > >> --Removal of the javadoc comments produces a checkstyle error.
> > >
> > >I did not suggest to remove any Javadoc, only to rephrase it as:
> > >---CUT---
> > >/** "Message template". */
> > >---CUT---
> > >
> > -- Just for confirmation. Do you need the comment to be changed from
> > ---CUT---
> > /** Error message for "out of range" condition. */
> > ---CUT---
> > *to*
> > ---CUT---
> > /** "out of range" */
> > ---CUT---
>
> No, sorry, I just meant that e.g.
> ---CUT---
> /** Message template. */
> private static final String OUT_OF_RANGE = "Out of range: min={0} max={1}";
> ---CUT---
> is sufficient.  Since the field is private, the comment will mostly ever
> be read while looking at the source code.
>
> > >> (6)
> > >> >Why support inheritance for "AbstractGeneticAlgorithm"?
> > >> >Why would users need their own subclass, rather than call those
> > >> >implemented within the library (currently, "GeneticAlgorithm" and
> > >> >"AdaptiveGeneticAlgorithm")?
> > >> >Couldn't we encapsulate the choice of algorithm in an "enum",
> > >> >similar to "RandomSource" in [RNG].
> > >> >Do I understand correctly that the (only?) difference between the
> > >> >two classes is the ability to adapt crossover and mutation rates?
> > >> -- The difference between GeneticAlgorithm and
> AdaptiveGeneticAlgorithm
> > is
> > >> the ability to adapt crossover and mutation probability. However,  as
> per
> > >> my understanding enum encapsulation is appropriate with the same set
> and
> > >> type of constructor arguments, where the arguments can be provided
> during
> > >> enum declaration. In our case the arguments would be provided by the
> > client
> > >> program and cannot be pre-initialized as part of an enum declaration.
> > >
> > >Why not?
> > >A constant rate seems like a (trivial) type of adaptive rate.
> >
> > -- Our algorithm classes accept various user provided arguments like
> > crossover, mutation and selection operators. Enum declaration requires
> all
> > of the arguments to be provided at the declaration time in the class
> itself
> > like
> > ---CUT---
> > public enum RandomSource {
> >       JDK(*ProviderBuilder.RandomSourceInternal.JDK*),
> > ...
> > }
> > ---CUT---
> > I am not sure how to achieve this for our algorithm classes.
>
> I'm a bit lost here; I don't get the relationship of this with my
> remark above (constant vs adaptive)...  Maybe you could file
> JIRA report to clarify and further discussion?
>
> > >
> > >>
> > >> (7)
> > >> >The currently available GA implementations are sequential.
> > >> >IIUC, the "nextGeneration" methods should provide an option
> > >> >that processes the population using multiple threads.
> > >> --This needs to be done. However,  I would like to address this along
> > with
> > >> parallel GA i.e. convergence of multiple populations together.
> > >
> > >The two features (multi-thread vs multiple populations) should
> > >be implemented independently:  Users that only need the "basic"
> > >GA should also be able to take advantage of their machine's
> > >multiple CPUs.
> > >[This is related to the design issue which I mentioned previously.]
> > >
> > -- I am thinking to leverage user's multiple CPUs for doing
> > multi-population GA.
>
> OK (sort-of, since "the devil is in the details", and I'm not sure
> that we mean the same thing by "multi", see below).
>
> > It would a global approach where same thread pool
> > would be used for both purposes. Another class would be introduced for
> > executing parallel genetic algorithm which would accept multiple
> instances
> > of AbstractGeneticAlgorithm class and converge them in parallel. Users
> who
> > does not care for robustness would go for current implementations of the
> > algorithm with single population. For a better optimization quality users
> > would chose the new class.
>
> As hinted by my comment is the previous message, I've still to
> clarify my own expectations; but I vaguely sense some lost
> opportunity for simpler usage simpler and increased performance
> through the caller just needs to specify the number of "worker
> threads".
>
> Do we at least agree that
> 1. Adding/retrieving a "Chromosome" to/from a "Population"
> must be thread-safe (and is not trivial)
> 2. Fitness computation is where most time is usually spent
> (so that multi-threading must be achieved at that granularity)
> ?
>
> I'd surmise that "multiple instances of AbstractGeneticAlgorithm"
> is an application concern; unless I'm missing something, it's
> not what I've in mind when talking about multi-threading.
> Actually, I was wondering whether we could implement the
> analog of what is in the "commons-math-neuralnet" module,
> where
> * "Neuron" is the counterpart "Chromosome"
> * "Network" is the counterpart of "Population".
>
> Regards,
> Gilles
>
> >>> [...]
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>

Re: [Math] Review of "genetic algorithm" module

Reply via email to