[EMAIL PROTECTED] wrote:
>
> Mark Stier wrote:
> >
> > -----BEGIN PGP SIGNED MESSAGE-----
> >
> > Hello,
> >
> > have you ever thought of setting up a distributed Genetic
> > Programming network which lets you optimize or even find algorithms
> > suitable for music/speech compression 'automatically'?
> >
> > Maybe one could find algorithms that are even better than Fraunhofer's.
> >
> > Regards,
> >
> > Mark
> ???????????????????
> please elaborate
OK, I will try to explain it, but I suggest searching the web for more
info, looking for keywords: genetic programming, evolutionary algorithms,
machine learning, evolutionary programming, genetic algorithms,
evolutionary strategies, evolutionary computing.
Genetic programming is a part of evolutionary computing ( or evolutionary
programming; different people use different names for the whole field ).
Evolutionary programs try to find solutions by imitating natural
evolution.
Basically the algorithm is this:
1. generate the initial population and mark it as the current population
2. evaluate each object in the current population
3. based on the evaluation, generate the next generation of objects
   and mark them as the current population ( the old population is
   discarded )
4. repeat steps 2 and 3 until satisfied with the results
5. now the current population should be filled with more or less good
   solutions to the problem. Evaluate them and pick the one with the
   highest result; this is the result of the whole computation (*)

* - usually the best object of each generation is remembered separately
    and then considered in the final choice ( step 5 ), because it can
    sometimes be better than the best object in the last generation
The population is a set of objects.
Objects are possible solutions ( for example, when the problem you are
solving is "find a solution to a set of 2 equations with 2 unknowns",
the objects are pairs of real numbers ).
*** about step 1
To make the first generation you usually generate the objects at random;
for our case:

for i=1 to N do
  object[i].x = rand();
  object[i].y = rand();
end;

unless you have some idea of how the solutions should look. Then you can
insert some objects you think are close to the solution, but this is not
really important, because the algorithm quickly discards bad "solutions"
and keeps ( and generates! ) good ones.
I used N as the number of objects; this is also called the population
size. It should be big - the bigger the better ( and the slower the
program :-( ). For our example a size of several thousand objects is good.
*** about step 2
Evaluation: generate some "goodness" value for each object.
In our case we can transform the equations into a two-parameter function
which gives 0 for the correct solution; in other words, we convert our
problem to "find x and y for which f(x,y)=0". Then for each candidate
x and y compute the goodness as 1/(abs(f(x,y))+1). This gives the
value 1.0 for a correct solution and less than 1.0 for all others.
*** about step 3
Producing the next generation population goes like this
( the population size usually stays constant ):

for i=1 to N step 2 do ( in C: for(i=1;i<=N;i+=2) )
  parent1 := select_parent_from(current_population);
  parent2 := select_parent_from(current_population);
  (child1,child2) := crossover(parent1,parent2);
  mutate(child1);
  mutate(child2);
  insert_into_next_population(child1);
  insert_into_next_population(child2);
end;
current_population := new_population;
Used functions:

select_parent_from():
  randomly select an object from the population. The chance of being
  selected is proportional to the object's goodness value: the higher
  it is, the more probable it is to be selected.
crossover(a,b):
  create two new objects by combining the properties of the parents.
  This and the evaluation function are the hardest to get right for
  any new project. First you must choose a representation of the
  objects which makes the crossover operation easy. A common solution
  is to represent the object's properties by a string of so-called
  chromosomes; the entire string is called the gene ( sounds
  familiar? ).
  Then the crossover is easy: just choose a random spot in the string
  and break up the string at that position ( for both parents ), then
  exchange the rear parts of both parents and there you have the two
  children. Example:

  gene of parent 1 : AAAAAAAAAAAAAA   ( each A is a different chromosome )
  gene of parent 2 : BBBBBBBBBBBBBB
                             ^-- crossover point
  gene of child 1  : AAAAAAAAABBBBB
  gene of child 2  : BBBBBBBBBAAAAA
In our example of finding a solution to the equations, the gene can be
the binary representation of x and y tacked together; in C notation:

/* ---- start ---*/
union {
    float values[2];
    char  gene[2*sizeof(float)];
} converter;

converter.values[0] = object[u].x;
converter.values[1] = object[u].y;

char the_gene[2*sizeof(float)];
memcpy( the_gene, converter.gene, 2*sizeof(float) );
/* a brand new gene is stored in the array the_gene !!
   ( memcpy, not strncpy: the bytes of a float can contain zeros,
   which strncpy would treat as the end of a string ) */
/*---end---*/

Then you treat each bit as a chromosome.
mutate(gene):
  tries to mutate the gene's chromosomes; pseudocode follows:

  for each chromosome in gene do
    if rand() <= MUTATION_PROBABILITY then
      mutate_chromosome

  MUTATION_PROBABILITY is set to some low value, so that mutation
  occurs rarely. mutate_chromosome() then changes the chromosome
  somehow. If each chromosome is a single bit, as in our example,
  then this is a simple negation.
After a few cycles ( generations ) of the program, the current population
should be filled with good approximations of the correct ( optimal )
solutions (*). 20 generations are more than enough for our equation
example.
Different versions of algorithms
--------------------------------
Evolutionary algorithms can be classified by various criteria:
- population size ( 1 or more )
- do they use mutation
- do they use crossover
- representation of the objects ( genes )
- etc ...

The example with the equations is an example of using a genetic
algorithm.
Very interesting is genetic programming, where the objects ( genes ) are
not simple data structures, but algorithms themselves!
They are usually represented in some simplified programming language.
A popular choice is LISP. Both the evolutionary engine and the genes
are written in LISP ( the genes usually in some subset ). LISP is great
because it can handle LISP code as data and is also able to execute it
( which is needed for evaluating the genes ). A LISP program also
has a natural tree representation, which makes the crossover operation
very easy ( just replace a subtree with another subtree ).
* - An important thing about these algorithms is that they DON'T give
    the correct ( optimal ) solutions for a problem, but only a good
    approximation. They are also not efficient for problems which have
    good classical solutions. But they are great for NP-hard problems,
    like the Travelling Salesman Problem (TSP).
    The TSP is the problem of finding a route between N points that
    goes through every point and returns to the starting point. Some
    points have no connections between them; for all others there is a
    known distance. The goal is to find the shortest path which visits
    every point.
    This problem is very heavy ( brute force is O(N!) ), but an
    evolutionary algorithm can very quickly give a solution path which
    is within 1% of the length of the optimal one. It can do this in
    5 seconds, while searching for the solution with a "normal"
    strategy could take a whole day ( or a year if you add a few
    points ).
This idea would be hard to use for finding a good audio compression
algorithm, because of the problem of evaluation: you either need a good
model of human hearing ( doesn't exist or is not available ) or real
human listeners ( impractical ). But you could try to find a lossless
compression!
Or a lossy compression with minimum loss :-)
That is, allow on average 1 or 2 LSB of error; maybe it gives something
useful, who knows.
Thank you for your attention,
David Balazic
P.S.: I had a class on "Evolutionary computing" two years ago. I might
have forgotten some things or confused some terms. Also note that
different authors use different names for the same or similar things.
I think the evaluation function is usually called "the fitness function"
( remember: survival of the fittest ) and its result the "fitness value".
Maybe what I call "gene" in this mail is really "chromosome" and my
"chromosome" is ... well, something else
--
MP3 ENCODER mailing list ( http://geek.rcc.se/mp3encoder/ )