Some people might find this mini essay interesting, as it touches on
what I think of as the problem of general intelligence, if such a
thing can be well defined.
My interest in machine learning centres on problems that are not easy
to define, or that may easily be misdefined.
A reinforcement learning scenario is defined on Wikipedia as:
"Formally, the basic reinforcement learning model consists of:
1. a set of environment states S;
2. a set of actions A; and
3. a set of scalar "rewards" in the Reals.
"
An interesting scenario that falls outside the normal formalisms is a
robot with a reinforcement learning system that is constrained in what
it can do by its energy supply. Any action that uses some energy may
or may not reduce the chance of getting reinforcement. Every
computation in a non-reversible computer uses some energy, so it
should be counted in the set of actions. Obviously this set of
actions is impossible to monitor fully and precisely, because the act
of monitoring causes more actions, which in turn need to be monitored,
ad infinitum. So some subset of the actions needs to be monitored to
make sure the system is not wasting energy. That subset may easily be
misdefined, or may not be known a priori, so a way is needed of
redefining the set of actions to be monitored. This gets interesting
when the learning algorithms themselves are considered as acting on
the world by means other than purely as hypothesis changers. It
ultimately means that the learning algorithm itself might need to be
changed, to minimise the negative effect of its actions on the
likelihood of reward.
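A rough sketch of this energy constraint, with all costs and names
being illustrative assumptions of mine: every action draws on a finite
budget, the computation behind the action is itself metered, and the
infinite regress of monitoring-the-monitoring is cut off by charging
just one coarse compute cost per action.

```python
class EnergyBudget:
    """A finite store of energy that every action draws down."""
    def __init__(self, joules):
        self.joules = joules

    def spend(self, cost):
        self.joules -= cost
        return self.joules > 0.0

def act(budget, external_reward, action_cost, compute_cost):
    """An action yields its reward only if enough energy remains."""
    if not budget.spend(compute_cost):   # thinking about acting costs energy
        return 0.0
    if not budget.spend(action_cost):    # so does acting
        return 0.0
    return external_reward

budget = EnergyBudget(1.0)
r1 = act(budget, external_reward=1.0, action_cost=0.5, compute_cost=0.125)
r2 = act(budget, external_reward=1.0, action_cost=0.5, compute_cost=0.125)
# the first action earns its reward; the second fails, the budget spent
```

The point of the sketch is that the reward a learner can collect now
depends on what its own computations cost, which the standard
formalism above has no slot for.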
In this light I would like to propose another possible direction for
machine learning research. Rather than focusing on the capabilities
of learning algorithms, it concentrates on how we can make systems
that can acquire, through various means, the correct algorithm for
the situation they are facing. That is, the system learns not so much
about the outside world as about what it should be and what it should
be doing. This direction of investigation I shall call learning
architectures; by this I mean a computer architecture whose behaviour
is on the whole determined by the programs within it, but which has a
distinct goal-oriented nature. A 'PC with a goal' would be a simple
description.
A diagram showing the layout of the internals of a tentative learning
architecture design is below. The changeable internal modules are for
example only, and are the programs I describe below.
http://codesoup.sourceforge.net/new.png
Those of you familiar with reinforcement learning will have noticed
the reinforcement function: this acts purely as a critic. It is the
only part with an activity similar to a learning algorithm;
everything else about how problems are solved is determined by the
programs. I shall attempt to explain how such a minimalistic learning
architecture might work, though there is still a fair amount of
theory to be worked out.
The critic provides a set of rewards, sending positive and negative
signals back to the rest of the programs in the architecture.
Reinforcement is what a program needs in order to overwrite another
program, or to prevent another program from overwriting it. It acts
as a finite resource for the internal programs: it can be given to
other programs and is used up when overwriting another program.
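A toy sketch of these rules, with the numbers, names, and the exact
overwrite condition all being illustrative assumptions rather than a
worked-out design:

```python
class Program:
    """An internal program with a balance of accumulated reinforcement."""
    def __init__(self, name, balance=0.0):
        self.name = name
        self.balance = balance

    def give(self, other, amount):
        """Pass some reinforcement on to a helper program."""
        amount = min(amount, self.balance)
        self.balance -= amount
        other.balance += amount

def try_overwrite(writer, target, cost):
    """Overwriting spends reinforcement, and fails against a richer target."""
    if writer.balance - cost > target.balance:
        writer.balance -= cost
        return True   # the target's space is freed for the writer's use
    return False

solver = Program("solver", balance=5.0)
stale = Program("stale", balance=1.0)
try_overwrite(solver, stale, cost=2.0)   # succeeds: 5 - 2 > 1
```

Note that overwriting is deliberately lossy: the spent reinforcement
leaves the winner poorer, so constant rewriting is not free.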
This sets up the potential for a very loose evolutionary system. If
the programs compete for space then those that have the most positive
reinforcement survive. This allows meta-learning if some of the
programs are learning algorithms, or meta-meta-learning if they are
meta-learning algorithms.
As you can see from the links in the diagram, only those programs
that send outputs directly to the world get reinforcement; they then
pass it back to the other programs that have provided information to
them or have created them. It is up to them to pass back the correct
amount of reinforcement so that the programs that help them can
survive as well. An economy needs to be set up, with the critic as
the invisible hand that guides it.
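One very simple way such an economy could pass credit upstream, purely
as a sketch (the chain structure and the fixed split fraction are my
assumptions, not part of the design):

```python
def distribute(reward, chain, share=0.5):
    """Split a reward along a chain of programs, output program first.

    Each program keeps a share of what reaches it and passes the rest
    upstream; the last program also keeps the remainder, so the whole
    reward is always accounted for. Returns one credit per program."""
    credits = []
    remaining = reward
    for _ in chain:
        kept = remaining * share
        credits.append(kept)
        remaining -= kept
    if credits:
        credits[-1] += remaining   # nothing leaks out of the economy
    return credits

credits = distribute(1.0, ["output", "planner", "feature-detector"])
```

How much each program should actually keep is exactly the sort of
behaviour I would expect the evolutionary competition to settle,
rather than a constant fixed by the designer.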
The type of learning that this evolutionary competition creates will
likely be very slow. It is not meant for day to day learning, which
would be supplied by the internal programs much like normal learning
algorithms. So although the architecture I propose is similar in some
ways to Avida and other artificial life programs, I suggest they
should be seeded with an ecology of complex learning programs that
form an economy
instead of being started from scratch. These programs should also be
sufficient for rudimentary performance of whatever task is required of
it.
If you can show that the language the internal programs are written in
is Turing complete, then you are not sacrificing any potential power
in terms of writing your everyday learning algorithms. But neither do
you gain anything.
So if it is not supposed to create a better learner in the everyday
sense, you may well be asking what merits studying this sort of
learner. It is in theory no more adaptable: you could emulate any of
the changes in learning due to the evolutionary process with a
traditional meta-learning algorithm. However it does allow the memory
to be reused, and the foresight of the designer about what type of
problems the system will face is not so crucial. So it can be seen to
be more mutable.
The ability to modify any part of the learning algorithms opens up the
possibility of social sharing of learned programs. For example, one
learning architecture agent could do the expensive meta-learning for a
problem and then share the good algorithm discovered with other
systems doing similar learning tasks. They could then use the
evolutionary mechanisms above to see whether the new externally given
program is actually better for their own distinct goal.
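That adoption test could be as simple as the sketch below, where
`evaluate` stands in for running a program inside the architecture and
measuring the reinforcement it earns; the toy goal and all the names
here are assumptions for illustration only.

```python
def adopt_if_better(incumbent, candidate, evaluate, trials=5):
    """Keep whichever program earns more reinforcement on our own task."""
    incumbent_score = sum(evaluate(incumbent) for _ in range(trials))
    candidate_score = sum(evaluate(candidate) for _ in range(trials))
    return candidate if candidate_score > incumbent_score else incumbent

# Toy goal: reward programs whose output lands close to this system's target.
target = 10
evaluate = lambda program: 1.0 / (1.0 + abs(program() - target))

local = lambda: 7    # the program this agent evolved itself
shared = lambda: 9   # a program shared by another learning architecture
best = adopt_if_better(local, shared, evaluate)
```

The key point is that the receiving system judges the shared program
against its own distinct goal, not the goal of the agent that did the
expensive meta-learning.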
This ability to take in new programs, which it shares with normal PCs,
also opens it up to getting viruses. However the lack of a root user
means that privilege escalation is a lot harder. Instead of having to
get control of one program to have free rein in the system, a virus
would have to get control of a program and then make it perform
better than it had in the past in order to increase its
potential for altering the rest of the system. This assumes that the
critic is fixed in a ROM or something similar and is very resistant to
alteration.
I justify calling this another strand of machine learning because it
needs a different methodology of research. Instead of showing how
well an algorithm solved a problem, you would have to show that,
within the architecture, the behaviour that enables a program to
preferentially survive is the behaviour that gets the most
reinforcement. This is non-trivial, because of the potential for
competing programs that try to sabotage the smooth running of the
other programs in a variety of creative ways.
I leave you with a quote and a question.
"any algorithm performs only as well as the knowledge concerning the
cost function put into the cost algorithm". Wolpert & Macready 1995
What possible ways are there for putting knowledge into algorithms
that do not require us to program it directly?
Will Pearson