Hi all,
As a digression from the recent threads on the Friendliness or otherwise of certain uncomputable, unimplementable AI systems, I thought I'd post something on some fascinating practical AI algorithms.... These are narrow AI at present, but they definitely have some AGI relevance.
Moshe Looks recently pointed out some very exciting work to me, by a guy named Eric Baum.
[I'll give links to his papers later in this long e-mail.]
Put simply, this guy seems to have actually made reinforcement learning work.
The state of the art in reinforcement learning is reasonably well described in Sutton and Barto's book "Reinforcement Learning."
http://www.amazon.com/exec/obidos/tg/detail/-/0262193981/qid=1045100393/sr=8-1/ref=sr_8_1/102-4395230-3952902?v=glance&s=books&n=507846
As a read through this book reveals, the state of the art is pretty damn lame. The most cutting-edge methods described there still don't really work very well. It's clear that, if brains are doing reinforcement learning, they must be doing something quite different from these algorithms.
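For reference, here's the flavor of the standard methods the book covers: a minimal tabular Q-learning loop on a toy corridor problem. This is my own illustrative example (the environment, constants, and variable names are all made up, not from the book), just to show what the baseline technique looks like:

```python
import random

# Toy 1-D corridor: states 0..4, actions left/right, reward 1.0 only
# for reaching state 4. Standard tabular Q-learning with an
# epsilon-greedy policy. All names and numbers here are illustrative.

N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)            # left, right
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == GOAL else 0.0)

random.seed(0)
for episode in range(200):
    s = 0
    while s != GOAL:
        # epsilon-greedy action choice
        a = random.choice(ACTIONS) if random.random() < EPS \
            else max(ACTIONS, key=lambda a: Q[(s, a)])
        s2, r = step(s, a)
        # one-step Q-learning update
        best_next = 0.0 if s2 == GOAL else max(Q[(s2, a2)] for a2 in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s2

greedy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)]
print(greedy)  # learned greedy policy: should move right everywhere
```

Simple enough on a five-state toy problem; the trouble is that this kind of tabular value propagation scales miserably to large state spaces, which is exactly the limitation the fancier methods in the book struggle with.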
Eric Baum's stuff is further away from brain science and neural net models, but has the advantage of actually giving high-quality learning performance on some fairly difficult problems.
His system is a variation on John Holland's classifier systems [see e.g. Holland's book "Adaptation in Natural and Artificial Systems",
http://www.amazon.com/exec/obidos/tg/detail/-/0262581116/qid=1045100564/sr=1-2/ref=sr_1_2/102-4395230-3952902?v=glance&s=books ]
However, he replaces Holland's assignment-of-credit methodology (the bucket brigade, which is similar to Q-learning as described in Sutton and Barto) with a more sophisticated method based on economics and auctions. He also replaces Holland's GA-style crossover with a simpler mutation operator, but this is a much smaller change. It's the introduction of auction-based assignment of credit that is the big deal.
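To make the auction idea concrete, here's a toy Python sketch of how money can flow backward along a chain of cooperating rules. This is my own heavily simplified paraphrase of the general scheme, not Baum's actual algorithm -- the class names, bids, and rewards are all illustrative:

```python
# Toy sketch of auction-style credit assignment (a drastic
# simplification of Baum's Hayek economy; all specifics are
# illustrative). The world moves through states 0..3; each agent
# matches one state and its action advances the world by one step.
# The winning bidder pays its bid to the *previous* acting agent,
# and the environment pays the final agent -- so reward propagates
# back along the chain via prices, not a fixed bucket-brigade decay.

class Agent:
    def __init__(self, state, bid):
        self.state, self.bid, self.wealth = state, bid, 0.0

agents = [Agent(s, bid=0.5 + 0.1 * s) for s in range(4)]

def episode(agents, reward=10.0):
    state, prev = 0, None
    while state < 4:
        matching = [a for a in agents if a.state == state]
        winner = max(matching, key=lambda a: a.bid)
        winner.wealth -= winner.bid      # pay to take control of the world
        if prev is not None:
            prev.wealth += winner.bid    # previous owner collects the bid
        prev, state = winner, state + 1
    prev.wealth += reward                # environment pays the last agent

episode(agents)
for a in agents:
    print(a.state, round(a.wealth, 2))
```

Each intermediate agent profits exactly when its successor bids more than it paid, so (under evolutionary pressure on bids) an agent survives only if it genuinely sets up value for what comes next -- that's the conservation-of-money property that makes this cleaner than the bucket brigade.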
This is relevant to Novamente, because the Novamente design involves a combination of three methods for schema/predicate learning:
* evolution (e.g. GAs, BOA)
* higher-order inference (using PTL probabilistic reasoning)
* reinforcement learning
Each of these has its own domains of strength. Higher-order inference is useful when there is a lot of prior knowledge regarding similar schemata or predicates. Evolution is useful for particular sorts of problems, such as recognizing complex patterns in large bodies of information. Reinforcement learning is particularly useful for learning procedures involving real-world interaction.
The current Novamente book draft proposes using a variant of the bucket brigade for reinforcement learning, and overcoming the limitations of this method by applying PTL logic to help with assignment of credit. This is a viable idea, I believe, but resource-intensive. (It has not been implemented yet.) I am going to think about how to integrate Baum's economics-based assignment of credit into Novamente's schema/predicate module for reinforcement learning. It should be possible to use PTL to improve Baum's scheme when resources are available for this (just as PTL can likely improve the bucket brigade).
Baum's papers on this stuff are on his website,
http://www.neci.nec.com/homepages/eric/eric.html
His most recent, strongest reinforcement learning system (Hayek4) is described in
"An Evolutionary Post Production System"
E. B. Baum and Igor Durdanovic
The previous version (Hayek3) is described in
"Evolution of Cooperative Problem-Solving in an Artificial Economy"
E. B. Baum and Igor Durdanovic
The following long paper is older, and describes obsolete versions of Hayek, but gives more information on Baum's underlying philosophy, and on the theoretical concepts underlying the system.
"Manifesto for an Evolutionary Economics of Intelligence"
E.B. Baum
His Hayek3 system uses GP-like function trees as a knowledge representation (instead of the simple bit-string rules of Holland's classifier systems). His Hayek4 uses Post production systems, which are better for symbolic patterns than GP-like function trees, though probably worse for quantitative patterns. Naturally, I'm curious to see how the Hayek algorithm would work on combinator trees (Novamente's representation for higher-order schemata/predicates, not yet fully implemented, but existent in prototype form thanks to Luke Kaiser). In principle the combinator representation should combine the merits of GP-like trees and Post production systems.
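For anyone unfamiliar with the formalism: a Post production system rewrites strings using pattern rules containing variables. Here's a minimal sketch of the representation itself -- the rule syntax is my own illustrative choice, not Hayek4's actual encoding:

```python
# Minimal sketch of a Post-style production rule (illustrative syntax,
# not Baum's). A rule is (pattern, replacement); '$X' is a variable
# that binds to an arbitrary substring. We keep it trivial: patterns
# are of the form prefix$X.

def apply_rule(rule, s):
    pattern, replacement = rule
    head, var = pattern.split('$')   # e.g. "ab$X" -> head "ab", var "X"
    if s.startswith(head):
        binding = s[len(head):]      # the variable binds the rest
        return replacement.replace('$' + var, binding)
    return None                      # rule doesn't match

# Example rule: move a leading 'a' to the end of the string.
rule = ("a$X", "$Xa")
print(apply_rule(rule, "abc"))   # -> "bca"
print(apply_rule(rule, "bc"))    # -> None (no match)
```

You can see why this suits symbolic patterns: the variable binding gives you substitution-style generality for free, whereas encoding the same transformation as a numeric function tree would be awkward.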
Don't get me wrong -- this stuff is exciting, but it's not a cure-all for artificial general intelligence! I believe that pure reinforcement learning, no matter how fancy, has fairly severe limitations. Baum seems to argue otherwise in his "Manifesto" paper, but then in the "Evolution of Cooperative Problem-Solving" paper he mentions the need for some kind of reasoning and more explicit knowledge representation to handle transferral of reinforcement-learned knowledge from one problem to another.
But it's really good to see that someone has finally gotten reinforcement learning to work, if not perfectly, at least decently. And, from a Novamente perspective, I'm very happy at the possibility of having something better to plug in for the "reinforcement learning of schema/predicates" module (though there are plenty of details to be thought about in that regard).
-- Ben
