Russell,

The idea of a context in which we write functions and automatically get
memoization, maximization tools, and approximation appeals to me.

My recent thinking in this area is as follows:

It has been argued (for unrelated reasons) that we can distinguish two
basic ways of running a system. The first is direct simulation, like a
straightforward implementation of Conway's Game of Life. The second is to
abbreviate the execution as much as possible, as is done in the hashlife
algorithm. By memoizing as many steps as possible, we get an accelerating
speedup as we run the system.
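
To make the contrast concrete, here is a toy sketch (my own illustration, far short of actual hashlife): memoizing the block-update function of a 1D cellular automaton, so that any pattern of cells only ever has its successor computed once. As the run continues, more and more steps become cache hits.

```python
from functools import lru_cache

RULE = 110  # Wolfram rule number for the 1D automaton (an arbitrary choice)

@lru_cache(maxsize=None)
def step_cell(left, center, right):
    """Next state of one cell under the chosen rule."""
    index = (left << 2) | (center << 1) | right
    return (RULE >> index) & 1

@lru_cache(maxsize=None)
def step_block(block):
    """Next state of a whole block (tuple of cells), memoized:
    a pattern seen before is never recomputed. Boundaries are
    padded with zeros for simplicity."""
    padded = (0,) + block + (0,)
    return tuple(step_cell(padded[i - 1], padded[i], padded[i + 1])
                 for i in range(1, len(block) + 1))

def run(block, steps):
    """Direct simulation, but every repeated block pattern is a cache hit."""
    for _ in range(steps):
        block = step_block(block)
    return block
```

Hashlife goes much further, memoizing nested blocks across both space and time; this sketch only shows the accelerating-cache-hit effect.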

This is also one difference between the basic Rete algorithm and SOAR
(although there are also other differences). However, SOAR memoizes in a
"lifted" way: when it keeps a result, it attempts to make it as general a
result as possible, so that the same result can be applied to more
situations.

Why don't existing functional languages like Haskell memoize automatically?
Because of the memory costs, but also because argument-hash memoization
isn't always the best implementation; it can depend on the computation. For
(a possibly extreme) example, you don't automatically get hashlife by
memoizing the function calls performed while computing the game of life.
(However, lifted memoization helps work towards this.)
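
For reference, here is what plain argument-hash memoization looks like (a standard decorator sketch, not anything specific to Haskell's internals): the time saving is real, but the cache grows with every distinct argument, which is the memory cost mentioned above.

```python
def memoize(f):
    """Plain argument-hash memoization: cache results keyed by the
    (hashable) argument tuple. Memory grows with every distinct call."""
    cache = {}
    def wrapped(*args):
        if args not in cache:
            cache[args] = f(*args)
        return cache[args]
    wrapped.cache = cache  # exposed so the memory cost is visible
    return wrapped

@memoize
def fib(n):
    # Exponential without memoization; linear with it, at the price
    # of one cache entry per distinct argument.
    return n if n < 2 else fib(n - 1) + fib(n - 2)
```

For fib this trade is clearly worth it; for a game-of-life step function it buys far less, which is the point above about the best memoization depending on the computation.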

Lifted memoization ("rule chunking") is a form of logical deduction which
occurs on demand: when some area of execution seems worth optimizing by
some criterion, a compound rule gets formed by rule chaining,
to capture the result of the recent computation. [Rules in rule systems are
an imperfect analogy for functions in functional systems, of course! I'll
deal with rule systems and perhaps you can comment on how well it carries
over...] As such, we could imagine applying more general logical deduction;
attempting to find closed-form expressions for functions which were defined
recursively would be an extreme example.
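
Here is a minimal sketch of what I mean by forming a compound rule, using a toy model of rules as (condition-set, addition-set) pairs (my own simplification, not SOAR's actual chunking mechanism):

```python
def apply_rule(rule, facts):
    """Fire a rule if its conditions hold, returning the enlarged fact set."""
    cond, add = rule
    return facts | add if cond <= facts else facts

def chunk(r1, r2):
    """Chain r1 then r2 into one compound rule ("chunking"): its
    conditions are r1's plus whatever r2 needs that r1 does not
    itself supply, and it adds both rules' results at once."""
    c1, a1 = r1
    c2, a2 = r2
    return (c1 | (c2 - a1), a1 | a2)
```

The chunked rule fires in one step wherever the two-step chain would have fired, capturing the result of the chained computation; real chunking also generalizes over variables, which this sketch omits.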

As usual, we can make a "neat" vs "scruffy" distinction. Lifted memoization
is "scruffy" control for deduction, as it relies on the statistics of
execution to reveal themselves (predicting something will be useful again
if it was useful once) rather than trying to determine ahead of time what
transformations on the program will make it more efficient. (As such, it is
similar to optimizations based on JIT compilation vs optimizations made
before code runs.) In general, determining what code transformations will
be useful before the code runs is rather hard. However, I would expect that
deductive code transformation could provide some gains not easily gotten
through lifted memoization. (So the two seem complementary.)

Neat vs scruffy also comes up in control of the search: we could use A*
heuristic search to determine which rules to fire first, using "relaxed"
problems (ignoring some rule conditions) as estimates of the "distance"
between the solution and what is currently known. On the other hand, we
could try to do it the "scruffy" way by applying some kind of reinforcement
learning to rule selection.
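
The "neat" option can be sketched concretely. Below, rules are again toy (condition-set, addition-set) pairs, and the relaxed-problem heuristic simply ignores rule conditions and counts the goal facts still missing (my own illustration of the idea, not a full planner):

```python
import heapq, itertools

def relaxed_distance(facts, goal):
    """Heuristic from a relaxed problem: ignore rule conditions and
    count the goal facts still missing. Admissible as long as no
    single rule firing adds more than one missing goal fact."""
    return len(goal - facts)

def astar(start, goal, rules):
    """A* over sets of known facts; each rule firing costs 1.
    Returns the number of firings needed, or None if unreachable."""
    tie = itertools.count()  # tie-breaker so the heap never compares sets
    frontier = [(relaxed_distance(start, goal), next(tie), 0, frozenset(start))]
    seen = set()
    while frontier:
        _, _, g, facts = heapq.heappop(frontier)
        if goal <= facts:
            return g
        if facts in seen:
            continue
        seen.add(facts)
        for cond, add in rules:
            if cond <= facts and not add <= facts:  # skip no-op firings
                nxt = frozenset(facts | add)
                heapq.heappush(frontier,
                               (g + 1 + relaxed_distance(nxt, goal),
                                next(tie), g + 1, nxt))
    return None
```

The "scruffy" alternative would replace `relaxed_distance` with a learned value estimate for each rule, updated from experience.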

I suspect that semi-logical evaluations of run time (i.e., heuristics which
provide definite bounds) would be useful for other parts of the system as
well. For example, chunking occasionally produces new rules which are
actually more expensive rather than time-saving. It would be useful to
estimate the cost of a rule before introducing it to the system.

Things get even more complicated if we accept answers at every stage in the
system as being approximate. In this case, "lifted memoization" can be
relaxed to "educated guessing": we allow the system to generalize
inductively rather than logically. We need to have some way to keep
guessing from spiraling out of control, though. For example, it would be
good to have the guarantee that the system will always fall back on the
true function rather than approximations if there is enough time. (Perhaps
any computationally expensive function becomes an anytime algorithm, with
an incrementally learned approximation that's returned before entering into
the true computation.)
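
A toy version of that guarantee (my own sketch, with an assumed `approx_fn` standing in for the incrementally learned approximation):

```python
def anytime(true_fn, approx_fn, budget):
    """Wrap an expensive function as a crude anytime procedure: return
    the learned approximation when the remaining time is below the
    budget the true computation needs, and fall back on the true
    computation otherwise, so the guess never permanently replaces
    the real answer."""
    def wrapped(x, time_left):
        if time_left < budget:
            return approx_fn(x), False  # fast guess, flagged as approximate
        return true_fn(x), True         # exact answer
    return wrapped
```

A real version would interleave the two (return the guess immediately, then keep refining), but even this form has the fallback property: given enough time, the true function always wins.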

In any case, most of this is wild speculation based on a distinct lack of
experience implementing such systems...

Best,

Abram

On Mon, Oct 8, 2012 at 10:10 AM, Russell Wallace
<[email protected]> wrote:

> From one perspective, the objective program is an approximation to the
> argmax function, that is, it takes as input an evaluation function and
> tries to output the data structure (which in some cases will itself be
> a program) with the highest evaluation. (From this perspective AGI is
> hard because argmax is NP-hard, which is why we can only approximate
> it.) Such a program would not by itself have human level intelligence,
> e.g. it would not pass the Turing test, but it could be useful in a
> wide range of technical domains: input a propositional formula, get
> (in some cases) a set of variable assignments that satisfies the
> formula; input a program and a set of security criteria, get (in some
> cases) a list of security flaws; input a set of labelled photographs,
> get a program that can classify photographs (hopefully including ones
> that weren't in the training set).
>
> This is a useful way to look at it because argmax applies at many
> levels of decomposition. For example, the search for security flaws in
> existing software will involve reasoning about the code, which will
> require proving theorems, itself a problem of search i.e. argmax. So
> when the system has a theorem to prove, it recursively calls another
> copy of the AGI program (at least conceptually; the obvious
> implementation of this would have overhead, much of which could
> hopefully be removed in an optimized implementation); within that
> proof search, the same process of recursion could be used for
> intermediate lemmas (cf. the DPLL algorithm for propositional
> satisfiability). In the opposite direction, when the system needs to
> improve its performance at a task such as theorem proving, the search
> for better heuristics for this purpose is itself an argmax problem to
> which the AGI can be applied, as is the search for better
> meta-heuristics for that search.
>
> Obviously we are going to want to apply some kind of memoization such
> that when the same problem is encountered repeatedly, the answer need
> not be recomputed every time. Should this be a hardwired framework
> feature or a consequence of the learning algorithm? The answer to this
> is complicated by the question of what constitutes the same problem.
> Certainly the system needs to see through differences in variable
> names. It might also need to see through other non-differences like
> A+B vs B+A and A+0 vs A.
>
> The body of the program should be written in a purely functional
> language. One reason is that this encourages the search procedure to
> develop reusable building blocks. Another reason, more specific to the
> proposed architecture, relates to the recursive search structure. At
> each argmax node, as we calculate better approximations - find better
> answers - these can be propagated upwards without having to rerun the
> entire global computation from scratch.
>
> That leads on to the next question: how do we allocate CPU time? The
> recursive search structure offers a starting point: each argmax node
> can be a process, continually searching for better answers to send to
> its callers. Of course there will be too many of these to make each
> one a process at the operating system level let alone give each its
> own CPU core, so now we need to decide how to allocate a small number
> of CPU cores to many candidate processes. In the worst case this could
> be regarded as a problem requiring general computation in its own
> right, but our list of open problems is tediously long as it stands;
> is there anything more definite we can say about this one?
>
> Perhaps there is. There are general arguments that the expected
> utility of producing some result should be (something isomorphic to) a
> real number. Further, we can argue that if node A calls B as a
> subgoal, the utility of B can't exceed that of A. (If a node has
> multiple callers, its utility is bounded by the sum of caller
> utilities.)
>
> By itself the above reasoning doesn't buy very much; we could still
> have a scenario where A with a utility of 1 calls B and C giving them
> both a utility of 1 (maybe both are essential) and so on all the way
> down. But once we start thinking about costs we can argue that if the
> amount of resources it's worth spending on A is 1, that must be the
> amount it's worth spending on A and all its subgoals; so we might end
> up with A reserving 0.2 units of resources for its own internal
> computation and giving 0.4 units each to B and C; then we can spend
> CPU time on each node in proportion to its resource allocation. Now we
> have something like a usable specification. (cf. e.g. Holland's bucket
> brigade algorithm.)
>
> Memory is trickier than CPU time; more on this later.
>
> Comments so far?
>
>
> -------------------------------------------
> AGI
> Archives: https://www.listbox.com/member/archive/303/=now
> RSS Feed: https://www.listbox.com/member/archive/rss/303/7190161-766c6f07
> Modify Your Subscription:
> https://www.listbox.com/member/?&;
> Powered by Listbox: http://www.listbox.com
>



-- 
Abram Demski
http://lo-tho.blogspot.com/



