Abram,

All good points. Detailed comments follow. But first I must take a LONG
drag, because I'm about to blow a lot of smoke...

On 7/22/08, Abram Demski <[EMAIL PROTECTED]> wrote:
>
> On Tue, Jul 22, 2008 at 4:29 PM, Steve Richfield
> <[EMAIL PROTECTED]> wrote:
> > Abram,
> >
> > On 7/22/08, Abram Demski <[EMAIL PROTECTED]> wrote:
> >>
> >> From the paper you posted, and from wikipedia articles, the current
> >> meaning of PCA is very different from your generalized version. I
> >> doubt the current algorithms would even metaphorically apply...
> >
> >
> > Just more input points that are time-displaced from the present
> > points, or alternatively, in simple cases, compute with the derivative
> > of the inputs rather than with their static value.
>
> Such systems might produce some good results, but the formalism cannot
> represent complex relational ideas.


All you need is a model, any model, capable of representing reality and its
complex relationships. I would think that simple cause-and-effect might
suffice, where events cause other events that in turn cause still other
events. With a neuron or PCA coordinate for each prospective event, I could
see things coming together. The uninhibited neurons (or PCA coordinates) in
the last layer would be the possible present courses of action. Stimulate
them all, the best will inhibit the rest, and the best course of action
will take place.
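
To make the inhibition step concrete, here is a toy numpy sketch of a
winner-take-all layer (a MAXNET-style mutual-inhibition loop; the
stimulation values and the inhibition constant are made-up illustrative
numbers, not a claim about real neurons):

import numpy as np

def winner_take_all(stimulation, eps=0.2, steps=100):
    # MAXNET-style lateral inhibition: every unit repeatedly subtracts
    # a fraction of the other units' total activity. For eps < 1/(n-1)
    # only the most strongly stimulated unit survives.
    a = np.asarray(stimulation, dtype=float).copy()
    for _ in range(steps):
        a = np.clip(a - eps * (a.sum() - a), 0.0, None)
        if np.count_nonzero(a) <= 1:
            break
    return a

# Four candidate "courses of action" with different stimulation levels:
print(winner_take_all([0.3, 0.9, 0.5, 0.7]))
# -> approximately [0, 0.43, 0, 0]: the best inhibits the rest.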

> It is not even capable of
> representing context-free patterns (for example, pictures of
> fractals).


Can people do this?

> Of course, I'm referring to PCA "as it is", not "as it
> could be".
>
> >>
> >> Also, what would "multiple layers" mean in the generalized version?
> >
> >
> > Performing the PC-like analysis on the principal components derived in a
> > preceding PC-like analysis.
>
> If this worked, it would be another way of trying to break up the task
> into subtasks. It might help, I admit. It has an intuitive feel; it
> fits the idea of there being levels of processing in the brain. But if
> it helps, why?


Maybe we are just large data reduction engines?

> What clean subtask-division is it relying on?


As I have pointed out here many times before, we are MUCH shorter on
knowledge of reality than we are on CS technology. With this approach, we
might build AGIs without even knowing how they work.

> The idea
> of iteratively compressing data by looking for the highest-information
> variable repeatedly makes sense to me, it is a clear subgoal. But what
> is the subgoal here?
>
> Hmm... the algorithm for a single level would need to "subtract" the
> information encoded in the new variable each time, so that the next
> iteration is working with only the still-unexplained properties of the
> data.


(Taking another puff) Unfortunately, PCA methods produce amplitude
information but not phase information. This is a little like indefinite
integration: you know what is there, but not enough to recreate the
original, because a constant has been lost along the way.

Further, maximum information channels would seem to be naturally orthogonal,
so subtracting, even if it were possible, is probably unnecessary.

> The variables then should be independent, right?


To the extent that they are not independent, they are not orthogonal, and
less information is produced.
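
For what it's worth, the "subtracting" step and the orthogonality point are
two views of the same machinery: explicitly deflating the data after each
extracted component gives exactly the components that plain
eigendecomposition produces in one shot. A minimal numpy sketch (the toy
dataset is made up, and I'm assuming centered data with distinct
eigenvalues):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))  # correlated toy data
X -= X.mean(axis=0)

# Route 1: all principal components at once, from the covariance matrix.
cov = X.T @ X / len(X)
pcs_direct = np.linalg.eigh(cov)[1][:, ::-1]     # columns sorted by variance

# Route 2: the "subtract what the new variable explains" loop (deflation):
# take the top component, remove its contribution, repeat on the residual.
pcs_deflate = []
R = X.copy()
for _ in range(5):
    w = np.linalg.eigh(R.T @ R / len(R))[1][:, -1]  # top direction of residual
    pcs_deflate.append(w)
    R = R - np.outer(R @ w, w)                      # subtract the explained part
pcs_deflate = np.array(pcs_deflate).T

# The two routes agree up to sign, precisely because the components are
# orthogonal -- the explicit subtraction buys nothing extra:
print(np.allclose(np.abs(pcs_direct.T @ pcs_deflate), np.eye(5), atol=1e-6))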

> Yet, if we take
> the multilevel approach, the 2nd level will be trying to take
> advantage of dependencies in those variables...


Probably not linear dependencies, because those should have been wrung out
at the previous level. Hopefully, the next layer would look at time
sequencing, various nonlinear combinations, etc.
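
A toy sketch of what that next layer might chew on: run PCA once, then hand
the second level time-lagged copies and pairwise products of the first
level's scores, so the second level sees sequencing and nonlinear
combinations that the first level is blind to. (The data and the particular
feature combinations here are arbitrary assumptions, just to show the
stacking.)

import numpy as np

def pca_scores(X, k):
    # Ordinary PCA: center, eigendecompose the covariance, project
    # onto the top-k components (columns in decreasing-variance order).
    Xc = X - X.mean(axis=0)
    vecs = np.linalg.eigh(np.cov(Xc.T))[1]
    return Xc @ vecs[:, -k:][:, ::-1]

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 8)).cumsum(axis=0)   # toy "sensor" time series
s = pca_scores(X, 3)                            # level-1 component scores

# Level 1 removed the linear correlations, so feed level 2 features that
# are NOT linear in the raw inputs: time lags and pairwise products.
lagged = np.hstack([s[1:], s[:-1]])                              # sequencing
prods = np.einsum('ti,tj->tij', s[1:], s[1:]).reshape(len(s) - 1, -1)
level2 = pca_scores(np.hstack([lagged, prods]), 3)
print(level2.shape)    # (999, 3): second-layer "principal components"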

> Perhaps this will work due to inaccuracies in the algorithm, caused by
> approximate methods. The task of the higher levels, then, is to
> correct for the approximations.


This isn't my (blurred?) vision.

> But if this is their usefulness, then
> it needs to be shown that they are capable of it. After all, they will
> be running the same sort of approximation. It is possible that they
> will therefore miss the same sorts of things. So, we need to be
> careful in defining multilevel systems.


Story: I once viewed being able to invert the Airy disk transform (what
makes a blur from a point of light in a microscope or telescope) as an
EXTREMELY valuable way to greatly increase their power, so I set about
finding a transform function. Then I wrote a program to test it, first
making an Airy disk blur and then transforming it back to the original
point. It sorta worked, but there was lots of computational noise in the
result, so I switched to double precision, whereupon it failed to work at
all. After LOTS more work, I finally figured out that the Airy disk function
was a perfect spatial low-pass filter, so that two points too close to be
resolved as separate points made EXACTLY the same perfectly circular pattern
as did a single point of the same total brightness. In single precision, I
had been inverting the computational noise, and doing a pretty good job of
it. However, for about a month, I thought that I had changed the world.
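
Incidentally, the low-pass point is easy to see numerically. In the sketch
below (scipy's Bessel function J1; the grid size and radial scaling are
arbitrary choices), the Fourier transform of the Airy intensity pattern
carries real signal inside a hard cutoff frequency and essentially nothing
beyond it, which is exactly why "inverting" the blur can only amplify
whatever noise lives out there:

import numpy as np
from scipy.special import j1

n = 512
x = np.arange(n) - n // 2
r = np.hypot(*np.meshgrid(x, x)) * 0.2      # scaled radial coordinate
r[r == 0] = 1e-12                           # dodge 0/0 at the center
psf = (2 * j1(r) / r) ** 2                  # Airy disk intensity pattern

# Transfer function = Fourier transform of the point-spread function.
otf = np.abs(np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(psf))))
otf /= otf.max()
freq = np.hypot(*np.meshgrid(x, x)) / n     # radial frequency, cycles/pixel

# Substantial transfer inside the cutoff; next to nothing beyond it
# (what little remains is leakage from the finite grid).
print(otf[freq < 0.02].min())   # order 1: the passband
print(otf[freq > 0.1].max())    # tiny: unrecoverable frequencies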

I also once had a proof of Fermat's Last Theorem that lasted about a week
while it rattled around the math department of a major university.

Hence, you are preaching to the choir regarding care in approach. I have
already run down my fair share of blind alleys.

Steve Richfield
==============

> >> On Tue, Jul 22, 2008 at 2:58 PM, Steve Richfield
> >> <[EMAIL PROTECTED]> wrote:
> >> > Abram,
> >> >
> >> > On 7/22/08, Abram Demski <[EMAIL PROTECTED]> wrote:
> >> >>
> >> >> "Problem Statement: What are the optimal functions, derived from
> >> >> real-world observations of past events, the timings of their comings
> >> >> and goings, and perhaps their physical association, to extract each
> >> >> successive parameter containing the maximum amount of information (in
> >> >> a Shannon sense) usable in reconstructing the observed inputs."
> >> >>
> >> >> I see it now! It is typically very useful to decompose a problem
> >> >> into sub-problems that can be solved either independently or with
> >> >> simple well-defined interaction. What you are proposing is such a
> >> >> decomposition, for the very general problem of compression. "Find an
> >> >> encoding scheme for the data in dataset X that minimizes the number
> >> >> of bits we need" can be split into subproblems of the form "find a
> >> >> meaning for the next N bits of an encoding that maximizes the
> >> >> information they carry". The general problem can be solved by
> >> >> applying a solution to the simpler problem until the data is
> >> >> completely compressed.
> >> >
> >> >
> >> > Yes, we do appear to be on the same page here. The challenge is that
> >> > there seems to be a prevailing opinion that these don't "stack" into
> >> > multi-level structures. The reason that this hasn't been tested seems
> >> > obvious from the literature - computers are now just too damn slow,
> >> > but people here seem to think that there is another more basic
> >> > reason, like it doesn't work. I don't understand this argument
> >> > either.
> >> >
> >> > Richard, perhaps you could explain?
> >> >>
> >> >> "However, it still fails to consider temporal clues, unless of course
> >> >> you just consider these to be another dimension."
> >> >>
> >> >> Why does this not count as a working solution?
> >> >
> >> >
> >> > It might be. Note that delays from axonal transit times could quite
> >> > easily and effectively present inputs "flat", with time presented as
> >> > just another dimension. But then there is the challenge of testing a
> >> > theory with an additional dimension, when it already clogs computers
> >> > without that extra dimension. Ugh. Any thoughts?
> >> >
> >> > Perhaps I should write this up and send it to the various people
> >> > working in this area. Perhaps people with the present test beds could
> >> > find a way to test this, and the retired math professor would have a
> >> > better idea as to exactly what needed to be optimized.
> >> >
> >> > Steve Richfield
> >> > =================
> >> >>
> >> >> On Tue, Jul 22, 2008 at 1:48 PM, Steve Richfield
> >> >> <[EMAIL PROTECTED]> wrote:
> >> >> > Ben,
> >> >> > On 7/22/08, Benjamin Johnston <[EMAIL PROTECTED]> wrote:
> >> >> >>>
> >> >> >>> You are confusing what PCA now is, and what it might become. I am
> >> >> >>> more
> >> >> >>> interested in the dream than in the present reality.
> >> >> >>
> >> >> >> That is like claiming that multiplication of two numbers is the
> >> >> >> answer to AGI, and then telling any critics that they're
> >> >> >> confusing what multiplication is now with what multiplication
> >> >> >> may become.
> >> >> >
> >> >> >
> >> >> > Restating (not copying) my original posting, the challenge of
> >> >> > effective unstructured learning is to utilize every clue and NOT
> >> >> > just go with static clusters, etc. This includes temporal as well
> >> >> > as positional clues, information content, etc. PCA does some but
> >> >> > certainly not all of this, but considering that we were talking
> >> >> > about clustering here just a couple of weeks ago, ratcheting up to
> >> >> > PCA seems to be at least a step out of the basement.
> >> >> >
> >> >> > I think that perhaps I mis-stated or was misunderstood in my
> >> >> > "position". No one has "the answer" yet, but given recent work, I
> >> >> > think that perhaps the problem can now be stated. Given a problem
> >> >> > statement, it (hopefully) should be "just some math" to zero in on
> >> >> > the solution. OK...
> >> >> >
> >> >> > Problem Statement: What are the optimal functions, derived from
> >> >> > real-world observations of past events, the timings of their
> >> >> > comings and goings, and perhaps their physical association, to
> >> >> > extract each successive parameter containing the maximum amount of
> >> >> > information (in a Shannon sense) usable in reconstructing the
> >> >> > observed inputs. IMHO these same functions will be exactly what
> >> >> > you need to recognize what is happening in the world, what you
> >> >> > need to act upon, which actions will have the most effect on the
> >> >> > world, etc. PCA is clearly NOT there (e.g. it lacks temporal
> >> >> > consideration), but seems to be a step closer than anything else
> >> >> > on the horizon. Hopefully, given the "hint" of PCA, we can follow
> >> >> > the path.
> >> >> >
> >> >> > You should find an explanation of PCA in any elementary linear
> >> >> > algebra or statistics textbook. It has a range of applications
> >> >> > (like any transform), but it might be best regarded as an/the
> >> >> > elementary algorithm for unsupervised dimension reduction.
> >> >> >
> >> >> > Bingo! However, it still fails to consider temporal clues, unless
> >> >> > of course you just consider these to be another dimension.
> >> >> >
> >> >> > When PCA works, it is more likely to be interpreted as a comment
> >> >> > on the underlying simplicity of the original dataset, rather than
> >> >> > the power of PCA itself.
> >> >> >
> >> >> > Agreed, but so far, I haven't seen any solid evidence that the
> >> >> > world is NOT simple, though it appears pretty complex until you
> >> >> > understand it.
> >> >> >
> >> >> > Thanks for making me clarify my thoughts.
> >> >> >
> >> >> > Steve Richfield
> >> >> >