Abram,

The SparseDBN article you referenced reminds me that I should contact
Babelfish and propose a math-to-English translation option. It was full
of simple concepts obfuscated by notation.

I think you are saying that these guys have a really good learning
algorithm, and I have figured out how to make such things FAST, so that
together these methods should roughly equal natural capabilities.

Continuing with your comments...

On 1/2/09, Abram Demski <[email protected]> wrote:
>
> Steve,
>
> I'm thinking that you are taking "understanding" to mean something
> like "identifying the *actual* hidden variables responsible for the
> pattern, and finding the *actual* state of that variable".
> Probabilistic models instead *invent* hidden variables, that happen to
> help explain the data. Is that about right? If so, then explaining
> what I mean by "functionally equivalent" will help. Here is an
> example: suppose that we are looking at data concerning a set of
> chemical experiments. Suppose that the experimental conditions are not
> very well-controlled, so that interesting hidden variables are
> present. Suppose that two of these are temperature and air pressure,
> but that the two have the same effect on the experiment. Then the
> unsupervised learning will have no way of distinguishing between the
> two, so it will only find one hidden variable representing them. So,
> they are functionally equivalent.


OK.
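
Your chemistry example is concrete enough to put into code. Here is a
minimal sketch (Python/NumPy; the toy data and names are my own
invention, just for illustration) of two hidden variables with
identical effects collapsing into a single recovered component:

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    n = 1000
    temperature = rng.normal(size=n)      # hidden variable 1
    pressure = rng.normal(size=n)         # hidden variable 2
    loading = np.array([1.0, 2.0, -1.0])  # same effect on 3 observables

    # Every observable is driven by (temperature + pressure) plus noise.
    X = np.outer(temperature + pressure, loading)
    X = X + 0.1 * rng.normal(size=(n, 3))

    pca = PCA(n_components=3).fit(X)
    print(pca.explained_variance_ratio_)  # ~[0.997, ...]: ONE component
    scores = pca.transform(X)[:, 0]
    # |correlation| near 1 (the sign is arbitrary): the lone component
    # tracks the SUM, and no rotation can split it into the two causes.
    print(np.corrcoef(scores, temperature + pressure)[0, 1])

So yes - "functionally equivalent" in exactly your sense.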

> This implies that, in the absence of further information, the best
> thing we can do to try to "understand" the data is to
> probabilistically model it.


OK.

> Or perhaps when you say "understanding" it is short for "understanding
> the implications of", ie, in an already-present model. In that case,
> perhaps we could separate the quality of predictions from the speed of
> predictions. A complicated-but-accurate model is useless if we can't
> calculate the information we need quickly enough.


I suspect that when better understandings are reached, something will
emerge that is both fast AND accurate. Hence, I am resistant to choosing
unless/until forced to do so.

> So, we also want an
> "understandable" model: one that doesn't take too long to create
> predictions. This would be different than looking for the best
> probabilistic model in terms of prediction accuracy.


Possible, but not shown to be so.

> On the other
> hand, it is irrelevant in (practically?) all neural-network style
> approaches today, because the model size is fixed anyway.


I'm not sure I see what you are saying here. Until you run out of memory,
model size is completely variable.
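
To illustrate what I mean (a toy sketch, my own names): nothing stops us
from appending hidden units to a weight matrix at run time, so the model
grows until memory is exhausted.

    import numpy as np

    rng = np.random.default_rng(1)
    n_visible = 64
    W = rng.normal(scale=0.01, size=(10, n_visible))  # 10 hidden units

    def add_hidden_units(W, rng, k):
        """Grow the model by k new randomly initialized hidden units."""
        new_rows = rng.normal(scale=0.01, size=(k, W.shape[1]))
        return np.vstack([W, new_rows])

    W = add_hidden_units(W, rng, 5)  # now 15 units; old weights intact
    print(W.shape)                   # (15, 64)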

If the output is being fed to humans rather than further along the
> network, as in the conference example, the situation is very
> different. Human-readability becomes an issue. This paper is a good
> example of an approach that creates better human-readability rather
> than better performance:
>
> http://www.stanford.edu/~hllee/nips07-sparseDBN.pdf
>
> The altered algorithm also seems to have a performance that matches
> more closely with statistical analysis of the


stray cat's

> brain (which was the
> research goal), suggesting a correlation between human-readability and
> actual performance gains (since the brain wouldn't do it if it were a
> bad idea). In a probabilistic framework this is represented best by a
> prior bias for simplicity.


Here, everything boils down to the meaning of "simplicity": does it mean
a minimum-energy RBM, or something else that is probably fairly similar?

Perhaps we should discuss the a priori knowledge issue from my prior
posting, as I suspect that some of that bears upon "simplicity".
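
To pin down what I mean, here is my reading of the paper's bias for
"simplicity", in code (a sketch only; the target p and weight lam are my
own placeholder numbers): the bias is not the RBM energy itself, but an
extra penalty pushing each hidden unit's mean activation toward a small
target.

    import numpy as np

    def rbm_energy(v, h, W, b, c):
        """Standard RBM energy: E(v,h) = -b'v - c'h - v'Wh."""
        return -(b @ v) - (c @ h) - (v @ W @ h)

    def sparsity_penalty(V, W, c, p=0.02, lam=1.0):
        """Lee et al.-style term:
        lam * sum_j (p - mean over examples of P(h_j=1|v))^2."""
        probs = 1.0 / (1.0 + np.exp(-(V @ W + c)))  # P(h=1|v), row-wise
        return lam * np.sum((p - probs.mean(axis=0)) ** 2)

If that reading is right, then "simplicity" here means sparse activation
rather than minimum energy per se - which is where the a priori
knowledge issue comes in.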

Thanks again for staying with me on this. I think we are gradually making
some real progress here.

Steve Richfield
=====================

>
> On Fri, Jan 2, 2009 at 1:36 PM, Steve Richfield
> <[email protected]> wrote:
> > Abram,
> >
> > Oh dammitall, I'm going to have to expose the vast extent of my
> > profound ignorance to respond. Oh well...
> >
> > On 1/1/09, Abram Demski <[email protected]> wrote:
> >>
> >> Steve,
> >>
> >> Sorry for not responding for a little while. Comments follow:
> >>
> >> >>
> >> >> PCA attempts to isolate components that give maximum
> >> >> information... so my question to you becomes, do you think that the
> >> >> problem you're pointing towards is suboptimal models that don't
> >> >> predict the data well enough, or models that predict the data
> >> >> fine but aren't directly useful for what you expect them to be
> >> >> useful for?
> >> >
> >> >
> >> > Since prediction is NOT the goal, but rather just a useful
> >> > measure, I am only interested in recognizing that which can be
> >> > recognized, and NOT in expending resources on "understanding"
> >> > semi-random noise. Further, since compression is NOT my goal, I am
> >> > not interested in combining features in ways that minimize the
> >> > number of components. In short, there is a lot to be learned from
> >> > PCA, but a "perfect" PCA solution is likely a less-than-perfect NN
> >> > solution.
> >>
> >> What I am saying is this: a good predictive model will predict
> >> whatever is desired. Unsupervised learning attempts to find such a
> >> model. But, a good predictive model will probably predict lots of
> >> stuff we aren't particularly interested in, so supervised methods have
> >> been invented to predict single variables when those variables are of
> >> interest. Still, in principle, we could use unsupervised methods.
> >> Furthermore (as I understand it), if we are dealing with lots of
> >> variables and believe deep patterns are present, unsupervised learning
> >> can outperform supervised learning by grabbing onto patterns that may
> >> ultimately lead to the desired result, which supervised learning would
> >> miss because no immediate value was evident. But, anyway, my point is
> >> that I can only see two meanings for the word "goodness":
> >>
> >> --usefulness in predicting the data as a whole
> >> --usefulness in predicting reward in particular (the real goal)
> >
> >
> > I'm still hung up on "predicting", which may indeed be the best
> > measure of value, but AGI efforts need understanding, which is subtly
> > different. OK, so what is the difference?
> >
> > The tree of reality has many branches in the future - there are many
> > possible futures. "Understanding" is the process of keeping track of
> > which branch you are on, while "predicting" is taking shots at which
> > branch will prevail. One may necessarily involve the other. Has
> > anyone thought this through yet?
> >>
> >> (Actually, I can think of a third: usefulness in *getting* reward (ie,
> >> motor control). But, I feel adding that to the discussion would be
> >> premature... there are interesting issues, but they are separate from
> >> the ones being discussed here...)
> >>
> >> >>
> >> >> To that end... you weren't talking about using the *predictions* of
> >> >> the PCA model, but rather the principal components themselves. The
> >> >> components are essentially hidden variables to make the model run.
> >> >
> >> >
> >> > ... or variables smushed together in ways that may work well for
> >> > compression, but poorly for recognition.
> >>
> >> What are the variables that you keep worrying might be smushed
> >> together? Can you give an example?
> >
> >
> > I thought I could, but then I ran into problems as you discussed below.
> >>
> >> If PCA smushes variables together,
> >> that suggests 1 of 3 things:
> >>
> >> --PCA found suboptimal components
> >
> >
> > Here, I am hung up on "found". This implies a multitude of
> > "solutions", yet there are guys out there who are beating on the
> > matrix manipulations to "solve" PCA. Is this like non-zero-sum game
> > theory, where there can be many solutions, some better than others?
> >>
> >> --PCA found optimal components, but the hidden variables that got
> >> smooshed really are functionally equivalent (when looked at through
> >> the lens of the available visible variables)
> >
> >
> > Here, I am hung up on "functionally". This presumes supervised learning
> or
> > divine observation.
> >>
> >> --The true probabilistic situation violates the probabilistic
> >> assumptions behind PCA
> >>
> >> The third option is by far the most probable, I think.
> >
> >
> > That's where I got stuck trying to come up with an example.
> >>
> >> >>
> >> >> or in an attempt to complexify the model to make it more accurate in
> >> >> its predictions, by looking for links between the hidden
> >> >> variables, or patterns over time, et cetera.
> >> >
> >> >
> >> > Setting predictions aside, the next layer of PCA-like neurons would be
> >> > looking for those links.
> >>
> >> Absolutely.
> >
> >
> > More on my ignorance...
> >
> > I and PCA hadn't really "connected" until a few months ago, when I
> > attended a computer conference and listened to several presentations.
> > The (possibly false, at least in some instances) impression I got was
> > that the presenters didn't really understand some/many of the
> > "components" that they were finding. One video compression presenter
> > did identify the first few, but admittedly failed to identify later
> > components.
> >
> > I can see that this process necessarily involves a tiny amount of a
> > priori information, specifically, knowledge of:
> > 1.  The physical extent of features, e.g. as controlled by mutual
> > inhibition.
> > 2.  The threshold for feature recognition, e.g. the number of active
> > synapses that must be involved for a feature to be interesting.
> > 3.  The acceptable "fuzziness" of recognition, e.g. just how
> > accurately must a feature match its "pattern".
> > 4.  ??? What have I missed in this list?
> > 5.  Some or all of the above may be calculable based on ???
> >
> > Thanks for your help.
> >
> > Steve Richfield
> >
>
>
>
> --
> Abram Demski
> Public address: [email protected]
> Public archive: http://groups.google.com/group/abram-demski
> Private address: [email protected]
>
>


