I apologise for the off-topic response. I understand that threads advancing theory are valuable in practice.
On 1/3/09, Steve Richfield <[email protected]> wrote:
> Abram,
>
> The SparseDBN article you referenced reminds me that I should contact
> Babelfish and propose a math-to-English translation option. Here were some
> simple concepts obfuscated by notation.
>
> I think you are saying that these guys have a really good learning
> algorithm, and I have figured out how to make such things FAST, so that
> together, these methods should about equal natural capabilities.
>
> Continuing with your comments...
>
> On 1/2/09, Abram Demski <[email protected]> wrote:
>>
>> Steve,
>>
>> I'm thinking that you are taking "understanding" to mean something
>> like "identifying the *actual* hidden variables responsible for the
>> pattern, and finding the *actual* state of that variable".
>> Probabilistic models instead *invent* hidden variables that happen to
>> help explain the data. Is that about right? If so, then explaining
>> what I mean by "functionally equivalent" will help. Here is an
>> example: suppose that we are looking at data concerning a set of
>> chemical experiments. Suppose that the experimental conditions are not
>> very well controlled, so that interesting hidden variables are
>> present. Suppose that two of these are temperature and air pressure,
>> but that the two have the same effect on the experiment. Then the
>> unsupervised learning will have no way of distinguishing between the
>> two, so it will only find one hidden variable representing them. So,
>> they are functionally equivalent.
>
> OK.
>
>> This implies that, in the absence of further information, the best
>> thing we can do to try to "understand" the data is to
>> probabilistically model it.
>
> OK.
>
>> Or perhaps when you say "understanding" it is short for "understanding
>> the implications of", i.e., in an already-present model. In that case,
>> perhaps we could separate the quality of predictions from the speed of
>> predictions.
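[Editor's note: Abram's temperature/pressure example can be checked numerically. If two hidden causes push every visible variable in exactly the same direction, a linear unsupervised method such as PCA can only recover their sum. A minimal sketch, with the loading vector and noise level invented for illustration:]

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Two hidden causes with the SAME effect on every visible variable
# (Abram's temperature vs. air pressure).
temperature = rng.normal(size=n)
pressure = rng.normal(size=n)

# Shared loading vector: both causes drive the three visible
# measurements identically (illustrative values).
loading = np.array([0.8, -0.5, 0.3])
visible = np.outer(temperature + pressure, loading)
visible += 0.01 * rng.normal(size=visible.shape)  # small sensor noise

# PCA via SVD of the centered data.
centered = visible - visible.mean(axis=0)
_, s, _ = np.linalg.svd(centered, full_matrices=False)
variance_explained = s**2 / np.sum(s**2)

# Only ONE component carries essentially all the variance: the data
# cannot distinguish the two causes, only their sum.
print(variance_explained)
assert variance_explained[0] > 0.99
```

The single recovered component is "functionally equivalent" to both hidden causes in Abram's sense: no amount of data of this kind could ever split it in two.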
>> A complicated-but-accurate model is useless if we can't
>> calculate the information we need quickly enough.
>
> I suspect that when better understandings are had, something will
> emerge that is both fast AND accurate. Hence, I am resistant to choosing
> unless/until forced to do so.
>
>> So, we also want an
>> "understandable" model: one that doesn't take too long to create
>> predictions. This would be different than looking for the best
>> probabilistic model in terms of prediction accuracy.
>
> Possible, but not shown to be so.
>
>> On the other
>> hand, it is irrelevant in (practically?) all neural-network-style
>> approaches today, because the model size is fixed anyway.
>
> I'm not sure I see what you are saying here. Until you run out of memory,
> model size is completely variable.
>
>> If the output is being fed to humans rather than further along the
>> network, as in the conference example, the situation is very
>> different. Human-readability becomes an issue. This paper is a good
>> example of an approach that creates better human-readability rather
>> than better performance:
>>
>> http://www.stanford.edu/~hllee/nips07-sparseDBN.pdf
>>
>> The altered algorithm also seems to have a performance that matches
>> more closely with statistical analysis of the
>
> stray cat's
>
>> brain (which was the
>> research goal), suggesting a correlation between human-readability and
>> actual performance gains (since the brain wouldn't do it if it were a
>> bad idea). In a probabilistic framework this is best represented by a
>> prior bias for simplicity.
>
> Here, everything boils down to the meaning of "simplicity", e.g. does it
> mean a minimum-energy RBM, or something else that is probably fairly
> similar?
>
> Perhaps we should discuss the a priori knowledge issue from my prior
> posting, as I suspect that some of that bears upon "simplicity".
>
> Thanks again for staying with me on this. I think we are gradually making
> some real progress here.
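[Editor's note: the "prior bias for simplicity" in the sparseDBN paper takes the form of a penalty pushing each hidden unit's average activation toward a small target rate. A rough numpy sketch of that general idea — the layer sizes, weights, and target value are invented, and the paper's exact regularizer differs in detail:]

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy hidden layer: random weights and a batch of inputs
# (all sizes and values illustrative).
batch, n_visible, n_hidden = 64, 20, 10
W = rng.normal(scale=0.5, size=(n_visible, n_hidden))
b = np.zeros(n_hidden)
X = rng.normal(size=(batch, n_visible))

activations = sigmoid(X @ W + b)      # shape (batch, n_hidden)
mean_act = activations.mean(axis=0)   # average firing rate per unit

# Sparsity penalty: squared deviation of each unit's mean activation
# from a small target rate; a term of this general form is added to
# the usual training objective.
target = 0.05
sparsity_penalty = np.sum((target - mean_act) ** 2)
print(sparsity_penalty)
```

Gradient descent on objective-plus-penalty then trades raw reconstruction quality against sparse, more human-readable features, which is the sense in which "simplicity" is being bought here.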
>
> Steve Richfield
> =====================

>> On Fri, Jan 2, 2009 at 1:36 PM, Steve Richfield
>> <[email protected]> wrote:
>> > Abram,
>> >
>> > Oh dammitall, I'm going to have to expose the vast extent of my
>> > profound ignorance to respond. Oh well...
>> >
>> > On 1/1/09, Abram Demski <[email protected]> wrote:
>> >>
>> >> Steve,
>> >>
>> >> Sorry for not responding for a little while. Comments follow:
>> >>
>> >> >> PCA attempts to isolate components that give maximum
>> >> >> information... so my question to you becomes, do you think that
>> >> >> the problem you're pointing towards is suboptimal models that
>> >> >> don't predict the data well enough, or models that predict the
>> >> >> data fine but aren't directly useful for what you expect them to
>> >> >> be useful for?
>> >> >
>> >> > Since prediction is NOT the goal, but rather just a useful measure,
>> >> > I am only interested in recognizing that which can be recognized,
>> >> > and NOT in expending resources on "understanding" semi-random
>> >> > noise. Further, since compression is NOT my goal, I am not
>> >> > interested in combining features in ways that minimize the number
>> >> > of components. In short, there is a lot to be learned from PCA,
>> >> > but a "perfect" PCA solution is likely a less-than-perfect NN
>> >> > solution.
>> >>
>> >> What I am saying is this: a good predictive model will predict
>> >> whatever is desired. Unsupervised learning attempts to find such a
>> >> model. But, a good predictive model will probably predict lots of
>> >> stuff we aren't particularly interested in, so supervised methods
>> >> have been invented to predict single variables when those variables
>> >> are of interest. Still, in principle, we could use unsupervised
>> >> methods.
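[Editor's note: Abram's point that unsupervised structure can still serve a supervised goal is easy to illustrate: reduce the data to a few principal components, then regress the variable of interest on those components. A minimal sketch; the synthetic data and factor counts are invented for illustration:]

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, k = 500, 30, 3

# Synthetic data: observations driven by k hidden factors plus noise.
factors = rng.normal(size=(n, k))
mixing = rng.normal(size=(k, d))
X = factors @ mixing + 0.1 * rng.normal(size=(n, d))

# The target variable depends only on the hidden factors.
y = factors @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

# Unsupervised step: PCA keeps the k strongest directions.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:k].T                     # n x k component scores

# Supervised step: ordinary least squares on the learned components.
coef, *_ = np.linalg.lstsq(Z, y - y.mean(), rcond=None)
pred = Z @ coef + y.mean()
r2 = 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
print(r2)  # high here, because the components span the true factors
assert r2 > 0.9
```

The unsupervised step never saw `y`, yet its components are good predictors of it — the sense in which, "in principle, we could use unsupervised methods" for a supervised task.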
>> >> Furthermore (as I understand it), if we are dealing with lots of
>> >> variables and believe deep patterns are present, unsupervised
>> >> learning can outperform supervised learning by grabbing onto
>> >> patterns that may ultimately lead to the desired result, which
>> >> supervised learning would miss because no immediate value was
>> >> evident. But, anyway, my point is that I can only see two meanings
>> >> for the word "goodness":
>> >>
>> >> --usefulness in predicting the data as a whole
>> >> --usefulness in predicting reward in particular (the real goal)
>> >
>> > I'm still hung up on "predicting", which may indeed be the best
>> > measure of value, but AGI efforts need understanding, which is subtly
>> > different. OK, so what is the difference?
>> >
>> > The tree of reality has many branches in the future - there are many
>> > possible futures. "Understanding" is the process of keeping track of
>> > which branch you are on, while "predicting" is taking shots at which
>> > branch will prevail. One may necessarily involve the other. Has
>> > anyone thought this through yet?
>> >>
>> >> (Actually, I can think of a third: usefulness in *getting* reward
>> >> (i.e., motor control). But, I feel adding that to the discussion
>> >> would be premature... there are interesting issues, but they are
>> >> separate from the ones being discussed here...)
>> >>
>> >> >> To that end... you weren't talking about using the *predictions*
>> >> >> of the PCA model, but rather the principal components themselves.
>> >> >> The components are essentially hidden variables to make the model
>> >> >> run.
>> >> >
>> >> > ... or variables smushed together in ways that may work well for
>> >> > compression, but poorly for recognition.
>> >>
>> >> What are the variables that you keep worrying might be smushed
>> >> together? Can you give an example?
>> >
>> > I thought I could, but then I ran into problems as you discussed below.
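[Editor's note: Steve's branch-tracking distinction has a standard formal analogue in a hidden Markov model: "understanding" (keeping track of which branch you are on) corresponds to filtering, the posterior over the current hidden state, while "predicting" pushes that posterior one step forward onto observations. A small sketch with made-up transition and emission tables:]

```python
import numpy as np

# Two hidden "branches" of reality; both tables are illustrative.
T = np.array([[0.9, 0.1],    # transition: P(next state | state)
              [0.2, 0.8]])
E = np.array([[0.7, 0.3],    # emission: P(observation | state)
              [0.1, 0.9]])

def filter_step(belief, obs):
    """'Understanding': update the belief over the CURRENT hidden state."""
    b = (belief @ T) * E[:, obs]
    return b / b.sum()

def predict_obs(belief):
    """'Predicting': distribution over the NEXT observation."""
    return (belief @ T) @ E

belief = np.array([0.5, 0.5])
for obs in [0, 0, 1, 1, 1]:          # an illustrative observation sequence
    belief = filter_step(belief, obs)

print(belief)               # which branch we are probably on now
print(predict_obs(belief))  # odds on what we will see next
```

As Steve suspects, the one necessarily involves the other: prediction here is just the tracked belief pushed through the model one more step.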
>> >> If PCA smushes variables together,
>> >> that suggests 1 of 3 things:
>> >>
>> >> --PCA found suboptimal components
>> >
>> > Here, I am hung up on "found". This implies a multitude of
>> > "solutions", yet there are guys out there who are beating on the
>> > matrix manipulations to "solve" PCA. Is this like non-zero-sum game
>> > theory, where there can be many solutions, some better than others?
>> >>
>> >> --PCA found optimal components, but the hidden variables that got
>> >> smooshed really are functionally equivalent (when looked at through
>> >> the lens of the available visible variables)
>> >
>> > Here, I am hung up on "functionally". This presumes supervised
>> > learning or divine observation.
>> >>
>> >> --The true probabilistic situation violates the probabilistic
>> >> assumptions behind PCA
>> >>
>> >> The third option is by far the most probable, I think.
>> >
>> > That's where I got stuck trying to come up with an example.
>> >>
>> >> >> or in an attempt to complexify the model to make it more accurate
>> >> >> in its predictions, by looking for links between the hidden
>> >> >> variables, or patterns over time, et cetera.
>> >> >
>> >> > Setting predictions aside, the next layer of PCA-like neurons
>> >> > would be looking for those links.
>> >>
>> >> Absolutely.
>> >
>> > More on my ignorance...
>> >
>> > I and PCA hadn't really "connected" until a few months ago, when I
>> > attended a computer conference and listened to several presentations.
>> > The (possibly false, at least in some instances) impression I got was
>> > that the presenters didn't really understand some/many of the
>> > "components" that they were finding. One video compression presenter
>> > did identify the first few, but admittedly failed to identify later
>> > components.
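[Editor's note: Abram's third option — the data violating PCA's assumptions — is easy to construct. Mix two independent, non-Gaussian sources through a non-orthogonal matrix: PCA's components are forced to be orthogonal, so the top component ends up a blend of both sources rather than either one. A sketch with invented mixing values:]

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000

# Two independent, non-Gaussian hidden sources.
s1 = rng.uniform(-1, 1, size=n)
s2 = rng.uniform(-1, 1, size=n)

# Non-orthogonal mixing (illustrative values): PCA's orthogonal
# directions cannot line up with both of these axes at once.
A = np.array([[1.0, 0.9],
              [0.0, 0.5]])
X = np.column_stack([s1, s2]) @ A.T

Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
top = Vt[0]  # first principal component direction

# The top component correlates substantially with BOTH sources:
# the two hidden variables are "smushed" onto one axis.
c1 = abs(np.corrcoef(Xc @ top, s1)[0, 1])
c2 = abs(np.corrcoef(Xc @ top, s2)[0, 1])
print(c1, c2)
assert c1 > 0.3 and c2 > 0.3
```

On Steve's "multitude of solutions" worry: when the covariance eigenvalues are distinct, the PCA optimum is unique up to sign, so the many-solutions situation applies more to methods like neural networks (or equal-eigenvalue degeneracies) than to PCA itself.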
>> > I can see that this process necessarily involves a tiny amount of
>> > a priori information, specifically, knowledge of:
>> > 1. The physical extent of features, e.g. as controlled by mutual
>> > inhibition.
>> > 2. The threshold for feature recognition, e.g. the number of active
>> > synapses that must be involved for a feature to be interesting.
>> > 3. The acceptable "fuzziness" of recognition, e.g. just how
>> > accurately must a feature match its "pattern".
>> > 4. ??? What have I missed in this list?
>> > 5. Some or all of the above may be calculable based on ???
>> >
>> > Thanks for your help.
>> >
>> > Steve Richfield
>>
>> --
>> Abram Demski
>> Public address: [email protected]
>> Public archive: http://groups.google.com/group/abram-demski
>> Private address: [email protected]

-------------------------------------------
agi
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/
Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244&id_secret=123753653-47f84b
Powered by Listbox: http://www.listbox.com
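[Editor's note: items 1-3 in Steve's a priori list map directly onto a toy feature detector: a receptive-field extent fixed by the caller, a minimum count of active matching synapses, and a per-synapse match tolerance. A hypothetical sketch — the function name, parameters, and values are all invented for illustration:]

```python
import numpy as np

def feature_active(inputs, pattern, min_active=3, tolerance=0.2):
    """Toy detector parameterized by Steve's three a priori quantities.

    inputs, pattern : arrays over one receptive field (item 1: the
        field's physical extent is fixed by whoever calls this).
    min_active : item 2, how many matching active synapses are needed
        before the feature counts as interesting.
    tolerance : item 3, how "fuzzy" each per-synapse match may be.
    """
    inputs = np.asarray(inputs, dtype=float)
    pattern = np.asarray(pattern, dtype=float)
    matches = np.abs(inputs - pattern) <= tolerance
    active = (pattern > 0) & matches   # active synapses that also match
    return int(active.sum()) >= min_active

# Illustrative use: all four active pattern synapses match within
# tolerance, so the feature fires.
print(feature_active([0.9, 0.0, 0.8, 1.1, 0.95],
                     [1.0, 0.0, 1.0, 1.0, 1.0]))  # prints True
```

Item 5's open question — whether these three numbers can be derived rather than assumed — is left exactly as open here as in Steve's list.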
