I apologise for the off-topic response. I understand that threads advancing theory are valuable in practice.
On 1/3/09, Steve Richfield <[email protected]> wrote:
> Abram,
>
> The SparseDBN article you referenced reminds me that I should contact
> Babelfish and propose a math-to-English translation option. Here were some
> simple concepts obfuscated by notation.
>
> I think you are saying that these guys have a really good learning
> algorithm, and I have figured out how to make such things FAST, so that
> together, these methods should about equal natural capabilities.
>
> Continuing with your comments...
>
> On 1/2/09, Abram Demski <[email protected]> wrote:
>>
>> Steve,
>>
>> I'm thinking that you are taking "understanding" to mean something
>> like "identifying the *actual* hidden variables responsible for the
>> pattern, and finding the *actual* state of that variable".
>> Probabilistic models instead *invent* hidden variables that happen to
>> help explain the data. Is that about right? If so, then explaining
>> what I mean by "functionally equivalent" will help. Here is an
>> example: suppose that we are looking at data concerning a set of
>> chemical experiments. Suppose that the experimental conditions are not
>> very well controlled, so that interesting hidden variables are
>> present. Suppose that two of these are temperature and air pressure,
>> but that the two have the same effect on the experiment. Then the
>> unsupervised learning will have no way of distinguishing between the
>> two, so it will only find one hidden variable representing them. So,
>> they are functionally equivalent.
>
> OK.
>
>> This implies that, in the absence of further information, the best
>> thing we can do to try to "understand" the data is to
>> probabilistically model it.
>
> OK.
>
>> Or perhaps when you say "understanding" it is short for "understanding
>> the implications of", i.e., in an already-present model. In that case,
>> perhaps we could separate the quality of predictions from the speed of
>> predictions.
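[Editor's note: Abram's temperature/pressure example can be checked numerically. If two hidden causes push every visible variable in exactly the same direction, a linear unsupervised method such as PCA can only recover their sum. A minimal sketch, with the loading vector and noise level invented for illustration:]

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Two hidden causes with the SAME effect on every visible variable
# (Abram's temperature vs. air pressure).
temperature = rng.normal(size=n)
pressure = rng.normal(size=n)

# Shared loading vector: both causes drive the three visible
# measurements identically (illustrative values).
loading = np.array([0.8, -0.5, 0.3])
visible = np.outer(temperature + pressure, loading)
visible += 0.01 * rng.normal(size=visible.shape)  # small sensor noise

# PCA via SVD of the centered data.
centered = visible - visible.mean(axis=0)
_, s, _ = np.linalg.svd(centered, full_matrices=False)
variance_explained = s**2 / np.sum(s**2)

# Only ONE component carries essentially all the variance: the data
# cannot distinguish the two causes, only their sum.
print(variance_explained)
assert variance_explained[0] > 0.99
```

The single recovered component is "functionally equivalent" to both hidden causes in Abram's sense: no amount of data of this kind could ever split it in two.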
>> A complicated-but-accurate model is useless if we can't
>> calculate the information we need quickly enough.
>
> I suspect that when better understandings are had, something will
> emerge that is both fast AND accurate. Hence, I am resistant to choosing
> unless/until forced to do so.
>
>> So, we also want an
>> "understandable" model: one that doesn't take too long to create
>> predictions. This would be different than looking for the best
>> probabilistic model in terms of prediction accuracy.
>
> Possible, but not shown to be so.
>
>> On the other
>> hand, it is irrelevant in (practically?) all neural-network-style
>> approaches today, because the model size is fixed anyway.
>
> I'm not sure I see what you are saying here. Until you run out of memory,
> model size is completely variable.
>
>> If the output is being fed to humans rather than further along the
>> network, as in the conference example, the situation is very
>> different. Human-readability becomes an issue. This paper is a good
>> example of an approach that creates better human-readability rather
>> than better performance:
>>
>> http://www.stanford.edu/~hllee/nips07-sparseDBN.pdf
>>
>> The altered algorithm also seems to have a performance that matches
>> more closely with statistical analysis of the
>
> stray cat's
>
>> brain (which was the
>> research goal), suggesting a correlation between human-readability and
>> actual performance gains (since the brain wouldn't do it if it were a
>> bad idea). In a probabilistic framework this is best represented by a
>> prior bias for simplicity.
>
> Here, everything boils down to the meaning of "simplicity", e.g. does it
> mean a minimum-energy RBM, or something else that is probably fairly
> similar?
>
> Perhaps we should discuss the a priori knowledge issue from my prior
> posting, as I suspect that some of that bears upon "simplicity".
>
> Thanks again for staying with me on this. I think we are gradually making
> some real progress here.
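[Editor's note: the "prior bias for simplicity" in the sparseDBN paper takes the form of a penalty pushing each hidden unit's average activation toward a small target rate. A rough numpy sketch of that general idea — the layer sizes, weights, and target value are invented, and the paper's exact regularizer differs in detail:]

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy hidden layer: random weights and a batch of inputs
# (all sizes and values illustrative).
batch, n_visible, n_hidden = 64, 20, 10
W = rng.normal(scale=0.5, size=(n_visible, n_hidden))
b = np.zeros(n_hidden)
X = rng.normal(size=(batch, n_visible))

activations = sigmoid(X @ W + b)      # shape (batch, n_hidden)
mean_act = activations.mean(axis=0)   # average firing rate per unit

# Sparsity penalty: squared deviation of each unit's mean activation
# from a small target rate; a term of this general form is added to
# the usual training objective.
target = 0.05
sparsity_penalty = np.sum((target - mean_act) ** 2)
print(sparsity_penalty)
```

Gradient descent on objective-plus-penalty then trades raw reconstruction quality against sparse, more human-readable features, which is the sense in which "simplicity" is being bought here.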
>
> Steve Richfield
> =====================

>> On Fri, Jan 2, 2009 at 1:36 PM, Steve Richfield
>> <[email protected]> wrote:
>> > Abram,
>> >
>> > Oh dammitall, I'm going to have to expose the vast extent of my
>> > profound ignorance to respond. Oh well...
>> >
>> > On 1/1/09, Abram Demski <[email protected]> wrote:
>> >>
>> >> Steve,
>> >>
>> >> Sorry for not responding for a little while. Comments follow:
>> >>
>> >> >> PCA attempts to isolate components that give maximum
>> >> >> information... so my question to you becomes, do you think that
>> >> >> the problem you're pointing towards is suboptimal models that
>> >> >> don't predict the data well enough, or models that predict the
>> >> >> data fine but aren't directly useful for what you expect them to
>> >> >> be useful for?
>> >> >
>> >> > Since prediction is NOT the goal, but rather just a useful measure,
>> >> > I am only interested in recognizing that which can be recognized,
>> >> > and NOT in expending resources on "understanding" semi-random
>> >> > noise. Further, since compression is NOT my goal, I am not
>> >> > interested in combining features in ways that minimize the number
>> >> > of components. In short, there is a lot to be learned from PCA,
>> >> > but a "perfect" PCA solution is likely a less-than-perfect NN
>> >> > solution.
>> >>
>> >> What I am saying is this: a good predictive model will predict
>> >> whatever is desired. Unsupervised learning attempts to find such a
>> >> model. But, a good predictive model will probably predict lots of
>> >> stuff we aren't particularly interested in, so supervised methods
>> >> have been invented to predict single variables when those variables
>> >> are of interest. Still, in principle, we could use unsupervised
>> >> methods.
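[Editor's note: Abram's point that unsupervised structure can still serve a supervised goal is easy to illustrate: reduce the data to a few principal components, then regress the variable of interest on those components. A minimal sketch; the synthetic data and factor counts are invented for illustration:]

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, k = 500, 30, 3

# Synthetic data: observations driven by k hidden factors plus noise.
factors = rng.normal(size=(n, k))
mixing = rng.normal(size=(k, d))
X = factors @ mixing + 0.1 * rng.normal(size=(n, d))

# The target variable depends only on the hidden factors.
y = factors @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

# Unsupervised step: PCA keeps the k strongest directions.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:k].T                     # n x k component scores

# Supervised step: ordinary least squares on the learned components.
coef, *_ = np.linalg.lstsq(Z, y - y.mean(), rcond=None)
pred = Z @ coef + y.mean()
r2 = 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
print(r2)  # high here, because the components span the true factors
assert r2 > 0.9
```

The unsupervised step never saw `y`, yet its components are good predictors of it — the sense in which, "in principle, we could use unsupervised methods" for a supervised task.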
>> >> Furthermore (as I understand it), if we are dealing with lots of
>> >> variables and believe deep patterns are present, unsupervised
>> >> learning can outperform supervised learning by grabbing onto
>> >> patterns that may ultimately lead to the desired result, which
>> >> supervised learning would miss because no immediate value was
>> >> evident. But, anyway, my point is that I can only see two meanings
>> >> for the word "goodness":
>> >>
>> >> --usefulness in predicting the data as a whole
>> >> --usefulness in predicting reward in particular (the real goal)
>> >
>> > I'm still hung up on "predicting", which may indeed be the best
>> > measure of value, but AGI efforts need understanding, which is subtly
>> > different. OK, so what is the difference?
>> >
>> > The tree of reality has many branches in the future - there are many
>> > possible futures. "Understanding" is the process of keeping track of
>> > which branch you are on, while "predicting" is taking shots at which
>> > branch will prevail. One may necessarily involve the other. Has
>> > anyone thought this through yet?
>> >>
>> >> (Actually, I can think of a third: usefulness in *getting* reward
>> >> (i.e., motor control). But, I feel adding that to the discussion
>> >> would be premature... there are interesting issues, but they are
>> >> separate from the ones being discussed here...)
>> >>
>> >> >> To that end... you weren't talking about using the *predictions*
>> >> >> of the PCA model, but rather the principal components themselves.
>> >> >> The components are essentially hidden variables to make the model
>> >> >> run.
>> >> >
>> >> > ... or variables smushed together in ways that may work well for
>> >> > compression, but poorly for recognition.
>> >>
>> >> What are the variables that you keep worrying might be smushed
>> >> together? Can you give an example?
>> >
>> > I thought I could, but then I ran into problems as you discussed below.
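[Editor's note: Steve's branch-tracking distinction has a standard formal analogue in a hidden Markov model: "understanding" (keeping track of which branch you are on) corresponds to filtering, the posterior over the current hidden state, while "predicting" pushes that posterior one step forward onto observations. A small sketch with made-up transition and emission tables:]

```python
import numpy as np

# Two hidden "branches" of reality; both tables are illustrative.
T = np.array([[0.9, 0.1],    # transition: P(next state | state)
              [0.2, 0.8]])
E = np.array([[0.7, 0.3],    # emission: P(observation | state)
              [0.1, 0.9]])

def filter_step(belief, obs):
    """'Understanding': update the belief over the CURRENT hidden state."""
    b = (belief @ T) * E[:, obs]
    return b / b.sum()

def predict_obs(belief):
    """'Predicting': distribution over the NEXT observation."""
    return (belief @ T) @ E

belief = np.array([0.5, 0.5])
for obs in [0, 0, 1, 1, 1]:          # an illustrative observation sequence
    belief = filter_step(belief, obs)

print(belief)               # which branch we are probably on now
print(predict_obs(belief))  # odds on what we will see next
```

As Steve suspects, the one necessarily involves the other: prediction here is just the tracked belief pushed through the model one more step.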
>> >> If PCA smushes variables together,
>> >> that suggests 1 of 3 things:
>> >>
>> >> --PCA found suboptimal components
>> >
>> > Here, I am hung up on "found". This implies a multitude of
>> > "solutions", yet there are guys out there who are beating on the
>> > matrix manipulations to "solve" PCA. Is this like non-zero-sum game
>> > theory, where there can be many solutions, some better than others?
>> >>
>> >> --PCA found optimal components, but the hidden variables that got
>> >> smooshed really are functionally equivalent (when looked at through
>> >> the lens of the available visible variables)
>> >
>> > Here, I am hung up on "functionally". This presumes supervised
>> > learning or divine observation.
>> >>
>> >> --The true probabilistic situation violates the probabilistic
>> >> assumptions behind PCA
>> >>
>> >> The third option is by far the most probable, I think.
>> >
>> > That's where I got stuck trying to come up with an example.
>> >>
>> >> >> or in an attempt to complexify the model to make it more accurate
>> >> >> in its predictions, by looking for links between the hidden
>> >> >> variables, or patterns over time, et cetera.
>> >> >
>> >> > Setting predictions aside, the next layer of PCA-like neurons
>> >> > would be looking for those links.
>> >>
>> >> Absolutely.
>> >
>> > More on my ignorance...
>> >
>> > I and PCA hadn't really "connected" until a few months ago, when I
>> > attended a computer conference and listened to several presentations.
>> > The (possibly false, at least in some instances) impression I got was
>> > that the presenters didn't really understand some/many of the
>> > "components" that they were finding. One video compression presenter
>> > did identify the first few, but admittedly failed to identify later
>> > components.
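[Editor's note: Abram's third option — the data violating PCA's assumptions — is easy to construct. Mix two independent, non-Gaussian sources through a non-orthogonal matrix: PCA's components are forced to be orthogonal, so the top component ends up a blend of both sources rather than either one. A sketch with invented mixing values:]

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000

# Two independent, non-Gaussian hidden sources.
s1 = rng.uniform(-1, 1, size=n)
s2 = rng.uniform(-1, 1, size=n)

# Non-orthogonal mixing (illustrative values): PCA's orthogonal
# directions cannot line up with both of these axes at once.
A = np.array([[1.0, 0.9],
              [0.0, 0.5]])
X = np.column_stack([s1, s2]) @ A.T

Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
top = Vt[0]  # first principal component direction

# The top component correlates substantially with BOTH sources:
# the two hidden variables are "smushed" onto one axis.
c1 = abs(np.corrcoef(Xc @ top, s1)[0, 1])
c2 = abs(np.corrcoef(Xc @ top, s2)[0, 1])
print(c1, c2)
assert c1 > 0.3 and c2 > 0.3
```

On Steve's "multitude of solutions" worry: when the covariance eigenvalues are distinct, the PCA optimum is unique up to sign, so the many-solutions situation applies more to methods like neural networks (or equal-eigenvalue degeneracies) than to PCA itself.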
>> > I can see that this process necessarily involves a tiny amount of
>> > a priori information, specifically, knowledge of:
>> > 1. The physical extent of features, e.g. as controlled by mutual
>> > inhibition.
>> > 2. The threshold for feature recognition, e.g. the number of active
>> > synapses that must be involved for a feature to be interesting.
>> > 3. The acceptable "fuzziness" of recognition, e.g. just how
>> > accurately must a feature match its "pattern".
>> > 4. ??? What have I missed in this list?
>> > 5. Some or all of the above may be calculable based on ???
>> >
>> > Thanks for your help.
>> >
>> > Steve Richfield
>>
>> --
>> Abram Demski
>> Public address: [email protected]
>> Public archive: http://groups.google.com/group/abram-demski
>> Private address: [email protected]

-------------------------------------------
agi
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/
Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244&id_secret=123753653-47f84b
Powered by Listbox: http://www.listbox.com
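[Editor's note: items 1-3 in Steve's a priori list map directly onto a toy feature detector: a receptive-field extent fixed by the caller, a minimum count of active matching synapses, and a per-synapse match tolerance. A hypothetical sketch — the function name, parameters, and values are all invented for illustration:]

```python
import numpy as np

def feature_active(inputs, pattern, min_active=3, tolerance=0.2):
    """Toy detector parameterized by Steve's three a priori quantities.

    inputs, pattern : arrays over one receptive field (item 1: the
        field's physical extent is fixed by whoever calls this).
    min_active : item 2, how many matching active synapses are needed
        before the feature counts as interesting.
    tolerance : item 3, how "fuzzy" each per-synapse match may be.
    """
    inputs = np.asarray(inputs, dtype=float)
    pattern = np.asarray(pattern, dtype=float)
    matches = np.abs(inputs - pattern) <= tolerance
    active = (pattern > 0) & matches   # active synapses that also match
    return int(active.sum()) >= min_active

# Illustrative use: all four active pattern synapses match within
# tolerance, so the feature fires.
print(feature_active([0.9, 0.0, 0.8, 1.1, 0.95],
                     [1.0, 0.0, 1.0, 1.0, 1.0]))  # prints True
```

Item 5's open question — whether these three numbers can be derived rather than assumed — is left exactly as open here as in Steve's list.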
