Steve,

It seems that you misunderstood several of the things that I said.
Concerning grammar theory, I am only invoking the basics of the field, so
the Wikipedia article should be enough if you want to know what I meant by
"regular grammar" and by their being only the "first level of the
hierarchy". Concerning the rest:
> Is the answer to my question "yes"?

No, the answer is no. At least, in the sense that I meant it...

>> > By "Markov" you are referring to successive computation processes, e.g.
>> > layers of neurons, each feeding the next?

I am talking about the underlying probabilistic model, not about how it is
implemented in a neural net.

> It sounds like you are restricting your "thinking" only to NNs that are
> neatly arranged in layers, rather than spaghetti-wired NNs. I consider
> layering only to refine form and function, but expect that serious
> application will probably come in spaghetti-wired arrays.

The implementation is important, but the underlying mathematical model is
more important. Hierarchical/layered arrangements may or may not implement
a restricted class of model.

>> >> Markov models are highly prone to overmatching the
>> >> dataset when they become high-order.
>
> Agreed, but this is easy to recognize from the lack of lateral inhibition,
> and easy to correct by recognizing the coincidence of a lack of lateral
> inhibition AND individual qualifying inputs restricting recognition. The
> "common knowledge" problem is that with real-world "object oriented"
> inputs, it isn't all that simple because nearly everything is active to
> some degree. However, dp/dt could often "quiet things down" so that
> carefully targeted corrections could easily be made.

I do not understand the connection you see between lateral inhibition and
the detection of overmatching, and I don't know what you mean by
"qualifying inputs". I am also unfamiliar with the "common-knowledge
problem" (perhaps because I have not followed the neural-net literature). I
do see that dp/dt would tend to quiet things down, but I don't see how you
propose to use that to prevent overmatching. Overmatching can be prevented
by prior bias, though. You had wanted to bring up prior bias before...

>> [...] So for discrete data,
>> dp/dt will actually make the tables bigger.
> No, because my epineuronal approach proposes simply waiting (while
> discarding all data) until a good prototypical input presents itself, and
> grabbing THAT (complete with overmatching). Hence, NO MEMORY is needed.

The size-of-the-table issue isn't about the amount of memory needed; it is
about the amount of data needed to flesh out the model (which is roughly
indicated by the size of the table corresponding to the model).

Come to think of it, there are two ways to think of nth-order Markov models
that operate in derivative space. Compared to non-derivative models of the
same order, they have larger state tables and so require more data.
Compared to (n+1)-order models in non-derivative space, though,
derivative-space models can be seen as a way of reducing the size of the
table by treating some sequences as identical (specifically, those in
which no change takes place).

> Again, overmatching should be easily correctable, as it is easy to
> recognize and identify the overmatching inputs, at least in some dp/dt
> cases. Where "some" isn't satisfied, just wait until one comes along.

Part of the problem with overmatching is that there is no way to recognize
it from the inside -- well, except with additional prior information.

> dp/dt methods promise an escape from this entire mess by sometimes
> extracting prototypical cases from highly complex real-world input, and
> thereby provide "instant programming" from just a few "patterns", one of
> which must pass the various tests of being "prototypical" (only a certain
> percentage of inputs are active), contains a "principal component" (no
> active lateral inhibition), and be "interesting" (downstream neurons later
> utilize it). Failing these tests, throw it back, put your hook back in the
> water, and wait for something else to "bite", all while discarding all
> input data until you encounter one that "fits".
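(As a concrete aside on the table-size comparison earlier in this message:
the following is a small sketch of my own, not anything from Steve's
proposal, of an nth-order Markov count table built over a binary sequence
and over its {-1, 0, +1} derivative encoding. The sequence and names are
made up for illustration.)

```python
from collections import defaultdict

def markov_counts(seq, order):
    """For each length-`order` context, count how often each next symbol follows it."""
    table = defaultdict(lambda: defaultdict(int))
    for i in range(len(seq) - order):
        context = tuple(seq[i:i + order])
        table[context][seq[i + order]] += 1
    return table

def derivative(seq):
    """Encode each step as +1, 0, or -1 (positive change, no change, negative change)."""
    return [b - a for a, b in zip(seq, seq[1:])]

seq = [0, 1, 1, 0, 0, 0, 1, 0, 1, 1]
order = 2

plain = markov_counts(seq, order)
deriv = markov_counts(derivative(seq), order)

# A binary alphabet allows 2**order contexts; the {-1, 0, +1} derivative
# alphabet allows 3**order -- the "bigger tables for discrete data" point.
print(len(plain), "of", 2 ** order, "possible contexts observed")
print(len(deriv), "of", 3 ** order, "possible contexts observed")
```

The same counts read "downward" give the other view: a length-2 derivative
context summarizes a length-3 raw context, but collapses all the no-change
steps together.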
One example of overmatching that might slip past this net: small background
movements that are insignificant (to humans) will be considered very
prototypical (a very low percentage of inputs active), and will contain a
principal component (no lateral inhibition, because these patterns are
essentially random and so are unlikely to look similar to other patterns).
If something like this is coupled (in a single-shot coincidence) with an
important event, it could be considered interesting.

In short, I think dp/dt sounds useful, but I am not so sure about how you
propose to actually use it. It does not seem to be inherently helpful with
standard Markov models, nor am I particularly convinced that a good
criterion can be found for opportunistic instant recognition of principal
components.

Perhaps one simple, but still possibly important, application of dp/dt
would be that there is no need to update neurons that are receiving
unchanging input (in the case of simulation on non-parallel machines). For
real-time applications, one could assume that only a limited amount of
change will be registered each frame. This would allow neural nets to be
larger than the size that the processor could really handle; some ad-hoc
recovery method could be used when that assumption was broken (sensory
overload).

--Abram

On Thu, Jan 8, 2009 at 8:04 PM, Steve Richfield <[email protected]> wrote:
> Abram,
>
> On 1/7/09, Abram Demski <[email protected]> wrote:
>>
>> Steve,
>>
>> Dp/dt methods do not fundamentally change the space of possible models
>> (if your initial mathematical claim of equivalence is true).
>
> The claim is that a given neuron performs the same transformation, whether
> on "object oriented" signals or dp/dt signals. The whole idea of dp/dt is
> that learning can be very much improved.
>
>> What I am
>> saying is that that model space is *far* too small. Perhaps you know
>> some grammar theory?
>
> My background is a bit unusual.
> I got into CS before the term even existed. My first two computers were
> electromechanical. The next two used vacuum tubes. At one point, I had
> read substantially everything then written about computers. Hence, I have
> never attended a CS course, but rather, I picked this stuff up as I went
> along.
>
> Hence, for most questions about what I know, the answer is usually that I
> know many parts of what is taught in courses and appears in textbooks, but
> that there is also much about "conventional wisdom" that I do NOT know.
> Hence, the best advice is to make NO assumptions about what I already
> "know".
>
> Note that my somewhat unusual background allows me to take really fresh
> looks at things. This served me well during my long career as a computer
> and electronics consultant, and I hope it serves me well here.
>
>> Markov models are not even as expressive as
>> regular grammars. Hidden markov models are. But there is a long way to
>> go from there, since that is just the first level of the hierarchy.
>
> It sounds like you are restricting your "thinking" only to NNs that are
> neatly arranged in layers, rather than spaghetti-wired NNs. I consider
> layering only to refine form and function, but expect that serious
> application will probably come in spaghetti-wired arrays.
>
>> > By "Markov" you are referring to successive computation processes, e.g.
>> > layers of neurons, each feeding the next?
>>
>> For sequential data, an Nth-order markov model is a model that
>> predicts the next item in the sequence from the last N items. These
>> can be built by making an n-dimensional table, and running through the
>> data to count what item appears after each occurrence of each n-item
>> subsequence.
>> Equivalently, an nth-order markov model might store the
>> probability (/frequency) of each possible sequence of length N+1; in
>> that case we've got to do some extra calculations to get predictions
>> out of the model, but mathematically speaking, we've got the same
>> information in our hands. Markov models can be extended to spatial
>> data by counting the probabilities of (all possible) squares of some
>> fixed size. (Circles would work fine too.)
>
> Is the answer to my question "yes"?
>
>> >> Markov models are highly prone to overmatching the
>> >> dataset when they become high-order.
>
> Agreed, but this is easy to recognize from the lack of lateral inhibition,
> and easy to correct by recognizing the coincidence of a lack of lateral
> inhibition AND individual qualifying inputs restricting recognition. The
> "common knowledge" problem is that with real-world "object oriented"
> inputs, it isn't all that simple because nearly everything is active to
> some degree. However, dp/dt could often "quiet things down" so that
> carefully targeted corrections could easily be made.
>
>> > Only because the principal components haven't been accurately sorted
>> > out by dp/dt methods?
>>
>> The reason that overmatching becomes a problem is that the size of the
>> table grows exponentially with N. There is simply not enough data to
>> fill the table properly. Let's see... where normal methods would give
>> a variable values 1 or 0, derivatives would allow 1, 0, and -1
>> (positive change, no change, negative change). So for discrete data,
>> dp/dt will actually make the tables bigger.
>
> No, because my epineuronal approach proposes simply waiting (while
> discarding all data) until a good prototypical input presents itself, and
> grabbing THAT (complete with overmatching). Hence, NO MEMORY is needed.
>> This could improve
>> discrimination for low-order models (similar to the effect of
>> increasing the order), but it will make overmatching worse for
>> higher-order models (again, similar to the effect of increasing the
>> order).
>
> Again, overmatching should be easily correctable, as it is easy to
> recognize and identify the overmatching inputs, at least in some dp/dt
> cases. Where "some" isn't satisfied, just wait until one comes along.
>
>> Of course, there is an added bonus if the data's regularity really is
>> represented better by the derivatives.
>
> Better than "better", it appears to provide occasional nearly-perfect
> extraction of prototypical cases, appropriate for instant learning, which
> is the whole point of epineuronal programming.
>
>> Come to think of it, it shouldn't be surprising that working in
>> derivative space is like increasing the order... each unit of
>> derivative-data represents (the difference between) two units of
>> normal data.
>
> If the real world were "continuous" in a mathematical sense, a sort of
> soup without edges, boundaries, structure, etc., then you would be on the
> right track here. However, it does have "edges" and "corners" of countless
> variety, and in dp/dt space, sometimes/often interesting objects move
> while the background remains stationary, thereby momentarily extracting
> "objects" with all their features separated from their surroundings -
> something that can't happen outside of dp/dt space.
>
> Perhaps you learned about early perceptron experiments, where they taught
> them with prototypical inputs, e.g. visual inputs with black figures on
> white cards? Of course this worked great for demos, but was unworkable
> using real-world patterns for learning. My hope/expectation is that dp/dt
> can put things back into similar simplistic learning, but using real-world
> inputs.
> In short, I see dp/dt as a sort of mathematical "trick" to return to the
> simplicity and instantaneous learning of early perceptrons.
>
> My impression here is that this entire field has become "hung up" on
> using probabilistic methods, knowing full well that they don't work well
> enough to utilize in practical AGI/NN systems for very fundamental
> reasons. dp/dt methods promise an escape from this entire mess by
> sometimes extracting prototypical cases from highly complex real-world
> input, and thereby provide "instant programming" from just a few
> "patterns", one of which must pass the various tests of being
> "prototypical" (only a certain percentage of inputs are active), contains
> a "principal component" (no active lateral inhibition), and be
> "interesting" (downstream neurons later utilize it). Failing these tests,
> throw it back, put your hook back in the water, and wait for something
> else to "bite", all while discarding all input data until you encounter
> one that "fits".
>
> In short, dp/dt looks like a whole new game, with new opportunities that
> would be COMPLETELY unworkable outside of dp/dt space. However, this "new"
> game is really VERY old, some of it coming from the very early days of
> perceptrons.
>
> Steve Richfield
> ===============
>
>> On Wed, Jan 7, 2009 at 1:40 PM, Steve Richfield
>> <[email protected]> wrote:
>> > Abram,
>> >
>> > On 1/6/09, Abram Demski <[email protected]> wrote:
>> >>
>> >> Well, I *still* think you are wasting your time with "flat"
>> >> (propositional) learning.
>> >
>> > I'm not at all sure that I understand what you are saying here, so some
>> > elaboration is probably in order.
>> >
>> >> I'm not saying there isn't still progress to
>> >> be made in this area, but I just don't see it as an area where
>> >> progress is critical.
>> >
>> > My guess is that the poor performance of non-dp/dt methods is
>> > depressing, so everyone wants to look elsewhere.
>> > Damn that yellow stuff, I'm looking for SILVER. My hope/expectation is
>> > that this field can be supercharged with dp/dt methods.
>> >
>> >> The main thing that we can do with propositional
>> >> models when we're dealing with relational data is construct
>> >> markov-models.
>> >
>> > By "Markov" you are referring to successive computation processes, e.g.
>> > layers of neurons, each feeding the next?
>> >
>> >> Markov models are highly prone to overmatching the
>> >> dataset when they become high-order.
>> >
>> > Only because the principal components haven't been accurately sorted
>> > out by dp/dt methods?
>> >
>> >> So far as I am aware,
>> >> improvements to propositional models mainly improve performance for
>> >> large numbers of variables, since there isn't much to gain with only a
>> >> few variables.
>> >
>> > Again, hoping that enough redundancy can deal with the overlapping
>> > effects of things that occur together, a problem generally eliminated
>> > by dp/dt methods.
>> >
>> >> (FYI, I don't have much evidence to back up that
>> >> claim.)
>> >
>> > When I finally get this all wrung out, I'll move onto using Eddie's NN
>> > platform, which ties into web cams and other complex software or input.
>> > Then, we should have lots of real-world testing. BTW, with really fast
>> > learning, MUCH larger models can be simulated on the same computers.
>> >
>> >> So, I don't think progress on the propositional front directly
>> >> translates to progress on the relational front, except in cases where
>> >> we have astronomical amounts of data to prevent overmatching.
>> >
>> > In a sense, dp/dt provides another dimension to sort things out. I am
>> > hoping/expecting that LESS dp/dt data is needed this way than with
>> > other competing methods.
>> >
>> >> Moreover, we need something more than just markov models!
>> >
>> > The BIG question is: Can we characterize what is needed?
>> >> The transition to hidden-markov-model is not too difficult if we take
>> >> the approach of hierarchical temporal memory; but this is still very
>> >> simplistic.
>> >
>> > Most, though certainly not all, elegant solutions are simple. Is dp/dt
>> > (and corollary methods) "it" or not? THAT is the question.
>> >
>> >> Any thoughts about dealing with this?
>> >
>> > Here, I am hung up on "this". Rather than respond in excruciating
>> > detail with a presumption of "this", I'll make the following simplistic
>> > statement to get this process started.
>> >
>> > Simple learning methods have not worked well for reasons you mentioned
>> > above. The question here is whether dp/dt methods blow past those
>> > limitations in general, and whether epineuronal methods blow past best
>> > in particular.
>> >
>> > Are we on the same page here?
>> >
>> > Steve Richfield
>>
>> >> On Mon, Jan 5, 2009 at 12:42 PM, Steve Richfield
>> >> <[email protected]> wrote:
>> >> > Thanks everyone for helping me "wring out" the whole dp/dt thing.
>> >> > Now for the next part of "Steve's Theory..."
>> >> >
>> >> > If we look at learning as extracting information from a noisy
>> >> > channel, in which the S/N ratio is usually <<1, but where the S/N
>> >> > ratio is sometimes very high, the WRONG thing to do is to engage in
>> >> > some sort of slow averaging process as present slow-learning
>> >> > processes do. This especially when dp/dt based methods can
>> >> > occasionally completely separate (in time) the "signal" from the
>> >> > "noise".
>> >> >
>> >> > Instead, it would appear that the best/fastest/cleanest (from an
>> >> > information theory viewpoint) way to extract the "signal" would be
>> >> > to wait for a nearly-perfect low-noise opportunity and simply "latch
>> >> > on" to the "principal component" therein.
>> >> > Of course there will still be some noise present, regardless of how
>> >> > good the opportunity, so some sort of successive refinement process
>> >> > using future "opportunities" could further trim NN synapses, edit
>> >> > AGI terms, etc. In short, I see that TWO entirely different learning
>> >> > mechanisms are needed, one to initially latch onto an approximate
>> >> > principal component, and a second to refine that component.
>> >> >
>> >> > Processes like this have their obvious hazards, like initially
>> >> > failing to incorporate a critical synapse/term, and in the process
>> >> > dooming their functionality regardless of refinement. Neurons,
>> >> > principal components, equations, etc., that turn out to be
>> >> > worthless, or which are "refined" into nothingness, would simply
>> >> > trigger another epineuronal reprogramming to yet another principal
>> >> > component, when a lack of lateral inhibition or other AGI-equivalent
>> >> > process detects that something is happening that nothing else
>> >> > recognizes.
>> >> >
>> >> > In short, I am proposing abandoning the sorts of slow learning
>> >> > processes typical of machine learning, except for use in gradual
>> >> > refinement of opportunistic instantly-recognized principal
>> >> > components.
>> >> >
>> >> > Any thoughts?
>> >> > Steve Richfield

--
Abram Demski
Public address: [email protected]
Public archive: http://groups.google.com/group/abram-demski
Private address: [email protected]

-------------------------------------------
agi
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/
Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244&id_secret=123753653-47f84b
Powered by Listbox: http://www.listbox.com
