Steve,

It is strange to claim that prior PhDs will be worthless when what you are suggesting is that we apply the standard methods to a different representation. But that is beside the present point. :)
Taking the derivative, or just finite differences, is a useful step in more ways than one. You are talking about taking differences over time, but differences over space can be used for edge detection, frequently thought of as the first step in visual processing. More generally, any transform that makes the data sparser, or simpler, seems good -- which is of course what PCA does, and derivatives in time/space, and I think the Fourier transform as well. The usefulness of these transforms springs from underlying regularities in the data.

That's not to say that I don't think some representations are fundamentally more useful than others -- for example, I know that some proofs are astronomically larger in 1st-order logic as compared to 2nd-order logic, even in domains where 1st-order logic is representationally sufficient.

The statement about time correction reminds me of a system called PURR-PUSS. It is Turing-complete in some sense, essentially by compounding time-delays, but I do not know exactly what sense (i.e., a Turing-complete *learner* is very different from a Turing-complete *programmable computer*... PURR-PUSS uses something in between called "soft teaching," if I recall correctly).

--Abram

On Fri, Dec 26, 2008 at 3:26 PM, Steve Richfield <[email protected]> wrote:
> Abram,
>
> On 12/26/08, Abram Demski <[email protected]> wrote:
>> Steve,
>>
>> Richard is right when he says temporal simultaneity is not a
>> sufficient principle.
>
> ... and I fully agree. However, we must unfold this thing one piece at a time.
>
> Without the dp/dt "trick", there doesn't seem to be any way to make unsupervised learning work, and I appear to be the first to stumble onto dp/dt. This is a whole new and unexplored world, where the things that stymied past unsupervised efforts fall out effortlessly, but some new challenges present themselves.
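The sparsifying effect of differencing mentioned above can be sketched in a few lines (a toy illustration with made-up numbers, not anything from the thread itself):

```python
# Toy sketch (made-up numbers): a piecewise-constant "probability" trace
# is dense, but its first difference is nonzero only at the transitions,
# i.e. the data become sparse -- the temporal analogue of edge detection.
signal = [0.2] * 6 + [0.9] * 6 + [0.2] * 4     # object appears, then leaves

def first_difference(xs):
    """Discrete stand-in for a derivative: x[t+1] - x[t]."""
    return [b - a for a, b in zip(xs, xs[1:])]

dpdt = first_difference(signal)
nonzero_raw = sum(1 for x in signal if abs(x) > 1e-12)
nonzero_dpdt = sum(1 for x in dpdt if abs(x) > 1e-12)
# All 16 raw samples are nonzero; only the 2 transition "edges" survive.
```

The same counting argument is why the transform only helps when the underlying data really are piecewise-regular: the sparsity comes from the regularity, not from the derivative itself.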
>> Suppose you present your system with the following sequences (letters could be substituted for sounds, colors, objects, whatever):
>>
>> ABCABCABCABC...
>>
>> AAABBBAAABBB...
>>
>> ABBAAABBBBAAAAABBBBBB...
>>
>> ABBCCCDDDDEEEEEFFFFFF...
>>
>> ABACABADABACABAEABACABADABACABA...
>>
>> All of these sequences have "concepts" behind them. All of these concepts are immune to temporal-simultaneity-learning (although the first could be learned by temporal adjacency, and the second by temporal adjacency with a delay of 3).
>
> The way that wet neurons are built, this is unavoidable! Here is another snippet from my paper...
>
> Time Correction
>
> Electronics designers routinely use differentiation and integration to advance and retard timing. Phase-linear low-pass filters are often used to make short delays in a signal, and "peaking" capacitors were used in RTL (Resistor-Transistor Logic) to differentiate inputs for quicker output. Further, wet neurons introduce their own propagation delays from input synapse to output synapse. If not somehow corrected, the net effect is a scrambling of the time that a given signal/node/term represents, which would result in relating signals together that are arbitrarily shifted in time. There seem to be three schools of thought regarding this:
>
> 1. No problem. This simply results in considering various things shifted arbitrarily in time. When wet neurons learn what works, this will result in recognizing time-sequenced phenomena. Arbitrary delays might also do a lot for artificial neurons.
>
> 2. Time correction could be instituted, e.g. through Taylor series signal extrapolation to in effect remove a neuron's delay, at the cost of introducing considerable noise into the result. My own simulations of Taylor series extrapolation functions showed that the first derivative may indeed help for small corrections, but beyond that, subtle changes in the shape of a transition cause wild changes in the extrapolated result, sometimes going so far as to produce short bursts of oscillation. Downstream neurons may then amplify these problems to produce havoc at the output of the artificial neural network.
>
> 3. The method utilized in CRAY computers might be in use, where all delays were a precise multiple of the clock period. This was achieved by using interconnecting wires cut to certain specific lengths, even though a wire might be much longer than physically needed to interconnect two components. Perhaps wet neurons only come in certain very specific delays. There is some laboratory evidence for this, as each section of our brains has neurons of similar geometry within the group. This has been presumed to be an artifact of evolution and limited DNA space, but may in fact be necessary for proper time correction.
>
> No one now knows which of these are in use in wet neurons. However, regardless of wet-neuron functionality, artificial neural network researchers should be attentive to time correction.
>
> Note that #1 above unavoidably solves the time-sequencing puzzle. Introduce some integration, and the sequencing can be arbitrarily shifted in time - within reasonable limits (seconds, maybe a minute or two).
>
>> The transition to sequence learning is (at least, in my eyes) a transition to relational learning, as opposed to the "flat" learning that PCA is designed for.
>
> I suspect that PCA-like methods are at work within neurons, and that sequence learning and the like fall out from inter-neuronal connections and the associated delays, integration, etc.
>
>> In other words, completely new methods are required.
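The first-derivative correction and its noise sensitivity described in school #2 can be sketched numerically (a toy first-order extrapolation with assumed step sizes, not Steve's actual simulation):

```python
import math

# Toy first-order Taylor extrapolation (all step sizes hypothetical):
# estimate the signal d seconds ahead using a backward finite difference,
# as a stand-in for cancelling a neuron's propagation delay.
def extrapolate(x_prev, x_curr, h, d):
    deriv = (x_curr - x_prev) / h       # finite-difference first derivative
    return x_curr + d * deriv

h, d, t = 0.01, 0.02, 1.0

# On a smooth signal the small correction works well...
est = extrapolate(math.sin(t - h), math.sin(t), h, d)
err_smooth = abs(est - math.sin(t + d))

# ...but a small error eps on one sample is amplified by d/h at the
# output, which is one way to see why corrections much larger than the
# sampling interval inject noise.
eps = 0.001
est_noisy = extrapolate(math.sin(t - h) + eps, math.sin(t), h, d)
amplification = abs(est_noisy - est) / eps      # equals d / h here
```

The amplification factor d/h grows with the correction being attempted, so the further ahead the extrapolation reaches, the more any transition-shape wobble gets magnified, consistent with the oscillation bursts reported above.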
>> You already begin that transition by invoking dp/dt, which assumes a temporal aspect to the data...
>>
>> See this blog post for a fuller account of my view on the current state of affairs. (It started out as a post about a new algorithm I'd been thinking about, but turned into an essay on the difference between relational methods and "flat" (propositional) methods, and how to bridge the gap. If you're wondering about the title, see the previous post.)
>>
>> http://dragonlogic-ai.blogspot.com/2008/12/back-to-ai-ok-here-we-go.html
>
> This blog and this email reflect a common problem with AI-thought. There are LOTS of things that people are VERY bad at doing, and these generally make horrible examples for testing theories of human cognition while at the same time making wonderful potential AI applications.
>
> A perfect example is health and disease, where the human cognition process tends to run in unproductive directions. Any given set of symptoms typically has ~12 different common underlying causal mechanisms, each of which has several cause-and-effect chain links that are typically arranged in a figure "6" configuration with a self-sustaining loop at the end. Given limitless understanding, it typically takes two seemingly unrelated actions to actually cure anything: one to stop the lead-in, and the other to momentarily interrupt the self-sustaining loop.
>
> It is my present suspicion that unsupervised learning is SO simple that it just falls out of a system using the right representation. Even the simplest of creatures do quite well at it. However, without that representation, it is horrifically hard/impossible. This means that NN and AGI guys should all STOP whatever they are doing and find the right representation, which is the path that I have gone on.
>
> Note that if I am successful, prior PhDs in AI/NN won't be worth spit because they will be built on false premises. Good for history, but bad for science.
> Thanks for your thoughts. Any more?
>
> Steve Richfield
> ========================
>
>> On Fri, Dec 26, 2008 at 2:31 AM, Steve Richfield <[email protected]> wrote:
>> > Richard,
>> >
>> > On 12/25/08, Richard Loosemore <[email protected]> wrote:
>> >> Steve Richfield wrote:
>> >>> There are doubtless exceptions to my broad statement, but generally, neuron functionality is WIDE open to be pretty much ANYTHING you choose, including that of an AGI engine's functionality on its equations.
>> >>>
>> >>> In the reverse, any NN could be expressed in a shorthand form that contains structure, synapse functions, etc., and an AGI engine could be built/modified to function according to that shorthand.
>> >>>
>> >>> In short, mapping between NN and AGI forms presumes flexibility in the functionality of the target form. Where that flexibility is NOT present, e.g. because of orthogonal structure, etc., then you must ask whether something is being gained or lost by the difference. Clearly, any transition that involves a loss should be carefully examined to see if the entire effort is headed in the wrong direction, which I think was your original point here.
>> >>
>> >> There is a problem here.
>> >>
>> >> When someone says "X and Y can easily be mapped from one form to the other" there is an implication that they are NOT suggesting that we go right down to the basic constituents of both X and Y in order to effect the mapping.
>> >>
>> >> Thus: "Chalk and Cheese can easily be mapped from one to the other" .... trivially true if we are prepared to go down to the common denominator of electrons, protons and neutrons. But if we stay at a sensible level then, no, these do not map onto one another.
>> > The problem here is that you were thinking of presently existing NN and AGI systems, neither of which works (yet) in any really useful way, and concluding that it was obviously impossible to directly convert from one system with its set of bad assumptions to another system with a completely different set of bad assumptions. I completely agree, but I assert that the answer to that particular question is of no practical interest to anyone.
>> >
>> > On the other hand, converting between NN and AGI systems built on the SAME set of assumptions would be simple. This situation doesn't yet exist. Until then, converting a program from one dysfunctional platform to another is uninteresting. When the assumptions get ironed out, then all systems will be built on the same assumptions, and there will be few problems going between them, EXCEPT:
>> >
>> > Things need to be arranged in arrays for automated learning, which much better fits the present NN paradigm than the present AGI paradigm.
>> >
>> >> Similarly, if you claim that NN and regular AGI map onto one another, I assume that you are saying something more substantial than that these two can both be broken down into their primitive computational parts, and that when this is done they seem equivalent.
>> >
>> > Even this breakdown isn't required if both systems are built on the same correct assumptions. HOWEVER, I see no way to transfer fast learning from an NN-like construction to an AGI-like construction. Do you? If there is no answer to this question, then this unanswerable question would seem to redirect AGI efforts to NN-like constructions if they are ever to learn like we do.
>> >
>> >> NN and regular AGI, the way they are understood by people who understand them, have very different styles of constructing intelligent systems.
>> > Neither of which works (yet). Of course, we are both trying to fill in the gaps.
>> >
>> >> Sure, you can code both in C, or Lisp, or Cobol, but that is to trash the real meaning of "are easily mapped onto one another".
>> >
>> > One of my favorite consulting projects involved coding an AI program to solve complex problems that were roughly equivalent to solving algebraic equations. This composed the Yellow Pages for 28 different large phone directories. The project was for a major phone company and had to be written entirely in COBOL. Further, it had to run at n log n speed and NOT n^2 speed, which I achieved by using successive sorts instead of list-processing methods. It would have been rather difficult to achieve the needed performance in C or Lisp, even though COBOL would seem to be everyone's first choice as the last choice on the list of prospective platforms.
>> >
>> >>> ), instead of operating on "objects" (in an object-oriented sense)
>> >>>
>> >>> Neither NN nor AGI has any intrinsic relationship to OO.
>> >>>
>> >>> Clearly I need a better term here. Both NNs and AGIs tend to have neurons or equations that reflect the presence (or absence) of various objects, conditions, actions, etc. My fundamental assertion is that if you differentiate the inputs so that everything in the entire network reflects dp/dt instead of straight probabilities, then the network works identically, but learning is GREATLY simplified.
>> >>
>> >> Seems like a simple misunderstanding: you were not aware that "object oriented" does not mean the same as saying that there are fundamental atomic constituents of a representation.
>> >
>> > A typical semantic overloading problem.
"Atomic consitituent >> > orientation" >> > doesn't really work either, because in later stages, individual >> > terms/neurons can represent entire concepts, strategies, etc. I am still >> > looking for a good term here. >> >>> >> >>> >> >>> , instead, operates on the rate-of-changes in the >> >>> >> >>> probabilities of "objects", or dp/dt. Presuming sufficient >> >>> bandwidth to generally avoid superstitious coincidences, fast >> >>> unsupervised learning then becomes completely trivial, as like >> >>> objects cause simultaneous like-patterned changes in the inputs >> >>> WITHOUT the overlapping effects of the many other objects >> >>> typically present in the input (with numerous minor >> >>> exceptions). >> >>> >> >>> >> >>> You have already presumed that something supplies the system with >> >>> "objects" that are meaningful. Even before your first mention of >> >>> dp/dt, there has to be a mechanism that is so good that it never >> >>> invents objects such as: >> >>> >> >>> Object A: "A person who once watched all of Tuesday Welds movies >> >>> in >> >>> the space of one week" or >> >>> >> >>> Object B: "Something that is a combination of Julius Caesar's >> >>> pinky >> >>> toe and a sour grape that Brutus' just spat out" or >> >>> >> >>> Object C: "All of the molecules involved in a swiming gala that >> >>> happen to be 17.36 meters from the last drop of water that splashed >> >>> from the pool". >> >>> >> >>> You have supplied no mechanism that is able to do that, but that >> >>> mechanism is 90% of the trouble, if learning is what you are about. >> >>> >> >>> With prior unsupervised learning you are 100% correct. However none >> >>> of >> >>> the examples you gave involved temporal simultaneity. I will discuss B >> >>> above >> >>> because it is close enough to be interesting. 
>> >>> If indeed someone just began to notice something interesting about Caesar's pinkie toe *as* they just began to notice the taste of a sour grape, then yes, that probably would be learned via the mechanisms I am talking about. However, if one was "present perfect tense" while the other was just beginning, then it wouldn't with my approach but would with prior unsupervised learning methods. For example, if Caesar's pinkie toe had been noticed and examined, and then before the condition passed they tasted a sour grape, then temporal simultaneity of the dp/dt edges wouldn't exist to learn from. Of course, in both cases, the transforms would work identically given identical prior learning/programming.
>> >>
>> >> You have not understood the sense in which I made the point, I fear.
>> >
>> > I think the reverse is true. Consider...
>> >
>> >> I was describing obviously useless concepts. Ones where there is no temporal simultaneity.
>> >
>> > dp/dt is unable to even notice things that lack temporal simultaneity, so the examples you gave, though typical challenges to past unsupervised learning, are complete non-issues in dp/dt space.
>> >
>> >> Concepts thrown together out of completely useless components.
>> >
>> > ... that require SOME force/reason/bug/error/etc. to get thrown together. I think we both understand how this was a typical challenge to past unsupervised learning efforts. I am asserting that in dp/dt systems, there is NO force/reason/error/etc. to ever throw such things together, and hence, no reason for vastly complex matrix transforms to then try to pull them back apart.
>> >>
>> >> The question is: how to build a mechanism that does NOT fall into the trap of creating such nonsense-concepts.
>> >> If you just say "assume that we have such a concept builder" you beg a million questions.
>> >
>> >> Your reply, above, took one of my examples and tried to talk about what could happen if it was not, after all, a nonsense-concept.
>> >
>> > Note that GMAIL got sick here, so I'll mark your text with >. Also, some replies are deeply indented, so I have bolded some of them.
>> >
>> > I was just "playing neuron" without any mindreading abilities.
>> >
>> >> Alas, that is neither here nor there, because (sure enough) *everyone* agrees that temporal simultaneity is a good basic ground for trying to construct new concepts (it is the Reason Number One for creating a new concept!). But we also know that just common or garden-variety Temporal Simultaneity doesn't get you very far .... that is the easiest of all mechanisms, and we need a hundred more concept-building mechanisms that are better than that before we have a real concept-generating engine.
>> >
>> > Now, we can start "picking through" the approaches. I suspect that looking for the principal components of temporally simultaneous inputs goes a LONG way toward what is sought, but I have no proof (yet). Do YOU have some idea as to where the threshold of usefulness is?
>> >
>> >> And (here is where my point comes back into the picture) if anyone stands up and says "Hey everyone! I have discovered a hundred concept building mechanisms that I think will do the trick!", the first question that the crowd will ask is: "Do your mechanisms work together to build real, sensible concepts, or do they fill the system with bazillions of really dumb, useless concepts (like my nonsense list above)?"
>> >
>> > Clearly, PCA on simultaneous inputs will NOT do that, because they must show common things in order not to end up at the wrong end of the Huffman code.
>> >> Anyone who says that they know of a way to get unsupervised learning to occur is saying, implicitly, that they have those 100 concept building mechanisms ready to go (or one super mechanism as good as all of them). Hence my original point: you cannot simply imply that your system is working with bona-fide, coherent concepts unless you can show that it really does come up with concepts (or objects) that are sensible.
>> >
>> > Perhaps you could exhibit some examples where learning based on temporal simultaneity with a preference for identifying common patterns (as PCA requires) fails. Clearly, if I think that a relatively simple approach (like PCA on dp/dt inputs) should work, but you are convinced that it will fall into an abyss of superstitious learning, then you will have a MUCH easier time exhibiting a couple of example failures than I will have somehow proving that it always works (which is probably beyond the mathematical state of the art).
>> >
>> > I'm not saying you are wrong here, only that you may not have heard me (probably my fault for not saying things clearly enough), and you haven't made your point by exhibiting something on which my approach would fail.
>> >
>> >> FWIW, I would level the same criticism against quite a few other people, so you don't stand alone here.
>> >
>> > My ego is quite indestructible and I understand that your body temperature is low, so you have nothing to worry about here.
>> >
>> >> (Just briefly: if I move on to look at your actual reply above, I see also mention of rates of change (dp/dt), but no explanation of how rates of change of anything would help a system build a concept that is a combination (NOT an association, please!) of [Julius Caesar's pinky toe and a sour grape that Brutus just spat out]. The rates of change seem irrelevant here.)
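The conjecture that PCA on dp/dt inputs would pick out a transient "object" while ignoring slow confounds can be given a toy illustration (hypothetical two-channel data and mixing weights; this sketches the thread's conjecture, not an established result):

```python
import math

# Toy data (all numbers hypothetical): one transient "object" appears on
# two input channels with mixing weights (1, 0.5), on top of a slow
# drift that hits both channels equally.
signal = [0.0] * 5 + [1.0] * 5 + [0.0] * 5      # object present at t = 5..9
drift = [0.1 * t for t in range(15)]            # slow confound
ch1 = [s + d for s, d in zip(signal, drift)]
ch2 = [0.5 * s + d for s, d in zip(signal, drift)]

def diff(xs):
    """dp/dt approximated as a first difference."""
    return [b - a for a, b in zip(xs, xs[1:])]

def principal_direction(xs, ys):
    """Leading eigenvector of the 2x2 covariance matrix, in closed form."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    lam = (sxx + syy) / 2 + math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    vx, vy = lam - syy, sxy
    norm = math.hypot(vx, vy)
    return vx / norm, vy / norm

raw_dir = principal_direction(ch1, ch2)          # polluted by the drift
diff_dir = principal_direction(diff(ch1), diff(ch2))
# After differencing, the drift contributes only a constant (zero
# variance), so the leading component aligns with the (1, 0.5) mixing
# vector of the transient object.
```

Of course, one two-channel toy is exactly the kind of "completely obvious" test case Richard is warning about; it illustrates the mechanism, not its behavior on real-world input.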
>> > If you take a neuron or Bayesian formula programmed to do something static and throw dp/dt inputs at it, its output will be the dp/dt of the result from static operation. You could then simply integrate it to produce exactly the same output. Hence, the ONLY reason to operate in dp/dt space is for the learning, as the transformation itself is unaffected.
>> >
>> > Now, if you look for an association in dp/dt space and decide to recognize it, that same neuron will then operate to recognize a combination, once its output has been integrated. Of course, if subsequent neurons simply use its output without integration, the entire system will operate as though it recognized the combination, even though, if you attached an oscilloscope to the output, you would see positive and negative spikes around what would be a steady-state output in "object" mode.
>> >
>> > In short, it programs based on associations, but functions based on represented combinations, that representation being the dp/dt of the combination.
>> >
>> >>> Instead, you waved your hands and said "fast unsupervised learning then becomes completely trivial" .... this statement is a declaration that a good mechanism is available.
>> >>>
>> >>> You then also talk about "like" objects. But the whole concept of "like" is extraordinarily troublesome. Are Julius Caesar and Brutus "like" each other? Seen from our distance, maybe yes, but from the point of view of Julius C., probably not so much. Is a G-type star "like" a mirror? I don't know any stellar astrophysicists who would say so, but then again OF COURSE they are, because they are almost indistinguishable, because if you hold a mirror up in the right way it can reflect the sun and the two visual images can be identical.
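The claim above, that a static transform fed dp/dt inputs emits the dp/dt of its static output, recoverable by integration, holds exactly for a linear unit. A minimal check with hypothetical weights and inputs:

```python
# Minimal check (hypothetical weights and inputs) that a *linear* static
# unit fed first differences emits the first difference of its static
# output, so integrating (a cumulative sum) recovers the static result.
def neuron(x1, x2):
    return 0.8 * x1 - 0.3 * x2          # fixed weights, no learning

xs1 = [0.0, 0.2, 0.9, 1.0, 0.4]
xs2 = [1.0, 0.9, 0.5, 0.2, 0.2]

static_out = [neuron(a, b) for a, b in zip(xs1, xs2)]   # "object" mode

def diff(xs):
    return [b - a for a, b in zip(xs, xs[1:])]

# dp/dt mode: difference the inputs, run the SAME unit unchanged.
dpdt_out = [neuron(a, b) for a, b in zip(diff(xs1), diff(xs2))]

recovered = [static_out[0]]             # integrate from the initial value
for delta in dpdt_out:
    recovered.append(recovered[-1] + delta)
```

For a nonlinear unit the recovery is no longer exact, so the claim is best read as applying to the linear part of the transform.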
>> >>> These questions can be resolved, sure enough, but it is the whole business of resolving these questions (rather than waving a hand over them and declaring them to be trivial) that is the point.
>> >>>
>> >>> I think that pretty much everyone who has "dented their pick" on unsupervised learning (this includes myself. Does anyone else here have these same scars?) has developed methods that would work on "completely obvious" test cases but failed miserably on real-world input. My point here is that looking at things from a dp/dt point of view, real-world situations become about as simple as "completely obvious" test cases.
>> >>>
>> >>> I would quote some good source to make this point, but I don't think anyone has gone here yet.
>> >>>
>> >>> If you don't have a clear demonstration that this dp/dt idea does deliver the goods, why are you claiming that it does? Surely it is one or the other?
>> >>>
>> >>> This month I am wearing my mathematician hat. My son Eddie is the NN hacker of the family, and he is waiting impatiently for me to declare a tentative completion so he can run with it.
>> >>>
>> >>> For now, my goal is to come up with a sufficiently good theory that even you can't poke any significant holes in it. Once I become the first person in history to ever receive the Loosemore Seal of No Objection, I will probably wrap this thing up and turn it over to Eddie.
>> >>>> But Steve, if YOU claim that "looking at things from a dp/dt point of view" does in fact yield a dramatic breakthrough that allows unsupervised learning to work on real world cases (something nobody else can do right now),
>> >>>>
>> >>>> Not entirely true, as PCA does what could be considered to be unsupervised learning, though granted, it is WAY too inefficient for NN/AGI use without dp/dt.
>> >>>>>
>> >>>>> then YOU are expected to be the one who has gone there, done it, and come back with evidence that your idea does in fact do that.
>> >>
>> >> First comes the theory, then comes the demo. Neither contains any sort of proof, but it is a LOT cheaper to shoot something down BEFORE it is built than after. Hence, I find this exercise VERY valuable. THANKS. Please keep up the good work.
>> >>
>> >> Steve Richfield

--
Abram Demski
Public address: [email protected]
Public archive: http://groups.google.com/group/abram-demski
Private address: [email protected]

-------------------------------------------
agi
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/
Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244&id_secret=123753653-47f84b
Powered by Listbox: http://www.listbox.com
