Steve,

My thinking on the "significant figures" issue is that the purpose of
unsupervised learning is to find a probabilistic model of the data
(whereas the purpose of supervised learning is to find a probabilistic
model of *one* variable *conditioned on* all the others). When you
talk about the insufficiency of standard PCA, do you think the
problems you refer to relate to

(1) PCA finding a suboptimal model, or
(2) the optimal model being not quite what you are after?
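For concreteness, here is the distinction in miniature: PCA recovers the best linear-Gaussian summary of the data, so it can be "correct" as a model (question 1) while still not being the model you are after (question 2). A minimal numpy sketch (my own illustration; the data and numbers are invented):

```python
# Sketch: PCA as the optimal linear-Gaussian model of the data.
# Whether the top component is what you actually want is a separate
# question from whether PCA found it correctly.
import numpy as np

rng = np.random.default_rng(0)
# Data with one dominant latent direction plus small isotropic noise
latent = rng.normal(size=(500, 1))
X = latent @ np.array([[3.0, 1.0, 0.5]]) + 0.1 * rng.normal(size=(500, 3))

Xc = X - X.mean(axis=0)                      # center the data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S**2 / np.sum(S**2)              # variance explained per component
print(explained)                             # first component dominates
```

Here PCA is behaving optimally; any complaint about the result is a complaint about the model class, not the fit.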

--Abram

On Sat, Dec 27, 2008 at 3:05 AM, Steve Richfield
<[email protected]> wrote:
> Abram,
>
> On 12/26/08, Abram Demski <[email protected]> wrote:
>>
>> Steve,
>>
>> When I made the statement about Fourier I was thinking of JPEG
>> encoding. A little digging found this book, which presents a unified
>> approach to (low-level) computer vision based on the Fourier
>> transform:
>>
>>
>> http://books.google.com/books?id=1wJuTMbNT0MC&dq=fourier+vision&printsec=frontcover&source=bl&ots=3ogSJ2i5uW&sig=ZdvvWvu82q8UX1c5Abq6hWvgZCY&hl=en&sa=X&oi=book_result&resnum=2&ct=result#PPA4,M
>
>
> Interesting, but seems far removed from wet neuronal functionality,
> unsupervised learning, etc.
>>
>> >> But that is beside the present point. :)
>> >
>> >
>> > Probably so. I noticed that you recently graduated, so I thought that I
>> > would drop that thought to make (or unmake) your day.
>>
>> :) I should really update that. It's been a while now.
>>
>> >> generally, any transform that makes the data more sparse, or simpler,
>> >> seems good
>> >
>> >
>> > Certainly if it results in extracting some useful figure of merit.
>> >
>> >>
>> >> -- which is of course what PCA does,
>> >
>> >
>> > Sometimes yes, and sometimes no. I am looking at incremental PCA
>> > approaches
>> > that reliably extract separate figures of merit rather than
>> > smushed-together
>> > figures of merit as PCA often does.
>>
>> How do you define "figures of merit"? Sounds like an ill-defined
>> problem to me. We don't know which features we *really* want to
>> extract from an image until we know the utility function of the
>> environment, and so know what information will help us achieve our
>> goals.
>
>
> There are several views of this, e.g.
> 1.  Pick something to recognize and see if back-propagation says that it is
> useful.
>      In practice this has problems, because once a downstream neuron makes
> any tentative use,
>      then changing an upstream neuron's functionality scrambles the
> downstream neuron's output.
> 2.  Pick one of the most consistent, easily-recognizable, most
> information-containing things to recognize ala PCA,
>      and expect downstream neurons to combine inputs to extract whatever
> they need through Bayesian logic.
>      This may suffer from too many prospective things to recognize, most of
> which are not needed.
>
> These two methods look like they could fix each other's shortcomings,
> because good initial choices
> of figures of merit should then result in neurons either keeping their
> functionality or abandoning
> it, and thereby avoid the problems of changing functionality scrambling
> downstream neurons.
> This way, back-propagation could be used to select which upstream neurons
> need to find something
> else to do, but would have little/no impact on incremental "learning" as
> reward/punishment systems now do.
>
> My BIG challenge is that without something like dp/dt, unsupervised learning
> doesn't work.
> Now, with dp/dt it is a whole new game, and I have no idea where the
> threshold of real-world functionality lies.
> Hence, I seem to be forced into "pessimization" - making things as good as
> possible,
> even though I may be well past that threshold.
>
> Eddie's NN platform is able to tie into other applications, like flight
> simulator, web cams, etc.
> Hence, there is the whole Internet full of cameras to learn with, and it
> might be interesting to see if
> such a NN would be able to figure out how to fly a plane, maybe like
> Skinner's pigeons.
>
> Thanks for your continuing thoughts.
>
> Steve Richfield
> ====================
>>
>> On Sat, Dec 27, 2008 at 12:01 AM, Steve Richfield
>> <[email protected]> wrote:
>> > Abram,
>> >
>> > On 12/26/08, Abram Demski <[email protected]> wrote:
>> >>
>> >> Steve,
>> >>
>> >> It is strange to claim that prior PhDs will be worthless when what you
>> >> are suggesting is that we apply the standard methods to a different
>> >> representation.
>> >
>> >
>> > Much of AI and pretty much all of AGI is built on the proposition that
>> > we
>> > humans must code knowledge because the stupid machines can't efficiently
>> > learn it on their own, in short, that UNsupervised learning is
>> > difficult.
>> > Note that in nature, UNsupervised learning handily outperforms
>> > supervised
>> > learning. What good is supervised NN technology when UNsupervised NNs
>> > will
>> > perform MUCH better? What good are a few hand-coded AGI rules and the
>> > engine
>> > that runs them, when an UNsupervised AGI can learn them orders of
>> > magnitude
>> > faster than cities full of programmers? Note my prior post where I
>> > explain
>> > that AGIs must either abandon UNsupervised learning or switch to a
>> > NN-like implementation. In short, easy UNsupervised learning will change
>> > things about as much as the switch from horse and buggy to automobiles,
>> > leaving present PhDs in the position of blacksmiths and historians. Sure,
>> > blacksmiths had transferable skills, but they weren't worth much and
>> > they weren't respected at all.
>> >
>> > In the 1980s, countless top computer people (including myself) had to
>> > expunge all references to mainframe computers from our resumes in order
>> > to
>> > find work in a microcomputer-dominated field. I expect to see rounds of
>> > the
>> > same sort of insanity when UNsupervised learning emerges.
>> >
>> >>
>> >> But that is beside the present point. :)
>> >
>> >
>> > Probably so. I noticed that you recently graduated, so I thought that I
>> > would drop that thought to make (or unmake) your day.
>> >
>> >>
>> >> Taking the derivative, or just finite differences, is a useful step in
>> >> more ways than one. You are talking about taking differences over
>> >> time, but differences over space can be used for edge detection,
>> >> frequently thought of as the first step in visual processing.
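The spatial case is worth seeing in code: a one-line finite difference already acts as a crude edge detector. A minimal sketch (my own illustration, on a synthetic "image"; a real pipeline would smooth first):

```python
# Sketch: spatial finite differences as a crude edge detector.
import numpy as np

img = np.zeros((8, 8))
img[:, 4:] = 1.0                     # vertical step edge at column 4

dx = np.diff(img, axis=1)            # horizontal differences, shape (8, 7)
edge_cols = np.argwhere(np.abs(dx) > 0.5)[:, 1]
print(set(edge_cols.tolist()))       # response only at the step
```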
>> >
>> >
>> > Correct. My paper goes into using any dimension that is differentiable.
>> > Note that continuous eye movement converts a physical dimension to the
>> > time domain.
>> >
>> >>
>> >> More
>> >> generally, any transform that makes the data more sparse, or simpler,
>> >> seems good
>> >
>> >
>> > Certainly if it results in extracting some useful figure of merit.
>> >
>> >>
>> >> -- which is of course what PCA does,
>> >
>> >
>> > Sometimes yes, and sometimes no. I am looking at incremental PCA
>> > approaches
>> > that reliably extract separate figures of merit rather than
>> > smushed-together
>> > figures of merit as PCA often does. Another problem with classical PCA
>> > is
>> > that it can't provide real-time learning, but instead, works via a sort
>> > of
>> > "batch processing" of statistics collected in the array that is being
>> > transformed.
>> >
>> >>
>> >> and derivatives in
>> >> time/space, and also the fourier transform I think. The usefulness of
>> >> these transforms springs from underlying regularities in the data.
>> >
>> >
>> > Hmmm, I don't see where a Fourier transform would enter the cognitive
>> > process. Perhaps you see something that I have missed?
>> >
>> >>
>> >> That's not to say that I don't think some representations are
>> >> fundamentally more useful than others-- for example, I know that some
>> >> proofs are astronomically larger in 1st-order logic as compared to
>> >> 2nd-order logic, even in domains where 1st-order logic is
>> >> representationally sufficient.
>> >>
>> >> The statement about time correction reminds me of a system called
>> >> PURR-PUSS.
>> >
>> >
>> >
>> > However, as I understand it, the Purposeful Unprimed Real-world Robot
>> > with
>> > Predictors Using Short Segments still relied on rewards and punishments
>> > for
>> > learning.
>> >>
>> >> It is Turing-complete in some sense, essentially by compounding
>> >> time-delays, but I do not know exactly in what sense (i.e., a
>> >> Turing-complete *learner* is very different than a Turing-complete
>> >> *programmable computer*... PURR-PUSS uses something in between called
>> >> "soft teaching" if I recall correctly.)
>> >
>> >
>> > The old DEC LINC and LINC-8 computers operated the instruction
>> > sequencing
>> > with a pile of time delay modules, and someone had to go in and
>> > recalibrate
>> > every few months.
>> >
>> > Steve Richfield
>> > ==================
>> >
>> >>
>> >> On Fri, Dec 26, 2008 at 3:26 PM, Steve Richfield
>> >> <[email protected]> wrote:
>> >> > Abram,
>> >> >
>> >> > On 12/26/08, Abram Demski <[email protected]> wrote:
>> >> >>
>> >> >> Steve,
>> >> >>
>> >> >> Richard is right when he says temporal simultaneity is not a
>> >> >> sufficient principle.
>> >> >
>> >> >
>> >> > ... and I fully agree. However, we must unfold this thing one piece
>> >> > at a
>> >> > time.
>> >> >
>> >> > Without the dp/dt "trick", there doesn't seem to be any way to make
>> >> > unsupervised learning work, and I appear to be the first to stumble
>> >> > onto
>> >> > dp/dt. This is a whole new and unexplored world, where the things
>> >> > that
>> >> > stymied past unsupervised efforts fall out effortlessly, but some new
>> >> > challenges present themselves.
>> >> >
>> >> >>
>> >> >> Suppose you present your system with the
>> >> >> following sequences (letters could be substituted for sounds,
>> >> >> colors,
>> >> >> objects, whatever):
>> >> >>
>> >> >> ABCABCABCABC...
>> >> >>
>> >> >> AAABBBAAABBB...
>> >> >>
>> >> >> ABBAAABBBBAAAAABBBBBB...
>> >> >>
>> >> >> ABBCCCDDDDEEEEEFFFFFF...
>> >> >>
>> >> >> ABACABADABACABAEABACABADABACABA...
>> >> >>
>> >> >> All of these sequences have "concepts" behind them. All of these
>> >> >> concepts are immune to temporal-simultaneity-learning (although the
>> >> >> first could be learned by temporal adjacency, and the second by
>> >> >> temporal adjacency with a delay of 3).
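The "adjacency with a delay" reading of the second sequence can be made concrete by counting lagged co-occurrences. A minimal sketch (my own illustration of the idea, not anything from the thread's systems):

```python
# Sketch: "temporal adjacency with a delay" as lagged pair counting,
# applied to the AAABBB... sequence above.
from collections import Counter

seq = "AAABBB" * 10

def lagged_pairs(s, lag):
    """Count (symbol at t, symbol at t+lag) co-occurrences."""
    return Counter(zip(s, s[lag:]))

print(lagged_pairs(seq, 1).most_common(2))  # mostly A->A and B->B
print(lagged_pairs(seq, 3))                 # at lag 3, only A->B and B->A
```

At lag 3 the sequence's structure collapses to a perfect alternation, which is exactly why a delay-3 adjacency learner can capture it while a lag-1 learner mostly sees repetition.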
>> >> >
>> >> >
>> >> > The way that wet neurons are built, this is unavoidable! Here is
>> >> > another
>> >> > snippet from my paper...
>> >> >
>> >> >
>> >> > Time Correction
>> >> >
>> >> > Electronics designers routinely use differentiation and integration
>> >> > to
>> >> > advance and retard timing. Phase-linear low-pass filters are often
>> >> > used
>> >> > to
>> >> > make short delays in a signal, and "peaking" capacitors were used in
>> >> > RTL
>> >> > (Resistor Transistor Logic) to differentiate inputs for quicker
>> >> > output.
>> >> > Further, wet neurons introduce their own propagation delays from input
>> >> > synapse to output synapse. If not somehow corrected, the net effect of
>> >> > this is a scrambling of the time that a given signal/node/term
>> >> > represents, which would result in relating signals together that are
>> >> > arbitrarily shifted in time. There seem to be three schools of thought
>> >> > regarding this:
>> >> >
>> >> > 1. No problem. This simply results in considering various things
>> >> > shifted arbitrarily in time. When wet neurons learn what works, this
>> >> > will result in recognizing time-sequenced phenomena. Arbitrary delays
>> >> > might also do a lot for artificial neurons.
>> >> >
>> >> > 2. Time correction could be instituted, e.g. through Taylor series
>> >> > signal extrapolation to in effect remove a neuron's delay, at the cost
>> >> > of introducing considerable noise into the result. My own simulations
>> >> > of Taylor series extrapolation functions showed that the first
>> >> > derivative may indeed help for small corrections, but beyond that,
>> >> > subtle changes in the shape of a transition cause wild changes in the
>> >> > extrapolated result, sometimes going so far as to produce short bursts
>> >> > of oscillation. Downstream neurons may then amplify these problems to
>> >> > produce havoc at the output of the artificial neural network.
>> >> >
>> >> > 3. The method utilized in CRAY computers might be in use, where all
>> >> > delays were a precise multiple (of their clock rate) long. This was
>> >> > achieved by using interconnecting wires cut to certain specific
>> >> > lengths, even though the length may be much longer than actually
>> >> > physically needed to interconnect two components. Perhaps wet neurons
>> >> > only come in certain very specific delays. There is some laboratory
>> >> > evidence for this, as each section of our brains has neurons with
>> >> > similar geometry within the group. This has been presumed to be an
>> >> > artifact of evolution and limited DNA space, but may in fact be
>> >> > necessary for proper time correction.
>> >> >
>> >> > No one now knows which of these is in use in wet neurons. However,
>> >> > regardless of wet-neuron functionality, artificial neural network
>> >> > researchers should be attentive to time correction.
>> >> >
>> >> > Note that #1 above unavoidably solves the time-sequencing puzzle.
>> >> > Introduce
>> >> > some integration, and the sequencing can be arbitrarily shifted in
>> >> > time
>> >> > -
>> >> > within reasonable limits (seconds, maybe a minute or two).
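The Taylor-extrapolation instability Steve describes is easy to demonstrate on a toy signal. A minimal sketch (my own construction, not from his paper or simulations; the signal and noise levels are invented):

```python
# Sketch: first-order Taylor extrapolation to "undo" a delay.
# Small corrections help; large ones amplify differentiated noise.
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 200)
clean = np.tanh(20 * (t - 0.5))          # a smooth transition
noisy = clean + 0.01 * rng.normal(size=t.size)

dt = t[1] - t[0]
deriv = np.gradient(noisy, dt)           # differentiation amplifies noise

def extrapolate(delay):
    """Predict the signal `delay` seconds ahead via f(t) + delay * f'(t)."""
    return noisy + delay * deriv

small = extrapolate(1 * dt)
large = extrapolate(20 * dt)
# The error of the large correction blows up relative to the small one.
print(np.std(small - clean), np.std(large - clean))
```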
>> >> >
>> >> >>
>> >> >> The transition to sequence learning is (at least, in my eyes) a
>> >> >> transition to relational learning, as opposed to the "flat" learning
>> >> >> that PCA is designed for.
>> >> >
>> >> >
>> >> > I suspect that PCA-like methods are at work within neurons, and that
>> >> > sequence learning and the like fall out from inter-neuronal
>> >> > connections
>> >> > and
>> >> > the associated delays, integration, etc.
>> >> >>
>> >> >> In other words, completely new methods are
>> >> >> required. You already begin that transition by invoking dp/dt, which
>> >> >> assumes a temporal aspect to the data...
>> >> >>
>> >> >> See this blog post for a more full account of my view on the current
>> >> >> state of affairs. (It started out as a post about a new algorithm
>> >> >> I'd
>> >> >> been thinking about, but turned into an essay on the difference
>> >> >> between relational methods and "flat" (propositional) methods, and
>> >> >> how
>> >> >> to bridge the gap. If you're wondering about the title, see the
>> >> >> previous post.)
>> >> >>
>> >> >>
>> >> >>
>> >> >> http://dragonlogic-ai.blogspot.com/2008/12/back-to-ai-ok-here-we-go.html
>> >> >
>> >> >
>> >> > This blog and this email reflect a common problem with AI thought.
>> >> > There are LOTS of things that people are VERY bad at doing, and these
>> >> > generally make horrible examples for testing human cognition theories
>> >> > and, at the same time, wonderful potential AI applications.
>> >> >
>> >> > A perfect example is health and disease, where the human cognition
>> >> > process
>> >> > tends to run in unproductive directions. Any given set of symptoms
>> >> > typically
>> >> > has ~12 different common underlying causal mechanisms, each of which
>> >> > has
>> >> > several cause-and-effect chain links that are typically arranged in a
>> >> > figure
>> >> > "6" configuration with a self-sustaining loop at the end. Given
>> >> > limitless
>> >> > understanding, it typically takes two seemingly unrelated actions to
>> >> > actually cure anything, one to stop the lead-in, and the other to
>> >> > momentarily interrupt the self-sustaining loop.
>> >> >
>> >> > It is my present suspicion that unsupervised learning is SO simple
>> >> > that
>> >> > it
>> >> > just falls out of a system using the right representation. Even the
>> >> > simplest
>> >> > of creatures do quite well at it. However, without that
>> >> > representation,
>> >> > it
>> >> > is horrifically hard/impossible. This means that NN and AGI guys
>> >> > should
>> >> > all
>> >> > STOP whatever they are doing and find the right representation, which
>> >> > is
>> >> > the
>> >> > path that I have gone on.
>> >> >
>> >> > Note that if I am successful, prior PhDs in AI/NN won't be worth spit
>> >> > because they will be built on false premises. Good for history, but
>> >> > bad for science.
>> >> >
>> >> > Thanks for your thoughts. Any more?
>> >> >
>> >> > Steve Richfield
>> >> > ========================
>> >> >>
>> >> >> On Fri, Dec 26, 2008 at 2:31 AM, Steve Richfield
>> >> >> <[email protected]> wrote:
>> >> >> > Richard,
>> >> >> >
>> >> >> > On 12/25/08, Richard Loosemore <[email protected]> wrote:
>> >> >> >>
>> >> >> >> Steve Richfield wrote:
>> >> >> >>>
>> >> >> >>>  There are doubtless exceptions to my broad statement, but
>> >> >> >>> generally,
>> >> >> >>> neuron functionality is WIDE open to be pretty much ANYTHING you
>> >> >> >>> choose,
>> >> >> >>> including that of an AGI engine's functionality on its
>> >> >> >>> equations.
>> >> >> >>>  In the reverse, any NN could be expressed in a shorthand form
>> >> >> >>> that
>> >> >> >>> contains structure, synapse functions, etc., and an AGI engine
>> >> >> >>> could
>> >> >> >>> be
>> >> >> >>> built/modified to function according to that shorthand.
>> >> >> >>>  In short, mapping between NN and AGI forms presumes flexibility
>> >> >> >>> in
>> >> >> >>> the
>> >> >> >>> functionality of the target form. Where that flexibility is NOT
>> >> >> >>> present,
>> >> >> >>> e.g. because of orthogonal structure, etc., then you must ask
>> >> >> >>> whether
>> >> >> >>> something is being gained or lost by the difference. Clearly,
>> >> >> >>> any
>> >> >> >>> transition
>> >> >> >>> that involves a loss should be carefully examined to see if the
>> >> >> >>> entire
>> >> >> >>> effort is headed in the wrong direction, which I think was your
>> >> >> >>> original
>> >> >> >>> point here.
>> >> >> >>
>> >> >> >>
>> >> >> >> There is a problem here.
>> >> >> >>
>> >> >> >> When someone says "X and Y can easily be mapped from one form to
>> >> >> >> the other" there is an implication that they are NOT suggesting
>> >> >> >> that we go right down to the basic constituents of both X and Y in
>> >> >> >> order to effect the mapping.
>> >> >> >>
>> >> >> >> Thus:  "Chalk and Cheese can easily be mapped from one to the
>> >> >> >> other"
>> >> >> >> ....
>> >> >> >> trivially true if we are prepared to go down to the common
>> >> >> >> denominator
>> >> >> >> of
>> >> >> >> electrons, protons and neutrons.  But if we stay at a sensible
>> >> >> >> level
>> >> >> >> then,
>> >> >> >> no, these do not map onto one another.
>> >> >> >
>> >> >> >
>> >> >> > The problem here is that you were thinking of present existing NN
>> >> >> > and AGI systems, neither of which work (yet) in any really useful
>> >> >> > way, and that it was obviously impossible to directly convert from
>> >> >> > one system with its set of bad assumptions to another system with a
>> >> >> > completely different set of bad assumptions. I completely agree, but
>> >> >> > I assert that the answer to that particular question is of no
>> >> >> > practical interest to anyone.
>> >> >> >
>> >> >> > On the other hand, converting between NN and AGI systems built on
>> >> >> > the
>> >> >> > SAME
>> >> >> > set of assumptions would be simple. This situation doesn't yet
>> >> >> > exist.
>> >> >> > Until
>> >> >> > then, converting a program from one dysfunctional platform to
>> >> >> > another
>> >> >> > is
>> >> >> > uninteresting. When the assumptions get ironed out, then all
>> >> >> > systems
>> >> >> > will be
>> >> >> > built on the same assumptions, and there will be few problems
>> >> >> > going
>> >> >> > between
>> >> >> > them, EXCEPT:
>> >> >> >
>> >> >> > Things need to be arranged in arrays for automated learning, which
>> >> >> > much
>> >> >> > more
>> >> >> > fits the present NN paradigm than the present AGI paradigm.
>> >> >> >>
>> >> >> >> Similarly, if you claim that NN and regular AGI map onto one
>> >> >> >> another, I
>> >> >> >> assume that you are saying something more substantial than that
>> >> >> >> these
>> >> >> >> two
>> >> >> >> can both be broken down into their primitive computational parts,
>> >> >> >> and
>> >> >> >> that
>> >> >> >> when this is done they seem equivalent.
>> >> >> >
>> >> >> >
>> >> >> > Even this breakdown isn't required if both systems are built on
>> >> >> > the
>> >> >> > same
>> >> >> > correct assumptions. HOWEVER, I see no way to transfer fast
>> >> >> > learning
>> >> >> > from an
>> >> >> > NN-like construction to an AGI-like construction. Do you? If there
>> >> >> > is
>> >> >> > no
>> >> >> > answer to this question, then this unanswerable question would
>> >> >> > seem
>> >> >> > to
>> >> >> > redirect AGI efforts to NN-like constructions if they are ever to
>> >> >> > learn
>> >> >> > like
>> >> >> > we do.
>> >> >> >>
>> >> >> >> NN and regular AGI, the way they are understood by people who
>> >> >> >> understand
>> >> >> >> them, have very different styles of constructing intelligent
>> >> >> >> systems.
>> >> >> >
>> >> >> >
>> >> >> > Neither of which work (yet). Of course, we are both trying to fill
>> >> >> > in
>> >> >> > the
>> >> >> > gaps.
>> >> >> >>
>> >> >> >> Sure, you can code both in C, or Lisp, or Cobol, but that is to
>> >> >> >> trash
>> >> >> >> the
>> >> >> >> real meaning of "are easily mapped onto one another".
>> >> >> >
>> >> >> >
>> >> >> > One of my favorite consulting projects involved coding an AI
>> >> >> > program
>> >> >> > to
>> >> >> > solve complex problems that were roughly equivalent to solving
>> >> >> > algebraic
>> >> >> > equations. This composed the Yellow pages for 28 different large
>> >> >> > phone
>> >> >> > directories. The project was for a major phone company and had to
>> >> >> > be
>> >> >> > written
>> >> >> > entirely in COBOL. Further, it had to run at n log n speed and NOT
>> >> >> > n^2
>> >> >> > speed, which I did by using successive sorts instead of list
>> >> >> > processing
>> >> >> > methods. It would have been rather difficult to achieve the needed
>> >> >> > performance in C or Lisp, even though COBOL would seem to be
>> >> >> > everyone's
>> >> >> > first choice as the last choice on the list of prospective
>> >> >> > platforms.
>> >> >> >>>
>> >> >> >>>    ), instead of operating on "objects" (in an
>> >> >> >>>
>> >> >> >>>        object-oriented sense)
>> >> >> >>>
>> >> >> >>>
>> >> >> >>>    Neither NN nor AGI has any intrinsic relationship to OO.
>> >> >> >>>
>> >> >> >>>  Clearly I need a better term here. Both NNs and AGIs tend to
>> >> >> >>> have
>> >> >> >>> neurons or equations that reflect the presence (or absence) of
>> >> >> >>> various
>> >> >> >>> objects, conditions, actions, etc. My fundamental assertion is
>> >> >> >>> that
>> >> >> >>> if
>> >> >> >>> you
>> >> >> >>> differentiate the inputs so that everything in the entire
>> >> >> >>> network
>> >> >> >>> reflects
>> >> >> >>> dp/dt instead of straight probabilities, then the network works
>> >> >> >>> identically,
>> >> >> >>> but learning is GREATLY simplified.
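The "works identically" claim holds exactly for linear units: feeding differenced inputs to the unit yields the differenced version of its static output, which integrates back to the original. A minimal sketch (my own illustration, using finite differences for dp/dt; nonlinear units would only satisfy this approximately):

```python
# Sketch: for a linear unit, dp/dt inputs yield the dp/dt of the
# static output, recoverable by integration (up to initial condition).
import numpy as np

rng = np.random.default_rng(2)
w = np.array([0.7, -0.3, 1.2])                 # the unit's weights
p = rng.normal(size=(100, 3)).cumsum(axis=0)   # slowly varying "probabilities"

static_out = p @ w                             # output on raw inputs
dpdt = np.diff(p, axis=0)                      # differenced inputs
out_from_dpdt = dpdt @ w                       # output on dp/dt inputs

# Integrate the dp/dt output; add the initial condition back.
recovered = static_out[0] + np.concatenate([[0.0], np.cumsum(out_from_dpdt)])
print(np.max(np.abs(recovered - static_out)))  # ~0 (floating-point error)
```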
>> >> >> >>
>> >> >> >> Seems like a simple misunderstanding:  you were not aware that
>> >> >> >> "object
>> >> >> >> oriented" does not mean the same as saying that there are
>> >> >> >> fundamental
>> >> >> >> atomic
>> >> >> >> constituents of a representation.
>> >> >> >
>> >> >> >
>> >> >> > A typical semantic overloading problem. "Atomic constituent
>> >> >> > orientation"
>> >> >> > doesn't really work either, because in later stages, individual
>> >> >> > terms/neurons can represent entire concepts, strategies, etc. I am
>> >> >> > still
>> >> >> > looking for a good term here.
>> >> >> >>>
>> >> >> >>>
>> >> >> >>>    , instead, operates on the rate-of-changes in the
>> >> >> >>>
>> >> >> >>>        probabilities of "objects", or dp/dt. Presuming
>> >> >> >>> sufficient
>> >> >> >>>        bandwidth to generally avoid superstitious coincidences,
>> >> >> >>> fast
>> >> >> >>>        unsupervised learning then becomes completely trivial, as
>> >> >> >>> like
>> >> >> >>>        objects cause simultaneous like-patterned changes in the
>> >> >> >>> inputs
>> >> >> >>>        WITHOUT the overlapping effects of the many other objects
>> >> >> >>>        typically present in the input (with numerous minor
>> >> >> >>> exceptions).
>> >> >> >>>
>> >> >> >>>
>> >> >> >>>    You have already presumed that something supplies the system
>> >> >> >>> with
>> >> >> >>>    "objects" that are meaningful.  Even before your first
>> >> >> >>> mention
>> >> >> >>> of
>> >> >> >>>    dp/dt, there has to be a mechanism that is so good that it
>> >> >> >>> never
>> >> >> >>>    invents objects such as:
>> >> >> >>>
>> >> >> >>>    Object A:  "A person who once watched all of Tuesday Weld's
>> >> >> >>> movies
>> >> >> >>> in
>> >> >> >>>    the space of one week" or
>> >> >> >>>
>> >> >> >>>    Object B:  "Something that is a combination of Julius
>> >> >> >>> Caesar's
>> >> >> >>> pinky
>> >> >> >>>    toe and a sour grape that Brutus just spat out" or
>> >> >> >>>
>> >> >> >>>    Object C:  "All of the molecules involved in a swimming gala
>> >> >> >>> that
>> >> >> >>>    happen to be 17.36 meters from the last drop of water that
>> >> >> >>> splashed
>> >> >> >>>    from the pool".
>> >> >> >>>
>> >> >> >>>    You have supplied no mechanism that is able to do that, but
>> >> >> >>> that
>> >> >> >>>    mechanism is 90% of the trouble, if learning is what you are
>> >> >> >>> about.
>> >> >> >>>
>> >> >> >>>  With prior unsupervised learning you are 100% correct. However
>> >> >> >>> none
>> >> >> >>> of
>> >> >> >>> the examples you gave involved temporal simultaneity. I will
>> >> >> >>> discuss B
>> >> >> >>> above
>> >> >> >>> because it is close enough to be interesting.
>> >> >> >>>  If indeed someone just began to notice something interesting
>> >> >> >>> about
>> >> >> >>> Caesar's pinkie toe *_as_* they just began to notice the taste
>> >> >> >>> of a
>> >> >> >>> sour
>> >> >> >>> grape, then yes, that probably would be learned via the
>> >> >> >>> mechanisms I
>> >> >> >>> am
>> >> >> >>> talking about. However, if one was "present perfect tense" while
>> >> >> >>> the other was just beginning, then it wouldn't be learned with my
>> >> >> >>> approach but would be with prior
>> >> >> >>> unsupervised learning methods. For example, if Caesar's pinkie
>> >> >> >>> toe had been noticed and examined, and then before the condition
>> >> >> >>> passed they tasted a sour grape, then temporal simultaneity of the
>> >> >> >>> dp/dt edges wouldn't exist to learn from. Of course, in both
>> >> >> >>> cases, the transforms would work
>> >> >> >>> identically
>> >> >> >>> given
>> >> >> >>> identical prior learning/programming.
>> >> >> >>
>> >> >> >>
>> >> >> >> You have not understood the sense in which I made the point, I
>> >> >> >> fear.
>> >> >> >
>> >> >> >
>> >> >> > I think the reverse is true. Consider...
>> >> >> >>
>> >> >> >> I was describing obviously useless concepts.  Ones where there is
>> >> >> >> no
>> >> >> >> temporal simultaneity.
>> >> >> >
>> >> >> >
>> >> >> > dp/dt is unable to even notice things that lack temporal
>> >> >> > simultaneity, so the examples you gave, though typical challenges to
>> >> >> > past unsupervised learning, are complete non-issues in dp/dt space.
>> >> >> >>
>> >> >> >>   Concepts thrown together out of completely useless components.
>> >> >> >
>> >> >> >
>> >> >> > ... that require SOME force/reason/bug/error/etc to get thrown
>> >> >> > together.
>> >> >> > I
>> >> >> > think we both understand how this was a typical challenge to past
>> >> >> > unsupervised learning efforts. I am asserting that in dp/dt
>> >> >> > systems,
>> >> >> > there
>> >> >> > is NO force/reason/error/etc to ever throw such things together,
>> >> >> > and
>> >> >> > hence,
>> >> >> > no reason for vastly complex matrix transforms to then try to pull
>> >> >> > them
>> >> >> > back
>> >> >> > apart.
>> >> >> >>
>> >> >> >> The question is:  how to build a mechanism that does NOT fall
>> >> >> >> into
>> >> >> >> the
>> >> >> >> trap of creating such nonsense-concepts.  If you just say "assume
>> >> >> >> that
>> >> >> >> we
>> >> >> >> have such a concept builder" you beg a million questions.
>> >> >> >
>> >> >> >
>> >> >> >> Your reply, above, took one of my examples and tried to talk
>> >> >> >> about
>> >> >> >> what
>> >> >> >> could happen if it was not, after all, a nonsense-concept.
>> >> >> >
>> >> >> > Note that GMAIL got sick here, so I'll mark your text with >.
>> >> >> > Also,
>> >> >> > some
>> >> >> > replies are deeply indented, so I have bolded some of them.
>> >> >> >
>> >> >> > I was just "playing neuron" without any mindreading abilities.
>> >> >> >>Alas, that is neither here nor there, because (sure enough)
>> >> >> >> *everyone*
>> >> >> >> agrees that temporal simultaneity is a good basic ground for
>> >> >> >> trying
>> >> >> >> to
>> >> >> >> construct new concepts (it is the Reason Number One for creating
>> >> >> >> a
>> >> >> >> new
>> >> >> >> concept!).  But we also know that just common or garden variety
>> >> >> >> Temporal
>> >> >> >> Simultaneity doesn't get you very far .... that is the easiest of
>> >> >> >> all
>> >> >> >> mechanisms, and we need a hundred more concept-building
>> >> >> >> mechanisms
>> >> >> >> that
>> >> >> >> are
>> >> >> >> better than that before we have a real concept-generating engine.
>> >> >> >
>> >> >> > Now, we can start "picking through" the approaches. I suspect that
>> >> >> > looking for the principal components of temporally simultaneous
>> >> >> > inputs goes a LONG way toward what is sought, but I have no proof
>> >> >> > (yet). Do YOU have some idea as to where the threshold of usefulness
>> >> >> > is?
>> >> >> >>  And (here is where my point comes back into the picture) if
>> >> >> >> anyone
>> >> >> >> stands
>> >> >> >> up and says "Hey everyone!  I have discovered a hundred concept
>> >> >> >> building
>> >> >> >> mechanisms that I think will do the trick!", the first question
>> >> >> >> that
>> >> >> >> the
>> >> >> >> crowd will ask is:  "Do your mechanisms work together to build
>> >> >> >> real,
>> >> >> >> sensible concepts, or do they fill the system with bazillions of
>> >> >> >> really
>> >> >> >> dumb, useless concepts (like my nonsense list above)?"
>> >> >> >
>> >> >> > Clearly, PCA on simultaneous inputs will NOT do that, because they
>> >> >> > must
>> >> >> > show
>> >> >> > common things in order not to end up at the wrong end of the
>> >> >> > Huffman
>> >> >> > code.
>> >> >> >
>> >> >> >> Anyone who says that they know of a way to get unsupervised
>> >> >> >> learning
>> >> >> >> to
>> >> >> >> occur is saying, implicitly, that they have those 100 concept
>> >> >> >> building
>> >> >> >> mechanisms ready to go (or one super mechanism as good as all of
>> >> >> >> them).
>> >> >> >>  Hence my original point:  you cannot simply imply that your
>> >> >> >> system
>> >> >> >> is
>> >> >> >> working with bona-fide, coherent concepts unless you can show
>> >> >> >> that
>> >> >> >> it
>> >> >> >> really
>> >> >> >> does come up with concepts (or objects) that are sensible.
>> >> >> >
>> >> >> > Perhaps you could exhibit some examples where learning based on
>> >> >> > temporal simultaneity with a preference for identifying common
>> >> >> > patterns (as PCA requires) fails. Clearly, if I think that a
>> >> >> > relatively simple approach (like PCA on dp/dt inputs) should work,
>> >> >> > but you are convinced that it will fall into an abyss of
>> >> >> > superstitious learning, then you will have a MUCH easier time
>> >> >> > exhibiting a couple of example failures than I will have somehow
>> >> >> > proving that it always works (which is probably beyond the
>> >> >> > mathematical state of the art).
>> >> >> >
>> >> >> > I'm not saying you are wrong here, only that you may not have
>> >> >> > heard me (probably my fault for not saying things clearly enough),
>> >> >> > and you haven't made your point by exhibiting something on which
>> >> >> > my approach would fail.
>> >> >> >
>> >> >> >> FWIW, I would level the same criticism against quite a few other
>> >> >> >> people, so you don't stand alone here.
>> >> >> >
>> >> >> > My ego is quite indestructible and I understand that your body
>> >> >> > temperature is low, so you have nothing to worry about here.
>> >> >> >
>> >> >> >> (Just briefly:  if I move on to look at your actual reply above,
>> >> >> >> I see also mention of rates of change (dp/dt), but no explanation
>> >> >> >> of how rates of change of anything would help a system build a
>> >> >> >> concept that is a combination (NOT an association, please!) of
>> >> >> >> [Julius Caesar's pinky toe and a sour grape that Brutus just spat
>> >> >> >> out].  The rates of change seem irrelevant here).
>> >> >> >
>> >> >> > If you take a neuron or Bayesian formula programmed to do
>> >> >> > something static and throw dp/dt inputs at it, its output will be
>> >> >> > the dp/dt of the result from static operation. You could then
>> >> >> > simply integrate it to produce exactly the same output. Hence, the
>> >> >> > ONLY reason to operate in dp/dt space is for the learning, as the
>> >> >> > transformation itself is unaffected.
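[Editor's note: a toy check of the claim above, as my own plain-Python illustration rather than code from the thread. It holds only for linear operations: a linear neuron y = w . x applied to finite differences of its inputs produces the finite differences of its static output, so a cumulative sum recovers the static result. A nonlinear neuron would not commute with d/dt this way.]

```python
# A linear "neuron" y = w . x fed with dp/dt (finite differences) of
# its inputs; integrating the output recovers the static operation.

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

w = [0.5, -1.0, 2.0]  # arbitrary fixed weights

# A short input sequence: 3 inputs observed over 5 time steps.
xs = [
    [1.0, 0.0, 0.0],
    [1.5, 0.2, -0.1],
    [1.2, 0.5, 0.3],
    [0.8, 0.4, 0.9],
    [0.9, 0.1, 1.2],
]

# Static operation: apply the neuron to the raw inputs.
static_out = [dot(w, x) for x in xs]

# dp/dt operation: apply the same neuron to input differences...
dxs = [[a - b for a, b in zip(xs[t], xs[t - 1])] for t in range(1, len(xs))]
diff_out = [dot(w, dx) for dx in dxs]

# ...then integrate (cumulative sum from the initial static value).
recovered = [static_out[0]]
for d in diff_out:
    recovered.append(recovered[-1] + d)

# By linearity, the integrated dp/dt output matches the static output.
assert all(abs(a - b) < 1e-9 for a, b in zip(static_out, recovered))
```

The same weights thus "function" identically in either mode; only the learning signal differs, which is the point being made above.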
>> >> >> >
>> >> >> > Now, if you look for an association in dp/dt space and decide to
>> >> >> > recognize it, that same neuron will then operate to recognize a
>> >> >> > combination, once its output has been integrated. Of course, if
>> >> >> > you do not integrate but simply let subsequent neurons use its
>> >> >> > output, the entire system will operate as though it recognized the
>> >> >> > combination, even though, if you attached an oscilloscope to the
>> >> >> > output, you would see positive and negative spikes around what
>> >> >> > would be a steady-state output in "object" mode.
>> >> >> >
>> >> >> > In short, it programs based on associations, but functions based
>> >> >> > on represented combinations, that representation being the dp/dt
>> >> >> > of the combination.
>> >> >> >
>> >> >> >>>
>> >> >> >>>
>> >> >> >>>    Instead, you waved your hands and said "fast unsupervised
>> >> >> >>>    learning then becomes completely trivial" .... this statement
>> >> >> >>>    is a declaration that a good mechanism is available.
>> >> >> >>>
>> >> >> >>>    You then also talk about "like" objects.  But the whole
>> >> >> >>>    concept of "like" is extraordinarily troublesome.  Are Julius
>> >> >> >>>    Caesar and Brutus "like" each other?  Seen from our distance,
>> >> >> >>>    maybe yes, but from the point of view of Julius C., probably
>> >> >> >>>    not so much.  Is a G-type star "like" a mirror?  I don't know
>> >> >> >>>    any stellar astrophysicists who would say so, but then again
>> >> >> >>>    OF COURSE they are, because they are almost indistinguishable:
>> >> >> >>>    if you hold a mirror up in the right way it can reflect the
>> >> >> >>>    sun, and the two visual images can be identical.
>> >> >> >>>
>> >> >> >>>    These questions can be resolved, sure enough, but it is the
>> >> >> >>>    whole business of resolving these questions (rather than
>> >> >> >>>    waving a hand over them and declaring them to be trivial)
>> >> >> >>>    that is the point.
>> >> >> >>>
>> >> >> >>> I think that pretty much everyone who has "dented their pick" on
>> >> >> >>> unsupervised learning (this includes myself. Does anyone else
>> >> >> >>> here have these same scars?) has developed methods that would
>> >> >> >>> work on "completely obvious" test cases but failed miserably on
>> >> >> >>> real-world input. My point here is that, looking at things from
>> >> >> >>> a dp/dt point of view, real-world situations become about as
>> >> >> >>> simple as "completely obvious" test cases.
>> >> >> >>>
>> >> >> >>> I would quote some good source to make this point, but I don't
>> >> >> >>> think anyone has gone here yet.
>> >> >> >>
>> >> >> >>>
>> >> >> >>>
>> >> >> >>> If you don't have a clear demonstration that this dp/dt idea
>> >> >> >>> does deliver the goods, why are you claiming that it does?
>> >> >> >>> Surely it is one or the other?
>> >> >> >>>
>> >> >> >>> This month I am wearing my mathematician hat. My son Eddie is
>> >> >> >>> the NN hacker of the family, and he is waiting impatiently for
>> >> >> >>> me to declare a tentative completion so he can run with it.
>> >> >> >>>
>> >> >> >>> For now, my goal is to come up with a sufficiently good theory
>> >> >> >>> that even you can't poke any significant holes in it. Once I
>> >> >> >>> become the first person in history to ever receive the Loosemore
>> >> >> >>> Seal of No Objection, I will probably wrap this thing up and
>> >> >> >>> turn it over to Eddie.
>> >> >> >>>>
>> >> >> >>>> But Steve, if YOU claim that "looking at things from a dp/dt
>> >> >> >>>> point of view" does in fact yield a dramatic breakthrough that
>> >> >> >>>> allows unsupervised learning to work on real-world cases
>> >> >> >>>> (something nobody else can do right now),
>> >> >> >>>>
>> >> >> >>>> Not entirely true, as PCA does what could be considered to be
>> >> >> >>>> unsupervised learning, though granted, it is WAY too
>> >> >> >>>> inefficient for NN/AGI use without dp/dt.
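[Editor's note: for readers unfamiliar with the sense in which PCA is unsupervised learning, here is a minimal sketch in plain Python -- my own toy illustration, not anything from the thread. It finds the leading principal component of 2-D data by power iteration on the sample covariance, with no labels or supervision involved.]

```python
# PCA as unsupervised learning: recover the dominant direction of
# variation in unlabeled 2-D data via power iteration on the covariance.
import math
import random

random.seed(0)

# Toy data: points stretched along the direction (1, 1), plus noise.
data = [(t + random.gauss(0, 0.1), t + random.gauss(0, 0.1))
        for t in [i / 10.0 for i in range(-10, 11)]]

# Center the data (PCA operates on mean-removed data).
mx = sum(x for x, _ in data) / len(data)
my = sum(y for _, y in data) / len(data)
pts = [(x - mx, y - my) for x, y in data]

# Sample covariance matrix (2x2, symmetric).
n = len(pts)
cxx = sum(x * x for x, _ in pts) / n
cyy = sum(y * y for _, y in pts) / n
cxy = sum(x * y for x, y in pts) / n

# Power iteration: repeatedly multiply by the covariance and normalize
# to converge on the leading eigenvector (the first principal component).
v = (1.0, 0.0)
for _ in range(100):
    wx = cxx * v[0] + cxy * v[1]
    wy = cxy * v[0] + cyy * v[1]
    norm = math.hypot(wx, wy)
    v = (wx / norm, wy / norm)

# The leading component should lie near (1, 1) / sqrt(2).
assert abs(abs(v[0]) - abs(v[1])) < 0.1
```

Nothing told the algorithm which direction mattered; the structure was extracted from the data alone, which is the (limited) sense of "unsupervised learning" invoked above.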
>> >> >> >>>>>
>> >> >> >>>>> then YOU are expected to be the one who has gone there, done
>> >> >> >>>>> it, and come back with evidence that your idea does in fact
>> >> >> >>>>> do that.
>> >> >> >>
>> >> >> >> First comes the theory, then comes the demo. Neither contains
>> >> >> >> any sort of proof, but it is a LOT cheaper to shoot something
>> >> >> >> down BEFORE it is built than after. Hence, I find this exercise
>> >> >> >> VERY valuable. THANKS. Please keep up the good work.
>> >> >> >>
>> >> >> >> Steve Richfield
>> >> >> >
>> >> >> > ________________________________
>> >> >> > agi | Archives | Modify Your Subscription
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Abram Demski
>> >> >> Public address: [email protected]
>> >> >> Public archive: http://groups.google.com/group/abram-demski
>> >> >> Private address: [email protected]
>> >> >>
>> >> >>
>> >> >> -------------------------------------------
>> >> >> agi
>> >> >> Archives: https://www.listbox.com/member/archive/303/=now
>> >> >> RSS Feed: https://www.listbox.com/member/archive/rss/303/
>> >> >> Modify Your Subscription: https://www.listbox.com/member/?&;
>> >> >> Powered by Listbox: http://www.listbox.com