If you read papers on support vector machines using kernel methods, you will
realize that it is well known that you can do certain types of matching and
other operations in high-dimensional spaces without normally having to deal
in the high dimensions explicitly, by use of the "kernel" trick. The issue
is often that of finding a particular kernel that works well for your
problem. Collins shows the kernel trick can be extended to parse tree
matching.
With regard to my statement that the efficiency of the kernel trick could be
applied relatively generally, it is quite well supported by the following
text from page 4 of the paper.
"This paper and previous work by Lodhi et al. [12] examining the application
of convolution kernels to strings provide some evidence that convolution
kernels may provide an extremely useful tool for applying modern machine
learning techniques to highly structured objects. The key idea here is that
one may take a structured object and split it up into parts. If one can
construct kernels over the parts then one can combine these into a kernel
over the whole object. Clearly, this idea can be extended recursively so
that one only needs to construct kernels over the "atomic" parts of a
structured object. The recursive combination of the kernels over parts of an
object retains information regarding the structure of that object"
MARK WASER===> You also make statements that are explicitly contradicted in
the paper. For example, you say "But there really seems to be no reason why
there should be any limit to the dimensionality of the space in which the
Collins algorithm works, because it does not use an explicit vector
representation" while his paper quite clearly states "Each tree is
represented by an n dimensional vector where the i'th component counts the
number of occurrences of the i'th tree fragment." (A mistake I believe you
made because you didn't understand the preceding sentence -- or, more
critically, *any* of the math).
ED PORTER=====> The quote you give is from the last paragraph on page two of
the Collins paper cited in my prior email. The language you quoted occurred
in the following context.
"Conceptually we begin by enumerating all tree fragments that occur in the
training data 1,...,n. NOTE THAT THIS IS DONE ONLY IMPLICITLY. Each tree is
represented by an n dimensional vector where the i'th component counts the
number of occurrences of the i'th tree fragment." (capitalization added for
emphasis)
This is the discussion of the conceptually very high-dimensional space his
system effectively computes in, but normally avoids having to deal in
explicitly. In that conceptually high-dimensional space, patterns are
represented conceptually by vectors having a scalar associated with each
dimension of the space. But this vector is only the conceptual
representation, not the one his system actually uses for computation. This
is the very high-dimensional space I was talking about, not the reduced
dimensionality I talked about in which most operations are performed.
The 4th paragraph on page 3 of the paper starts with "The key to our
efficient use of this high dimensional representation is the definition of
an appropriate kernel." The kernel method it discusses uses a kernel
function C(n1,n2), which appears at the end of the major equation that has
three equal signs and spans the width of page 3. Immediately below is an
image of this equation for those reading in rich text.
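For those reading in plain text, the equation can be reconstructed from the
paper's notation (my transcription from memory; the typeset original may
differ in minor details). Here h_i(T) counts occurrences of the i'th
fragment in tree T, N1 and N2 are the node sets of the two trees, and
I_i(n) = 1 if fragment i is rooted at node n, else 0:

```latex
h(T_1)\cdot h(T_2)
  = \sum_i h_i(T_1)\,h_i(T_2)
  = \sum_{n_1 \in N_1}\sum_{n_2 \in N_2}\sum_i I_i(n_1)\,I_i(n_2)
  = \sum_{n_1 \in N_1}\sum_{n_2 \in N_2} C(n_1, n_2)
```

The third equality is the whole trick: the sum over all fragments i is
pushed inside and collapsed into C(n1,n2), which can be computed
recursively without ever enumerating the fragments.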
This function C(n1,n2) is summarized in the following text at the start of
the first full paragraph on page 4: "To see that this recursive definition
is correct, note that C(n1,n2) simply counts the number of common subtrees
that are found rooted at both n1 and n2." In this equation, n1 and n2 range
over the nodes of the two trees being matched.
Thus this kernel function deals with far fewer than all of the n subtrees
that occur in the training data mentioned in the above-quoted text that
starts with the word "Conceptually." Instead it deals only with the subset
of those subtrees that occur in the two parse trees being compared. Since
the vector referred to in the "Conceptually" paragraph, which had the full
dimensionality n, is never used in the kernel function, it never needs to
be explicitly dealt with. THUS, THE ALLEGATION BELOW THAT I MISUNDERSTOOD
THE MATH BECAUSE I THOUGHT COLLINS'S PARSER DIDN'T HAVE TO DEAL WITH A
VECTOR HAVING THE FULL DIMENSIONALITY OF THE SPACE BEING DEALT WITH IS
CLEARLY FALSE.
QED
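To make the point concrete, here is a minimal sketch of that recursive
computation (my own reimplementation, not Collins's code; the Node class
and helper names are illustrative assumptions, and the paper's decay
parameter lambda is set to 1 for simplicity). C(n1,n2) counts common
subtree fragments rooted at a node pair, so the full fragment space is
never materialized:

```python
# Hedged sketch of the Collins-style tree kernel: count common subtree
# fragments via the recursion on C(n1, n2), never building the huge
# fragment-indexed vectors explicitly.

class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

    def production(self):
        # The grammar production at this node, e.g. NP -> D N
        return (self.label, tuple(c.label for c in self.children))

def is_preterminal(node):
    # A node all of whose children are terminals (words)
    return bool(node.children) and all(not c.children for c in node.children)

def C(n1, n2):
    """Number of common subtree fragments rooted at both n1 and n2."""
    if not n1.children or not n2.children:
        return 0                      # terminal words carry no fragments
    if n1.production() != n2.production():
        return 0                      # different productions share nothing
    if is_preterminal(n1):
        return 1                      # identical pre-terminal: one fragment
    # Same production at internal nodes: multiply (1 + C) over child pairs
    result = 1
    for c1, c2 in zip(n1.children, n2.children):
        result *= 1 + C(c1, c2)
    return result

def internal_nodes(tree):
    if tree.children:                 # skip terminal (word) nodes
        yield tree
        for child in tree.children:
            yield from internal_nodes(child)

def tree_kernel(t1, t2):
    """K(T1, T2): the implicit dot product over all tree fragments,
    computed as a sum of C(n1, n2) over node pairs."""
    return sum(C(n1, n2)
               for n1 in internal_nodes(t1)
               for n2 in internal_nodes(t2))
```

The work done is proportional to the number of node pairs with matching
productions, not to the number of conceivable tree fragments.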
What this is saying is rather commonsensical. It says that regardless of
how many dimensions a space has, you can compare things based on the
dimensions they share, which is normally a very small subset of the total
number of dimensions. This is often called a dot-product comparison, and
the matching metric is often called a similarity rather than a distance.
This is different from a normal distance comparison, which, by common
definition, measures the similarity or lack thereof in all dimensions. But
in an extremely high-dimensional space such computations become extremely
complex, and the distance is dominated by the extremely large number of
dimensions that are, for many purposes, irrelevant to the comparison. Of
course, in the case of Collins's paper the comparison is made a little more
complex because it involves a mapping, not just a measure of the similarity
along each shared dimension.
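That "shared dimensions only" point can be illustrated with a sparse
dot product (my illustration, not from the paper; the dimension names are
made up). Only dimensions present in both vectors contribute, so the
nominal dimensionality of the space never enters the computation:

```python
# Sparse dot product: compare two points in a conceptually enormous space
# by touching only the coordinates they actually share.

def sparse_dot(u, v):
    """Dot product of sparse vectors stored as {dimension: value} dicts."""
    if len(u) > len(v):              # iterate over the smaller vector
        u, v = v, u
    return sum(value * v[dim] for dim, value in u.items() if dim in v)

# Two points with only a handful of nonzero coordinates each
a = {"fragment_17": 2.0, "fragment_905": 1.0, "fragment_44871": 3.0}
b = {"fragment_905": 4.0, "fragment_44871": 1.0, "fragment_99123": 5.0}

similarity = sparse_dot(a, b)        # only the two shared keys contribute
```

The cost depends on the number of stored (nonzero) entries, not on how
many dimensions the surrounding space conceptually has.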
So, in summary, Mark, before you trash me so harshly, please take a little
more care to be sure your criticisms are actually justified.
Ed Porter
-----Original Message-----
From: Mark Waser [mailto:[EMAIL PROTECTED]
Sent: Wednesday, December 05, 2007 10:27 AM
To: [email protected]
Subject: Re: Hacker intelligence level [WAS Re: [agi]
Funding AGI research]
Interesting. Since I am interested in parsing, I read
Collins's paper. It's a solid piece of work (though with the stated error
percentages, I don't believe that it really proves anything worthwhile at
all) -- but your over-interpretations of it are ridiculous.
You claim that "It is actually showing that you can do
something roughly equivalent to growing neural gas (GNG) in a space with
something approaching 500,000 dimensions, but you can do it without normally
having to deal with more than a few of those dimensions at one time."
Collins makes no claims that even remotely resemble this. He *is* taking a
deconstructionist approach (which Richard and many others would argue
vehemently with) -- but that is virtually the entirety of the overlap
between his paper and your claims. Where do you get all this crap about
500,000 dimensions, for example?
You also make statements that are explicitly contradicted in
the paper. For example, you say "But there really seems to be no reason why
there should be any limit to the dimensionality of the space in which the
Collins algorithm works, because it does not use an explicit vector
representation" while his paper quite clearly states "Each tree is
represented by an n dimensional vector where the i'th component counts the
number of occurrences of the i'th tree fragment." (A mistake I believe you
made because you didn't understand the preceding sentence -- or, more
critically, *any* of the math).
Are all your claims on this list this far from reality if
one pursues them?
----- Original Message -----
From: "Ed Porter" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Tuesday, December 04, 2007 10:52 PM
Subject: RE: Hacker intelligence level [WAS Re: [agi]
Funding AGI research]
The particular NL parser paper in question, Collins's "Convolution Kernels
for Natural Language"
(http://l2r.cs.uiuc.edu/~danr/Teaching/CS598-05/Papers/Collins-kernels.pdf),
is actually saying something quite important that extends way beyond
parsers and is highly applicable to AGI in general.
It is actually showing that you can do something roughly equivalent to
growing neural gas (GNG) in a space with something approaching 500,000
dimensions, but you can do it without normally having to deal with more
than a few of those dimensions at one time. GNG is an algorithm I learned
about from reading Peter Voss that allows one to learn how to efficiently
represent a distribution in a relatively high-dimensional space in a
totally unsupervised manner. But there really seems to be no reason why
there should be any limit to the dimensionality of the space in which the
Collins algorithm works, because it does not use an explicit vector
representation, nor, if I recollect correctly, a Euclidean distance metric,
but rather a similarity metric which is generally much more appropriate for
matching in very high-dimensional spaces.
But what he is growing are not just points representing where data has
occurred in a high-dimensional space, but sets of points that define
hyperplanes marking the boundaries between classes. My recollection is that
this system learns automatically from both labeled data (instances of
correct parse trees) and randomly generated deviations from those
instances. His particular algorithm matches tree structures, but with
modification it would seem to be extendable to matching arbitrary nets.
Other versions of it could be made to operate, like GNG, in an unsupervised
manner.
If you stop and think about what this is saying and generalize from it, it
provides an important possible component in an AGI tool kit. What it shows
is not limited to parsing, but would seem possibly applicable to virtually
any hierarchical or networked representation, including nets of semantic
web RDF triples, semantic nets, and predicate logic expressions. At first
glance it appears it would even be applicable to kinkier net matching
algorithms, such as augmented transition network (ATN) matching.
So if one reads this paper with a mind not only to what it specifically
shows, but to how what it shows could be expanded, this paper says
something very important. That is, that one can represent, learn, and
classify things in very high-dimensional spaces -- such as
10^1000000000000 dimensional spaces -- and do it efficiently, provided the
part of the space being represented is sufficiently sparsely connected.
I had already assumed this before reading this paper, but the paper was
valuable to me because it provided mathematically rigorous support for my
prior models, and helped me better understand the mathematical foundations
of my own prior intuitive thinking.
It means that systems like Novamente can deal in very high-dimensional
spaces relatively efficiently. It does not mean that all processes that can
be performed in such spaces will be computationally cheap (for example,
combinatorial searches), but it means that many of them, such as GNG-like
recording of experience and simple index-based matching, can scale
relatively well in a sparsely connected world.
That is important, for those with the vision to understand.
Ed Porter
-----Original Message-----
From: Benjamin Goertzel [mailto:[EMAIL PROTECTED]
Sent: Tuesday, December 04, 2007 8:59 PM
To: [email protected]
Subject: Re: Hacker intelligence level [WAS Re: [agi]
Funding AGI research]
> Thus: building a NL parser, no matter how good it is, is of no use
> whatsoever unless it can be shown to emerge from (or at least fit with)
> a learning mechanism that allows the system itself to generate its own
> understanding (or, at least, acquisition) of grammar IN THE CONTEXT OF A
> MECHANISM THAT ALSO ACCOMPLISHES REAL UNDERSTANDING. When that larger
> issue is dealt with, a NL parser will arise naturally, and any previous
> work on non-developmental, hand-built parsers will be completely
> discarded. You were trumpeting the importance of work that I know will
> be thrown away later, and in the mean time will be of no help in
> resolving the important issues.
Richard, you discount the possibility that said NL parser will play a key
role in the adaptive emergence of a system that can generate its own
linguistic understanding. I.e., you discount the possibility that, with
the right learning mechanism and instructional environment, hand-coded
rules may serve as part of the initial seed for a learning process that
will eventually generate knowledge obsoleting these initial hand-coded
rules.
It's fine that you discount this possibility -- I just want to point out
that in doing so, you are making a bold and unsupported theoretical
hypothesis, rather than stating an obvious or demonstrated fact.
Vaguely similarly, the "grammar" of child language is largely thrown away
in adulthood, yet it was useful as scaffolding in leading to the emergence
of adult language.
-- Ben G
-----
This list is sponsored by AGIRI: http://www.agiri.org/email
To unsubscribe or change your options, please go to:
http://v2.listbox.com/member/?&