Richard,

It is actually more valuable than you say.

First, the same kernel trick can be used for GNG-type unsupervised learning
in high-dimensional spaces, so it is not limited to supervised learning.

Second, you are correct in saying that, through the kernel trick, it is
actually doing almost all of its computations in a lower-dimensional
space.

But unlike many kernel methods, this one actually directly accesses each
of the dimensions in the space, in different combinations as necessary.
That is important.  It means that you can have a space with as many
dimensions as there are features or patterns in your system and still
efficiently do similarity matching (though not distance matching).
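To make that concrete: if each item touches only a handful of the dimensions, a dot-product similarity costs time proportional to the dimensions the two items actually share, not to the size of the ambient space.  A minimal sketch (the representation and numbers here are mine, purely illustrative):

```python
# Sparse vectors stored as {dimension_index: weight} dicts; the ambient
# space can have hundreds of thousands of dimensions, but each item only
# stores its nonzero ones.
def similarity(a, b):
    # Iterate over the smaller vector, so the cost is proportional to the
    # number of stored dimensions, not the total dimensionality.
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b[i] for i, w in a.items() if i in b)

# Two items in a conceptual 500,000-dimensional space, three features each.
x = {12: 1.0, 40071: 2.0, 499998: 1.0}
y = {123: 3.0, 40071: 1.0, 499998: 2.0}
print(similarity(x, y))  # 2.0*1.0 + 1.0*2.0 = 4.0
```

The cost never depends on the 500,000; only on how many features the two items store and share.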

Ed Porter

-----Original Message-----
From: Richard Loosemore [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, December 05, 2007 2:37 PM
To: [email protected]
Subject: Re: Hacker intelligence level [WAS Re: [agi] Funding AGI research]

Ed Porter wrote:
> Mark,
> 
>> MARK WASER===> You claim that "It is actually showing that you can do
> something roughly equivalent to growing neural gas (GNG) in a space with
> something approaching 500,000 dimensions, but you can do it without
> normally having to deal with more than a few of those dimensions at one
> time."  Collins makes no claims that even remotely resemble this.  He
> *is* taking a deconstructionist approach (which Richard and many others
> would argue vehemently with) -- but that is virtually the entirety of the
> overlap between his paper and your claims.  Where do you get all this
> crap about 500,000 dimensions, for example?
> 
> ED PORTER=====> The 500K dimensions were mentioned several times in a
> lecture Collins gave at MIT about his parser.  This was probably 5 years
> ago, so I am not 100% sure the number was 500K, but I am about 90% sure
> that was the number used, and 100% sure the number was well over 100K.
> The very large size of the number of dimensions was mentioned repeatedly
> by both Collins and at least one other professor with whom I talked after
> the lecture.  One of the points both emphasized was that by use of the
> kernel trick he was effectively matching in a 500K-dimensional space,
> without having to deal with most of those dimensions at any one time
> (although, it is my understanding, over many parses the system would deal
> with a large percent of all those dimensions).

It sounds like you may have misunderstood the relevance of the high 
number of dimensions.

Correct me if I am wrong, but Collins is not really "matching" in large 
numbers of dimensions; he is using the kernel trick to transform a 
nonlinear CLASSIFICATION problem into a high-dimensional linear 
classification problem.
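To make the trick itself concrete, here is the standard degree-2 polynomial kernel (a textbook illustration, not Collins' tree kernel): it evaluates a dot product in a d**2-dimensional feature space while only ever touching the original d coordinates.

```python
import itertools

# Degree-2 polynomial kernel: k(x, y) = (x . y)**2 equals the dot product
# phi(x) . phi(y), where phi maps a d-vector to all d**2 ordered pairwise
# products x[i]*x[j] -- a space we never need to construct.
def kernel(x, y):
    return sum(a * b for a, b in zip(x, y)) ** 2

def phi(x):
    # The explicit (and expensive) feature map, built here only to verify
    # that the implicit and explicit computations agree.
    return [a * b for a, b in itertools.product(x, repeat=2)]

x, y = [1.0, 2.0, 3.0], [0.5, -1.0, 2.0]
implicit = kernel(x, y)
explicit = sum(a * b for a, b in zip(phi(x), phi(y)))
print(implicit, explicit)  # 20.25 20.25
```

The linear classifier then operates in the phi-space implicitly, through kernel evaluations alone.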

This is just a trick to enable a better type of supervised learning.

Would you follow me if I said that using supervised learning is of no 
use in general?  Because it means that someone has already (a) decided 
on the dimensions of representation in the initial problem domain, and 
(b) already done all the work of classifying the sentences into 
"syntactically correct" and "syntactically incorrect".  All that the SVM 
is doing is summarizing this training data in a nice compact form: the 
high number of dimensions involved at one stage of the problem appears to 
be just an artifact of the method; it means nothing in general.

It especially does not mean that this supervised training algorithm is 
somehow able to break out and become an unsupervised, feature-discovery 
method, which it would have to do to be of any general interest.

I still have not read Collins' paper:  I am just getting this from my 
understanding of the math you have mentioned here.

It seems that whether or not he mentioned 500K dimensions or an infinite 
number of dimensions (which he could have done) makes no difference to 
anything.

If you think it does make a big difference, could you explain why?




Richard Loosemore




> If you read papers on support vector machines using kernel methods, you
> will realize that it is well known that you can do certain types of
> matching and other operations in high dimensional spaces without having
> to actually deal in the high dimensions, by use of the "kernel" trick.
> The issue is often that of finding a particular kernel that works well
> for your problem.  Collins shows the kernel trick can be extended to
> parse tree net matching.
> 
> With regard to my statement that the efficiency of the kernel trick
> could be applied relatively generally, it is quite well supported by the
> following text from page 4 of the paper.
> 
> "This paper and previous work by Lodhi et al. [12] examining the
> application of convolution kernels to strings provide some evidence that
> convolution kernels may provide an extremely useful tool for applying
> modern machine learning techniques to highly structured objects. The key
> idea here is that one may take a structured object and split it up into
> parts. If one can construct kernels over the parts then one can combine
> these into a kernel over the whole object. Clearly, this idea can be
> extended recursively so that one only needs to construct kernels over the
> "atomic" parts of a structured object. The recursive combination of the
> kernels over parts of an object retains information regarding the
> structure of that object"
> 
>> MARK WASER===> You also make statements that are explicitly contradicted
> in the paper.  For example, you say "But there really seem to be no
> reason why there should be any limit to the dimensionality of the space
> in which the Collin's algorithm works, because it does not use an
> explicit vector representation" while his paper quite clearly states
> "Each tree is represented by an n dimensional vector where the i'th
> component counts the number of occurences of the i'th tree fragment."
> (A mistake I believe you made because you didn't understand the
> preceding sentence -- or, more critically, *any* of the math).
> 
> ED PORTER=====> The quote you give is from the last paragraph on page
> two of the Collins paper cited in my prior email.  The language you
> quoted occurred in the following context.
> 
> "Conceptually we begin by enumerating all tree fragments that occur in
> the training data 1,...,n. NOTE THAT THIS IS DONE ONLY IMPLICITLY. Each
> tree is represented by an n dimensional vector where the i'th component
> counts the number of occurences of the i'th tree fragment."
> (capitalization is added for emphasis)
> 
> This is the discussion of the conceptually very high dimensional space
> his system effectively computes in but normally avoids having to
> explicitly deal in.  In that conceptually high dimensional space,
> patterns are represented conceptually by vectors having a scalar
> associated with each dimension of the high dimensional space.  But this
> vector is only the conceptual representation, not the one his system
> actually explicitly uses for computation.  This is the very high
> dimensional space I was talking about, not the reduced dimensionality I
> talked about in which most operations are performed.
> 
> The 4th paragraph on page 3 of the paper starts with "The key to our
> efficient use of this high dimensional representation is the definition
> of an appropriate kernel."  The kernel method it discusses uses a kernel
> function C(n1,n2), which appears at the end of the major equation that
> has three equal signs and spans the width of page 3.  Immediately below
> is an image of this equation, for those reading in rich text.
> 
> 
> This function C(n1,n2) is summarized in the following text at the start
> of the first full paragraph on page 4:
> 
> "To see that this recursive definition is correct, note that C(n1,n2)
> simply counts the number of common subtrees that are found rooted at both
> n1 and n2."  In the above equation, n1 and n2 range over each node,
> respectively, in each of the two trees being matched.
> 
> Thus this kernel function deals with much less than all of the i subtrees
> that occur in the training data mentioned in the above quoted text that
> starts with the word "Conceptually."  Instead it only deals with the
> subset of the i subtrees that occur in the two parse trees being
> compared.  Since the vector referred to in the "Conceptually" paragraph,
> which had the full dimensionality i, is not used in the kernel function,
> it never needs to be explicitly dealt with.  THUS, THE ALLEGATION BELOW
> THAT I MISUNDERSTOOD THE MATH BECAUSE I THOUGHT COLLINS'S PARSER DIDN'T
> HAVE TO DEAL WITH A VECTOR HAVING THE FULL DIMENSIONALITY OF THE SPACE
> BEING DEALT WITH IS CLEARLY FALSE.
> 
> QED
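For readers following along, the recursion quoted above can be sketched in a few lines.  This is my own minimal reading of the paper's C(n1,n2) definition, with trees as simple (label, children) tuples and the paper's decay parameter omitted:

```python
# Trees as (label, children) tuples; a leaf is (word, ()).  Names are mine.
def production(node):
    label, children = node
    return (label, tuple(child[0] for child in children))

def C(n1, n2):
    # Zero unless both nodes expand with the same grammar production.
    if not n1[1] or not n2[1] or production(n1) != production(n2):
        return 0
    # Otherwise count common subtrees rooted at both nodes recursively;
    # pre-terminals bottom out at 1 because their children are leaves.
    result = 1
    for c1, c2 in zip(n1[1], n2[1]):
        result *= 1 + C(c1, c2)
    return result

def nodes(tree):
    yield tree
    for child in tree[1]:
        yield from nodes(child)

def kernel(t1, t2):
    # The implicit dot product over all tree fragments: summed over node
    # pairs, never over the exponentially many fragments themselves.
    return sum(C(a, b) for a in nodes(t1) for b in nodes(t2))

t = ("S", (("N", (("John", ()),)), ("V", (("runs", ()),))))
print(kernel(t, t))  # 6 shared fragments
```

Running kernel(t, t) counts the six fragments the toy tree shares with itself, without ever enumerating the fragment space explicitly.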
> 
> What this is saying is rather common-sensical.  It says that regardless
> of how many dimensions a space has, you can compare things based on the
> number of dimensions they share, which is normally a very small subset of
> the total number of dimensions.  This is often called a dot-product
> comparison, and the matching metric is often called a similarity rather
> than a distance.  This is different from a normal distance comparison,
> which, by common definition, measures the similarity or lack thereof in
> all dimensions.  But in an extremely high dimensional space such
> computations become extremely complex, and the distance is dominated by
> the extremely large number of dimensions that are for many purposes
> irrelevant to the comparison.  Of course, in the case of Collins's paper
> the comparison is made a little more complex because it involves a
> mapping, not just a measure of the similarity along each shared i
> dimension.
> 
> So, in summary, Mark, before you trash me so harshly, please take a little
> more care to be sure your criticisms are actually justified.
> 
> Ed Porter
>   
> 
> 
>               -----Original Message-----
>               From: Mark Waser [mailto:[EMAIL PROTECTED] 
>               Sent: Wednesday, December 05, 2007 10:27 AM
>               To: [email protected]
>               Subject: Re: Hacker intelligence level [WAS Re: [agi]
> Funding AGI research]
> 
>               Interesting.  Since I am interested in parsing, I read
> Collins' paper.  It's a solid piece of work (though with the stated error
> percentages, I don't believe that it really proves anything worthwhile at
> all) -- but your over-interpretations of it are ridiculous.
>                
>               You claim that "It is actually showing that you can do
> something roughly equivalent to growing neural gas (GNG) in a space with
> something approaching 500,000 dimensions, but you can do it without
> normally
> having to deal with more than a few of those dimensions at one time."
> Collins makes no claims that even remotely resembles this.  He *is* taking
> a
> deconstructionist approach (which Richard and many others would argue
> vehemently with) -- but that is virtually the entirety of the overlap
> between his paper and your claims.  Where do you get all this crap about
> 500,000 dimensions, for example?
>                
>               You also make statements that are explicitly contradicted in
> the paper.  For example, you say "But there really seem to be no reason
> why
> there should be any limit to the dimensionality of the space in which the
> Collin's algorithm works, because it does not use an explicit vector
> representation" while his paper quite clearly states "Each tree is
> represented by an n dimensional vector where the i'th component counts the
> number of occurences of the i'th tree fragment." (A mistake I believe you
> made because you didn't understand the preceding sentence -- or, more
> critically, *any* of the math).
>                
>               Are all your claims on this list this far from reality if
> one pursues them? 
>                
>                
>               ----- Original Message ----- 
>               From: "Ed Porter" <[EMAIL PROTECTED]>
>               To: <[email protected]>
>               Sent: Tuesday, December 04, 2007 10:52 PM
>               Subject: RE: Hacker intelligence level [WAS Re: [agi]
> Funding AGI research]
> 
>               The particular NL parser paper in question, Collins's
> "Convolution Kernels
>               for Natural Language"
>               (http://l2r.cs.uiuc.edu/~danr/Teaching/CS598-05/Papers/Collins-kernels.pdf)
>               is actually saying something quite important that extends
> way beyond parsers
>               and is highly applicable to AGI in general.  
>               
>               It is actually showing that you can do something roughly
> equivalent to
>               growing neural gas (GNG) in a space with something
> approaching 500,000
>               dimensions, but you can do it without normally having to
> deal with more than
>               a few of those dimensions at one time.  GNG is an algorithm
> I learned about
>               from reading Peter Voss that allows one to learn how to
> efficiently
>               represent a distribution in a relatively high dimensional
> space in a totally
>               unsupervised manner.  But there really seem to be no reason
> why there should
>               be any limit to the dimensionality of the space in which the
> Collin's
>               algorithm works, because it does not use an explicit vector
> representation,
>               nor, if I recollect correctly, a Euclidean distance metric,
> but rather a
>               similarity metric which is generally much more appropriate
> for matching in
>               very high dimensional spaces.
>               
>               But what he is growing are not just points representing
> where data has
>               occurred in a high dimensional space, but sets of points
> that define
>               hyperplanes for defining the boundaries between classes.  My
> recollection is
>               that this system learns automatically from both labeled data
> (instances of
>               correct parse trees) and randomly generated deviations from
> those instances.
>               His particular algorithm matches tree structures, but with
> modification it
>               would seem to be extendable to matching arbitrary nets.
> Other versions of
>               it could be made to operate, like GNG, in an unsupervised
> manner.
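For reference, here is a bare-bones sketch of the GNG update loop mentioned here, following Fritzke's published algorithm as I understand it; the parameter values and the 2-D test data are purely illustrative:

```python
import random

# Bare-bones growing neural gas: nodes accumulate error, edges age and
# die, and a new node is inserted every LAMBDA steps near the node with
# the largest accumulated error.  Parameter values are illustrative.
EPS_B, EPS_N, MAX_AGE, LAMBDA, ALPHA, DECAY = 0.2, 0.006, 50, 100, 0.5, 0.995

def dist2(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def gng(samples, steps):
    pos = {0: [0.0, 0.0], 1: [1.0, 1.0]}   # node id -> position
    err = {0: 0.0, 1: 0.0}                 # node id -> accumulated error
    age = {frozenset((0, 1)): 0}           # edge (pair of ids) -> age
    next_id = 2
    for t in range(1, steps + 1):
        x = random.choice(samples)
        s1, s2 = sorted(pos, key=lambda n: dist2(x, pos[n]))[:2]
        err[s1] += dist2(x, pos[s1])
        pos[s1] = [p + EPS_B * (xi - p) for p, xi in zip(pos[s1], x)]
        for e in list(age):                # age s1's edges, nudge neighbours
            if s1 in e:
                age[e] += 1
                n = next(iter(e - {s1}))
                pos[n] = [p + EPS_N * (xi - p) for p, xi in zip(pos[n], x)]
        age[frozenset((s1, s2))] = 0       # create/refresh the winner edge
        for e in [e for e in age if age[e] > MAX_AGE]:
            del age[e]                     # drop stale edges...
        for n in [n for n in pos if not any(n in e for e in age)]:
            del pos[n], err[n]             # ...and nodes left edgeless
        if t % LAMBDA == 0:                # grow near the largest error
            q = max(err, key=err.get)
            f = max((n for e in age if q in e for n in e - {q}), key=err.get)
            pos[next_id] = [(a + b) / 2 for a, b in zip(pos[q], pos[f])]
            err[q] *= ALPHA
            err[f] *= ALPHA
            err[next_id] = err[q]
            age.pop(frozenset((q, f)), None)
            age[frozenset((q, next_id))] = 0
            age[frozenset((f, next_id))] = 0
            next_id += 1
        for n in err:
            err[n] *= DECAY
    return pos

random.seed(0)
corners = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
net = gng(corners, steps=1000)
print(len(net))  # the network has grown beyond its initial two nodes
```

Note this operates entirely on explicit position vectors; extending it to an implicit kernel-defined space, as suggested here, would be a further step.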
>               
>               If you stop and think about what this is saying and
> generalize from it, it
>               provides an important possible component in an AGI tool kit.
> What it shows
>               is not limited to parsing, but it would seem possibly
> applicable to
>               virtually any hierarchical or networked representation,
> including nets of
>               semantic web RDF triples, and semantic nets, and predicate
> logic
>               expressions.  At first glance it appears it would even be
> applicable to
>               kinkier net matching algorithms, such as an Augmented
> transition network
>               (ATN) matching.
>               
>               So if one reads this paper with a mind to not only what it
> specifically
>               shows, but to how what it shows could be expanded, this
> paper says
>               something very important.  That is, that one can represent,
> learn, and
>               classify things in very high dimensional spaces -- such as
> 10^1000000000000
>               dimensional spaces -- and do it efficiently provided the
> part of the space
>               being represented is sufficiently sparsely connected.
>               
>               I had already assumed this, before reading this paper, but
> the paper was
>               valuable to me because it provided a mathematically rigorous
> support for my
>               prior models, and helped me better understand the
> mathematical foundations
>               of my own prior intuitive thinking.  
>               
>               It means that systems like Novamente can deal in very high
> dimensional
>               spaces relatively efficiently. It does not mean that all
> processes that can
>               be performed in such spaces will be computationally cheap
> (for example,
>               combinatorial searches), but it means that many of them,
> such as GNG like
>               recording of experience, and simple indexed based matching
> can scale
>               relatively well in a sparsely connected world.
>               
>               That is important, for those with the vision to understand.
>               
>               Ed Porter
>               
>               -----Original Message-----
>               From: Benjamin Goertzel [mailto:[EMAIL PROTECTED] 
>               Sent: Tuesday, December 04, 2007 8:59 PM
>               To: [email protected]
>               Subject: Re: Hacker intelligence level [WAS Re: [agi]
> Funding AGI research]
>               
>               > Thus: building a NL parser, no matter how good it is, is
> of no use
>               > whatsoever unless it can be shown to emerge from (or at
> least fit with)
>               > a learning mechanism that allows the system itself to
> generate its own
>               > understanding (or, at least, acquisition) of grammar IN
> THE CONTEXT OF A
>               > MECHANISM THAT ALSO ACCOMPLISHES REAL UNDERSTANDING. When
> that larger
>               > issue is dealt with, a NL parser will arise naturally, and
> any previous
>               > work on non-developmental, hand-built parsers will be
> completely
>               > discarded. You were trumpeting the importance of work that
> I know will
>               > be thrown away later, and in the mean time will be of no
> help in
>               > resolving the important issues.
>               
>               Richard, you discount the possibility that said NL parser
> will play a key
>               role in the adaptive emergence of a system that can generate
> its own
>               linguistic understanding.  I.e., you discount the
> possibility that, with the
>               right learning mechanism and instructional environment,
> hand-coded
>               rules may serve as part of the initial seed for a learning
> process that will
>               eventually generate knowledge obsoleting these initial
> hand-coded
>               rules.
>               
>               It's fine that you discount this possibility -- I just want
> to point out
>               that
>               in doing so, you are making a bold and unsupported
> theoretical hypothesis,
>               rather than stating an obvious or demonstrated fact.
>               
>               Vaguely similarly, the "grammar" of child language is
> largely thrown
>               away in adulthood, yet it was useful as scaffolding in
> leading to the
>               emergence of adult language.
>               
>               -- Ben G
>               
>               -----
>               This list is sponsored by AGIRI: http://www.agiri.org/email
>               To unsubscribe or change your options, please go to:
>               http://v2.listbox.com/member/?&;
>               
> 
