If you read papers on support vector machines using kernel methods, you will
realize that it is well known that you can do certain types of matching and
other operations in high-dimensional spaces without normally having to deal
in the high dimensions explicitly, by use of the "kernel" trick. The issue
is often that of finding a particular kernel that works well for your
problem. Collins shows the kernel trick can be extended to parse tree
matching.
With regard to my statement that the efficiency of the kernel trick could be
applied relatively generally, it is quite well supported by the following
text from page 4 of the paper.
"This paper and previous work by Lodhi et al. [12] examining the application
of convolution kernels to strings provide some evidence that convolution
kernels may provide an extremely useful tool for applying modern machine
learning techniques to highly structured objects. The key idea here is that
one may take a structured object and split it up into parts. If one can
construct kernels over the parts then one can combine these into a kernel
over the whole object. Clearly, this idea can be extended recursively so
that one only needs to construct kernels over the "atomic" parts of a
structured object. The recursive combination of the kernels over parts of an
object retains information regarding the structure of that object"
MARK WASER===> You also make statements that are explicitly contradicted in
the paper. For example, you say "But there really seems to be no reason why
there should be any limit to the dimensionality of the space in which the
Collins algorithm works, because it does not use an explicit vector
representation" while his paper quite clearly states "Each tree is
represented by an n dimensional vector where the i'th component counts the
number of occurrences of the i'th tree fragment." (A mistake I believe you
made because you didn't understand the preceding sentence -- or, more
critically, *any* of the math).
ED PORTER=====> The quote you give is from the last paragraph on page two of
the Collins paper cited in my prior email. The language you quoted occurred
in the following context.
"Conceptually we begin by enumerating all tree fragments that occur in the
training data 1,...,n. NOTE THAT THIS IS DONE ONLY IMPLICITLY. Each tree is
represented by an n dimensional vector where the i'th component counts the
number of occurrences of the i'th tree fragment." (capitalization added for
emphasis)
This is the discussion of the conceptually very high-dimensional space his
system effectively computes in, but normally avoids having to deal in
explicitly. In that conceptually high-dimensional space, patterns are
represented conceptually by vectors having a scalar associated with each
dimension of the space. But this vector is only the conceptual
representation, not the one his system actually uses for computation. This
is the very high-dimensional space I was talking about, not the reduced
dimensionality I talked about in which most operations are performed.
The 4th paragraph on page 3 of the paper starts with "The key to our
efficient use of this high dimensional representation is the definition of
an appropriate kernel." The kernel method it discusses uses a kernel
function C(n1,n2), which appears at the end of the major equation that has
three equal signs and spans the width of page 3. Immediately below is an
image of this equation for those reading in rich text.
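For those reading in plain text, the equation can be reconstructed from the
paper's notation (my transcription from memory; the typeset original may
differ in minor details). Here h_i(T) counts occurrences of the i'th
fragment in tree T, N1 and N2 are the node sets of the two trees, and
I_i(n) = 1 if fragment i is rooted at node n, else 0:

```latex
h(T_1)\cdot h(T_2)
  = \sum_i h_i(T_1)\,h_i(T_2)
  = \sum_{n_1 \in N_1}\sum_{n_2 \in N_2}\sum_i I_i(n_1)\,I_i(n_2)
  = \sum_{n_1 \in N_1}\sum_{n_2 \in N_2} C(n_1, n_2)
```

The third equality is the whole trick: the sum over all fragments i is
pushed inside and collapsed into C(n1,n2), which can be computed
recursively without ever enumerating the fragments.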
This function C(n1,n2) is summarized in the following text at the start of
the first full paragraph on page 4: "To see that this recursive definition
is correct, note that C(n1,n2) simply counts the number of common subtrees
that are found rooted at both n1 and n2." In this equation, n1 and n2 range
over the nodes of the two trees being matched.
Thus this kernel function deals with far fewer than all of the n subtrees
that occur in the training data mentioned in the above-quoted text that
starts with the word "Conceptually." Instead it deals only with the subset
of those subtrees that occur in the two parse trees being compared. Since
the vector referred to in the "Conceptually" paragraph, which had the full
dimensionality n, is never used in the kernel function, it never needs to
be explicitly dealt with. THUS, THE ALLEGATION BELOW THAT I MISUNDERSTOOD
THE MATH BECAUSE I THOUGHT COLLINS'S PARSER DIDN'T HAVE TO DEAL WITH A
VECTOR HAVING THE FULL DIMENSIONALITY OF THE SPACE BEING DEALT WITH IS
CLEARLY FALSE.
QED
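To make the point concrete, here is a minimal sketch of that recursive
computation (my own reimplementation, not Collins's code; the Node class
and helper names are illustrative assumptions, and the paper's decay
parameter lambda is set to 1 for simplicity). C(n1,n2) counts common
subtree fragments rooted at a node pair, so the full fragment space is
never materialized:

```python
# Hedged sketch of the Collins-style tree kernel: count common subtree
# fragments via the recursion on C(n1, n2), never building the huge
# fragment-indexed vectors explicitly.

class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

    def production(self):
        # The grammar production at this node, e.g. NP -> D N
        return (self.label, tuple(c.label for c in self.children))

def is_preterminal(node):
    # A node all of whose children are terminals (words)
    return bool(node.children) and all(not c.children for c in node.children)

def C(n1, n2):
    """Number of common subtree fragments rooted at both n1 and n2."""
    if not n1.children or not n2.children:
        return 0                      # terminal words carry no fragments
    if n1.production() != n2.production():
        return 0                      # different productions share nothing
    if is_preterminal(n1):
        return 1                      # identical pre-terminal: one fragment
    # Same production at internal nodes: multiply (1 + C) over child pairs
    result = 1
    for c1, c2 in zip(n1.children, n2.children):
        result *= 1 + C(c1, c2)
    return result

def internal_nodes(tree):
    if tree.children:                 # skip terminal (word) nodes
        yield tree
        for child in tree.children:
            yield from internal_nodes(child)

def tree_kernel(t1, t2):
    """K(T1, T2): the implicit dot product over all tree fragments,
    computed as a sum of C(n1, n2) over node pairs."""
    return sum(C(n1, n2)
               for n1 in internal_nodes(t1)
               for n2 in internal_nodes(t2))
```

The work done is proportional to the number of node pairs with matching
productions, not to the number of conceivable tree fragments.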
What this is saying is rather commonsensical. It says that regardless of
how many dimensions a space has, you can compare things based on the
dimensions they share, which is normally a very small subset of the total
number of dimensions. This is often called a dot-product comparison, and
the matching metric is often called a similarity rather than a distance.
This is different from a normal distance comparison, which, by common
definition, measures the similarity or lack thereof in all dimensions. But
in an extremely high-dimensional space such computations become extremely
complex, and the distance is dominated by the extremely large number of
dimensions that are, for many purposes, irrelevant to the comparison. Of
course, in the case of Collins's paper the comparison is made a little more
complex because it involves a mapping, not just a measure of the similarity
along each shared dimension.
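That "shared dimensions only" point can be illustrated with a sparse
dot product (my illustration, not from the paper; the dimension names are
made up). Only dimensions present in both vectors contribute, so the
nominal dimensionality of the space never enters the computation:

```python
# Sparse dot product: compare two points in a conceptually enormous space
# by touching only the coordinates they actually share.

def sparse_dot(u, v):
    """Dot product of sparse vectors stored as {dimension: value} dicts."""
    if len(u) > len(v):              # iterate over the smaller vector
        u, v = v, u
    return sum(value * v[dim] for dim, value in u.items() if dim in v)

# Two points with only a handful of nonzero coordinates each
a = {"fragment_17": 2.0, "fragment_905": 1.0, "fragment_44871": 3.0}
b = {"fragment_905": 4.0, "fragment_44871": 1.0, "fragment_99123": 5.0}

similarity = sparse_dot(a, b)        # only the two shared keys contribute
```

The cost depends on the number of stored (nonzero) entries, not on how
many dimensions the surrounding space conceptually has.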
So, in summary, Mark, before you trash me so harshly, please take a little
more care to be sure your criticisms are actually justified.
Ed Porter
-----Original Message-----
From: Mark Waser [mailto:[EMAIL PROTECTED]
Sent: Wednesday, December 05, 2007 10:27 AM
To: [email protected]
Subject: Re: Hacker intelligence level [WAS Re: [agi]
Funding AGI research]
Interesting. Since I am interested in parsing, I read
Collins's paper. It's a solid piece of work (though with the stated error
percentages, I don't believe that it really proves anything worthwhile at
all) -- but your over-interpretations of it are ridiculous.
You claim that "It is actually showing that you can do
something roughly equivalent to growing neural gas (GNG) in a space with
something approaching 500,000 dimensions, but you can do it without normally
having to deal with more than a few of those dimensions at one time."
Collins makes no claims that even remotely resemble this. He *is* taking a
deconstructionist approach (which Richard and many others would argue
vehemently with) -- but that is virtually the entirety of the overlap
between his paper and your claims. Where do you get all this crap about
500,000 dimensions, for example?
You also make statements that are explicitly contradicted in
the paper. For example, you say "But there really seems to be no reason why
there should be any limit to the dimensionality of the space in which the
Collins algorithm works, because it does not use an explicit vector
representation" while his paper quite clearly states "Each tree is
represented by an n dimensional vector where the i'th component counts the
number of occurrences of the i'th tree fragment." (A mistake I believe you
made because you didn't understand the preceding sentence -- or, more
critically, *any* of the math).
Are all your claims on this list this far from reality if
one pursues them?
----- Original Message -----
From: "Ed Porter" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Tuesday, December 04, 2007 10:52 PM
Subject: RE: Hacker intelligence level [WAS Re: [agi]
Funding AGI research]
The particular NL parser paper in question, Collins's "Convolution Kernels
for Natural Language"
(http://l2r.cs.uiuc.edu/~danr/Teaching/CS598-05/Papers/Collins-kernels.pdf),
is actually saying something quite important that extends way beyond
parsers and is highly applicable to AGI in general.
It is actually showing that you can do something roughly equivalent to
growing neural gas (GNG) in a space with something approaching 500,000
dimensions, but you can do it without normally having to deal with more
than a few of those dimensions at one time. GNG is an algorithm I learned
about from reading Peter Voss that allows one to learn how to efficiently
represent a distribution in a relatively high-dimensional space in a
totally unsupervised manner. But there really seems to be no reason why
there should be any limit to the dimensionality of the space in which the
Collins algorithm works, because it does not use an explicit vector
representation, nor, if I recollect correctly, a Euclidean distance metric,
but rather a similarity metric which is generally much more appropriate for
matching in very high-dimensional spaces.
But what he is growing are not just points representing where data has
occurred in a high-dimensional space, but sets of points that define
hyperplanes marking the boundaries between classes. My recollection is that
this system learns automatically from both labeled data (instances of
correct parse trees) and randomly generated deviations from those
instances. His particular algorithm matches tree structures, but with
modification it would seem to be extendable to matching arbitrary nets.
Other versions of it could be made to operate, like GNG, in an unsupervised
manner.
If you stop and think about what this is saying and generalize from it, it
provides an important possible component in an AGI tool kit. What it shows
is not limited to parsing, but would seem possibly applicable to virtually
any hierarchical or networked representation, including nets of semantic
web RDF triples, semantic nets, and predicate logic expressions. At first
glance it appears it would even be applicable to kinkier net matching
algorithms, such as augmented transition network (ATN) matching.
So if one reads this paper with a mind not only to what it specifically
shows, but to how what it shows could be expanded, this paper says
something very important. That is, that one can represent, learn, and
classify things in very high-dimensional spaces -- such as
10^1000000000000 dimensional spaces -- and do it efficiently, provided the
part of the space being represented is sufficiently sparsely connected.
I had already assumed this before reading this paper, but the paper was
valuable to me because it provided mathematically rigorous support for my
prior models, and helped me better understand the mathematical foundations
of my own prior intuitive thinking.
It means that systems like Novamente can deal in very high-dimensional
spaces relatively efficiently. It does not mean that all processes that can
be performed in such spaces will be computationally cheap (for example,
combinatorial searches), but it means that many of them, such as GNG-like
recording of experience and simple index-based matching, can scale
relatively well in a sparsely connected world.
That is important, for those with the vision to understand.
Ed Porter
-----Original Message-----
From: Benjamin Goertzel [mailto:[EMAIL PROTECTED]
Sent: Tuesday, December 04, 2007 8:59 PM
To: [email protected]
Subject: Re: Hacker intelligence level [WAS Re: [agi]
Funding AGI research]
> Thus: building a NL parser, no matter how good it is, is of no use
> whatsoever unless it can be shown to emerge from (or at least fit with)
> a learning mechanism that allows the system itself to generate its own
> understanding (or, at least, acquisition) of grammar IN THE CONTEXT OF A
> MECHANISM THAT ALSO ACCOMPLISHES REAL UNDERSTANDING. When that larger
> issue is dealt with, a NL parser will arise naturally, and any previous
> work on non-developmental, hand-built parsers will be completely
> discarded. You were trumpeting the importance of work that I know will
> be thrown away later, and in the mean time will be of no help in
> resolving the important issues.
Richard, you discount the possibility that said NL parser will play a key
role in the adaptive emergence of a system that can generate its own
linguistic understanding. I.e., you discount the possibility that, with
the right learning mechanism and instructional environment, hand-coded
rules may serve as part of the initial seed for a learning process that
will eventually generate knowledge obsoleting these initial hand-coded
rules.
It's fine that you discount this possibility -- I just want to point out
that in doing so, you are making a bold and unsupported theoretical
hypothesis, rather than stating an obvious or demonstrated fact.
Vaguely similarly, the "grammar" of child language is largely thrown away
in adulthood, yet it was useful as scaffolding in leading to the emergence
of adult language.
-- Ben G
-----
This list is sponsored by AGIRI: http://www.agiri.org/email
To unsubscribe or change your options, please go to:
http://v2.listbox.com/member/?&