--- Mike Tintner <[EMAIL PROTECTED]> wrote:
> On the one hand, we can perhaps agree that one of the brain's glories is 
> that it can very rapidly draw analogies - that I can quickly produce a 
> string of associations like, say, "snake," "rope," "chain," "spaghetti 
> strand" - and you may quickly be able to continue that string with further 
> associations (like "string"). I believe that power is mainly based on 
> "look-up" - literally finding matching shapes at speed. But I don't see the 
> brain as checking through huge numbers of such shapes. (It would be 
> enormously demanding on resources, given that these are complex pictures, 
> no?).

Semantic models learn associations by proximity in the training text.  The
degree to which you associate "snake" and "rope" depends on how often these
words appear near each other.  You can represent this as an association
matrix A, where A[snake][rope] is the degree of association between those two
words.
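
To make that concrete, here is a minimal sketch in Python (my own
illustration, not from any particular toolkit; the window size and toy
sentence are assumptions) that builds such a matrix by counting
co-occurrences within a small window:

from collections import defaultdict

def association_matrix(tokens, window=5):
    # A[w1][w2] counts how often w2 appears within `window` words of w1.
    A = defaultdict(lambda: defaultdict(int))
    for i, w in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                A[w][tokens[j]] += 1
    return A

tokens = "the snake lay coiled like a rope on the deck".split()
A = association_matrix(tokens)
print(A["snake"]["rope"])  # 1: the words co-occur once within the window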

Among the most successful of these models is latent semantic analysis (LSA),
where A is factored by singular value decomposition (SVD) as A = USV, with U
and V orthonormal and S diagonal, and all but the largest elements of S are
then discarded.  In a typical LSA model, A is 20K by 20K, and S is reduced to
about 200 nonzero elements.  This approximates A with two 20K by 200 matrices
(plus the retained elements of S), using about 2% as much space.
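
In numpy the factor-and-truncate step looks like the sketch below (my own
illustration: a toy 100 by 100 matrix stands in for 20K by 20K, and rank 2
stands in for the ~200 retained singular values):

import numpy as np

A = np.random.rand(100, 100)        # stand-in for the 20K x 20K matrix
U, s, Vt = np.linalg.svd(A)         # A = U @ diag(s) @ Vt
k = 2                               # stand-in for the ~200 of a real model
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # rank-k approximation of A
# Storage drops from n*n to roughly 2*n*k (plus k singular values):
# with n = 20000 and k = 200, that is 8M vs. 400M numbers, about 2%.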

One effect of lossy compression by LSA is to derive associations by the
transitive property of semantics.  For example, if "snake" is associated with
"rope" and "rope" with "chain", then the LSA approximation will derive an
association of "snake" with "chain" even if it was not seen in the training
data.
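
A three-word toy example shows the effect.  In the matrix below the
snake-chain entry starts at zero, but the rank-1 approximation fills it in
(my own illustration; the numbers are made up):

import numpy as np

words = ["snake", "rope", "chain"]
A = np.array([[1.0, 1.0, 0.0],    # snake: seen near rope, never near chain
              [1.0, 1.0, 1.0],    # rope:  seen near both
              [0.0, 1.0, 1.0]])   # chain: seen near rope, never near snake
U, s, Vt = np.linalg.svd(A)
A1 = s[0] * np.outer(U[:, 0], Vt[0, :])   # rank-1 approximation
print(A1[0, 2])   # snake-chain: now ~0.6, inferred by transitivity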

SVD has an efficient parallel implementation.  It is most easily visualized as
a 20K by 200 by 20K 3-layer linear neural network [1].  But this really should
not be surprising, because natural language evolved to be processed
efficiently on a slow but highly parallel computer.
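
Gorrell's incremental algorithm is more than a few lines, but its batch
analogue, power iteration, shows the network picture directly: a forward pass
through one hidden unit and a backward pass, repeated until the weight
vectors converge to the top singular pair.  The sketch below is my own
illustration, not the algorithm from [1]:

import numpy as np

def top_singular_pair(A, iters=100):
    # u and v play the roles of the weights into and out of one
    # hidden unit of the 3-layer linear network.
    v = np.random.rand(A.shape[1])
    for _ in range(iters):
        u = A @ v
        u /= np.linalg.norm(u)    # "forward" pass
        v = A.T @ u
        v /= np.linalg.norm(v)    # "backward" pass
    return u, v, u @ A @ v        # left vector, right vector, sigma_1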

1. Gorrell, Genevieve (2006), “Generalized Hebbian Algorithm for Incremental
Singular Value Decomposition in Natural Language Processing”, Proceedings of
EACL 2006, Trento, Italy.
http://www.aclweb.org/anthology-new/E/E06/E06-1013.pdf


-- Matt Mahoney, [EMAIL PROTECTED]
