I'm taking the Stanford Machine Learning class, and it brought up a problem
I've thought about before: When any linear algebraic process reduces the
dimensionality of a data set, you lose "names" or "labels" for the reduced
features.

Specifically, if the data set has highly correlated features, such as the
sq. ft. of a house and its number of floors, a dimensionality reduction
algorithm is very likely to detect that correlation and merge the two into
a single new reduced feature.
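For concreteness, here's a toy numpy sketch (the housing numbers are made
up, not real data) showing exactly that merge: with two highly correlated
features, the first principal component soaks up nearly all the variance
and weights both features about equally.

```python
import numpy as np

# Made-up housing data: sq. ft. and number of floors, deliberately correlated.
rng = np.random.default_rng(0)
sqft = rng.uniform(800, 3000, size=200)
floors = sqft / 1200 + rng.normal(0.0, 0.1, size=200)  # floors track sq. ft.

X = np.column_stack([sqft, floors])
Xs = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize before PCA

# PCA via SVD of the standardized data
U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
explained = s**2 / np.sum(s**2)

print(explained[0])  # nearly all the variance lands on one component
print(Vt[0])         # ...which weights sq. ft. and floors about equally
```

That single surviving component has no obvious name: it's part sq. ft.,
part floors.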

A difficulty arises: what do you name the new, reduced features?

This occurs big time in DTMs (Document Term Matrices), which are reduced
using SVD to a much smaller lexicon than the entire dictionary.  But when
this reduction occurs, there is no term name that can be used for the
linear combination of initial dictionary entries.
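A minimal sketch of that LSA-style reduction, with a toy six-term
"dictionary" and counts I invented for illustration:

```python
import numpy as np

# Toy document-term matrix: rows = documents, columns = named dictionary terms.
terms = ["house", "floor", "sqft", "ship", "ocean", "sail"]
dtm = np.array([
    [2, 1, 3, 0, 0, 0],
    [1, 2, 2, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 3, 2, 1],
    [0, 0, 0, 1, 3, 2],
], dtype=float)

# Truncated SVD: keep k latent dimensions in place of the 6-term lexicon
k = 2
U, s, Vt = np.linalg.svd(dtm, full_matrices=False)
docs_k = U[:, :k] * s[:k]  # documents described by k unnamed features
mixes = Vt[:k, :]          # each new feature is a mix of ALL original terms

print(docs_k.shape)  # (5, 2): five documents, two features with no dictionary entry
```

Each row of `mixes` is a weighted blend of every original term, so there is
no single dictionary entry to point at.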

One solution is to "undo" the dimensionality reduction, to revert to an
approximation of the initial terms.  If your dimensionality reduction is
REALLY tight, that works OK.
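The "undo" step is just a rank-k reconstruction back into the original
term space; when the reduction is really tight, the round trip comes back
close to the original.  A sketch on random stand-in data (not a real DTM):

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in matrix that is nearly rank 3, plus a little noise
X = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 20))
X += 0.01 * rng.normal(size=(50, 20))

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 3
X_approx = (U[:, :k] * s[:k]) @ Vt[:k, :]  # revert to the original feature space

rel_err = np.linalg.norm(X - X_approx) / np.linalg.norm(X)
print(rel_err)  # small, because the reduction was "really tight"
```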

But is there another solution that can create credible new terms from the
original ones?  For example, would semantic network approaches help?  I
could see the initial feature names forming a semantic web of triples,
which could yield a navigation technique where the original term names were
not lost, yet the relationship between them and the new reduced set became
visible.
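Short of a full semantic network, one cheap trick in this direction is to
label each reduced axis by its few highest-weight original terms, so the
old names stay attached to the new features.  Sketch using the same sort
of invented six-term setup:

```python
import numpy as np

terms = ["house", "floor", "sqft", "ship", "ocean", "sail"]
dtm = np.array([
    [2, 1, 3, 0, 0, 0],
    [1, 2, 2, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 3, 2, 1],
    [0, 0, 0, 1, 3, 2],
], dtype=float)

_, _, Vt = np.linalg.svd(dtm, full_matrices=False)

# Label each reduced axis with its two highest-|weight| original terms
names = []
for axis in Vt[:2]:
    top = np.argsort(-np.abs(axis))[:2]
    names.append("+".join(terms[j] for j in top))

print(names)
```

The labels are crude (a "+" of two old terms, not a real word), but the
mapping from new features back to original term names stays visible.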

In the Stanford ML class, we discuss feature sets of 10,000 terms being
reduced to 100-500 terms using PCA (Principal Component Analysis) with
"99% variance retained" .. i.e. only 1% squared projection error (not
regression error).  It would be fascinating to retain the initial terms in
a web of some sort.  I know some search classifiers that use this type of
K-means clustering, but alas, they do lose the original terms.

   -- Owen
============================================================
FRIAM Applied Complexity Group listserv
Meets Fridays 9a-11:30 at cafe at St. John's College
lectures, archives, unsubscribe, maps at http://www.friam.org
