I'm taking the Stanford Machine Learning class, and it brought up a problem I've thought about before: when any linear algebraic process reduces the dimensionality of a data set, you lose the "names" or "labels" for the reduced features.
Specifically, if the data set has highly correlated features, such as the square footage of a house and its number of floors, a dimensionality reduction algorithm is very likely to detect that correlation and merge the two into a single new reduced feature. A difficulty arises: what do you name the new, reduced features?

This occurs big time with DTMs (Document Term Matrices), which are reduced using SVD to create a much smaller lexicon than the entire dictionary. But when this reduction occurs, there is no new term name that can be used for the linear combination of initial dictionary entries.

One solution is to "undo" the dimensionality reduction, reverting to an approximation of the initial terms. If your dimensionality reduction is REALLY tight, that works OK. But is there another solution that can create credible new terms from the original ones?

For example, would semantic network approaches help? I could see the initial feature names forming a semantic web of triples, which could yield a navigation technique where the original term names were not lost, yet the relationship between them and the new reduced set became visible.

In the Stanford ML class, we discuss feature sets of 10,000 terms being reduced to 100-500 terms using PCA (Principal Component Analysis) with "99% variance retained", i.e. only 1% squared projection error (as distinct from regression error). It would be fascinating to retain the initial terms in a web of some sort. I know some search classifiers that use K-means clustering for this type of reduction, but alas, they do lose the original terms.

-- Owen
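P.S. To make the naming problem concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the toy house data, the `age_years` feature, and the helper names are hypothetical, and power iteration stands in for a real PCA/SVD library. It finds the first principal component and then "names" it by listing the original features with the largest absolute loadings. That is only one simple labeling heuristic, not the semantic-web idea above.

```python
import math

# Hypothetical toy data: (sq_ft, floors, age_years) for six houses.
FEATURES = ["sq_ft", "floors", "age_years"]
HOUSES = [
    (900, 1, 40), (1100, 1, 12), (1500, 2, 35),
    (1700, 2, 8), (2300, 3, 30), (2500, 3, 5),
]

def standardize(rows):
    """Center each column and scale it to unit (population) variance."""
    scaled_cols = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col))
        scaled_cols.append([(x - mean) / std for x in col])
    return [list(row) for row in zip(*scaled_cols)]

def covariance(rows):
    """Covariance matrix of rows whose columns are already centered."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / n for j in range(d)]
            for i in range(d)]

def top_component(cov, iterations=200):
    """Power iteration: unit eigenvector and eigenvalue for the largest eigenvalue."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    cov_v = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(v[i] * cov_v[i] for i in range(d))  # Rayleigh quotient
    return v, eigenvalue

def name_component(loadings, names, top=2):
    """Label a component by the original features with the largest |loading|."""
    ranked = sorted(zip(names, loadings), key=lambda p: abs(p[1]), reverse=True)
    return ", ".join(f"{name} ({weight:+.2f})" for name, weight in ranked[:top])

X = standardize(HOUSES)
cov = covariance(X)
loadings, eigenvalue = top_component(cov)
total_variance = sum(cov[i][i] for i in range(len(cov)))
variance_retained = eigenvalue / total_variance
label = name_component(loadings, FEATURES)

print("component 1 ~", label)
print(f"variance retained by one component: {variance_retained:.0%}")
```

On this toy data the two correlated features dominate the first component, so the original names survive as a label attached to the reduced feature. With 10,000 terms the same ranking would give each reduced term a short list of its heaviest original terms.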
============================================================
FRIAM Applied Complexity Group listserv
Meets Fridays 9a-11:30 at cafe at St. John's College
lectures, archives, unsubscribe, maps at http://www.friam.org
