Re: [FRIAM] Dimensionality reduced term names.

Marcus G. Daniels Tue, 29 Nov 2011 20:11:18 -0800

On 11/29/2011 8:49 PM, Owen Densmore wrote:

Specifically, if the data set has highly correlated features such as sq. ft. of a house, and the number of floors, a dimensionality reduction algorithm is very likely to find high correlation with #floors and sq. ft. of the house, and merge these two into a single new reduced term.


A difficulty arises: what do you name the new, reduced features?

Reserve a forbidden character (e.g. \001) as a delimiter and append the original strings upon the term reduction, forming a lexicon of those unique strings. Then you don't need to remember the index -> string relationships of the original encoding. Alternatively, to make a more dense encoding, one could take the integers corresponding to the terms' row or column indices and form a tuple or list of indices and hash on that to get the new identifier. Could accumulate that stuff recursively if you want to know the history of the encodings.
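A rough sketch of both ideas in Python, in case it helps. All the names here (DELIM, merge_names, split_name, IndexLexicon) are made up for illustration, and it assumes that no original term name contains \001 and that original terms are numbered 0..n-1. A plain dict keyed on the index tuple does the hashing.

```python
DELIM = "\x01"  # the reserved character \001; assumed absent from every original name


def merge_names(names):
    """Join term names (original or already merged) into one reduced name."""
    return DELIM.join(names)


def split_name(name):
    """Recover the original term names from a reduced name."""
    return name.split(DELIM)


class IndexLexicon:
    """Denser variant: key each reduced term by a tuple of indices."""

    def __init__(self, n_original):
        self.n_original = n_original  # original terms use ids 0..n_original-1
        self.ids = {}      # tuple of member ids -> new id (the dict hashes the tuple)
        self.members = {}  # new id -> tuple of member ids

    def merge(self, *ids):
        """Return the id for this combination, creating one if it is new."""
        key = tuple(ids)
        if key not in self.ids:
            new_id = self.n_original + len(self.ids)
            self.ids[key] = new_id
            self.members[new_id] = key
        return self.ids[key]

    def history(self, term_id):
        """Expand a reduced id recursively into nested tuples of original indices."""
        if term_id < self.n_original:
            return term_id
        return tuple(self.history(m) for m in self.members[term_id])
```

The string version flattens on repeated merges, so splitting always gives back the original names but not the order in which they were merged. The index version keeps each merge as its own entry, so history() can return the nesting.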


Marcus

============================================================
FRIAM Applied Complexity Group listserv
Meets Fridays 9a-11:30 at cafe at St. John's College
lectures, archives, unsubscribe, maps at http://www.friam.org
