On Tue, Nov 29, 2011 at 8:49 PM, Owen Densmore <[email protected]> wrote:

>
> Specifically, if the data set has highly correlated features such as sq.
> ft. of a house, and the number of floors, a dimensionality reduction
> algorithm is very likely to find high correlation with # floors and sq. ft.
> of the house, and merge these two into a single new reduced term.
>
> A difficulty arises: what do you name the new, reduced features?
>
> We always used to call them reduced dimensions 1, 2, 3, ..., because they
> never stuck around long enough to get familiar.
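
The merging Owen describes can be sketched quickly: when two features are nearly collinear, PCA folds almost all of their variance into a single component. A minimal NumPy illustration, with made-up numbers (the house data here is hypothetical, not from the thread):

```python
import numpy as np

# Hypothetical data: square footage tracks the number of floors closely,
# so the two features are highly correlated.
rng = np.random.default_rng(0)
n = 200
floors = rng.integers(1, 4, size=n).astype(float)       # 1-3 floors
sq_ft = 900.0 * floors + rng.normal(0.0, 50.0, size=n)  # sq. ft. ~ floors

X = np.column_stack([sq_ft, floors])
Xc = (X - X.mean(axis=0)) / X.std(axis=0)               # standardize

# PCA via SVD of the standardized data matrix
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / np.sum(s**2)

print(explained)  # the first component carries nearly all the variance
```

The first singular value dominates, so the reduction keeps one component that is a blend of "size" and "floors" -- and, as below, has no natural name of its own.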

Opening lines of the abstract for a Hadley Wickham <http://had.co.nz/> talk
in Pittsburgh this week:

It's often said that 80 percent of the effort of analysis is spent just
getting the data ready to analyze, the process of data cleaning. Data
cleaning is not only a vital first step, but it is often repeated multiple
times over the course of an analysis as new problems come to light.


If your data set is the only data set for the problem, and it's already
perfect, and if your reduction method is the only one for the problem, and
it's also perfect, or if all data sets and reduction methods give the exact
same reduced dimensions, then you might have time to worry about what to
call the reduced dimensions. Otherwise your time is better spent figuring
out how to ensure that your data set is what you think it really is,
because with probability 1 it's a horrible caricature of what you think it
is.  And every time you fix something in the data prep all your carefully
chosen names go down the tubes with whatever amazing theories you attached
to them.

It may be that your class problems are perfect data sets for the perfect
reduction methods they ask you to apply to them; that's never happened to
me.

-- rec --
============================================================
FRIAM Applied Complexity Group listserv
Meets Fridays 9a-11:30 at cafe at St. John's College
lectures, archives, unsubscribe, maps at http://www.friam.org
