I think that both Rich and I would have looked at trends and other condensations from the start; we wouldn't have needed a computer to tell us what to do.

On Mon, 18 Nov 2002, Rich Ulrich wrote:

> On 17 Nov 2002 20:00:23 -0500, Elliot Cramer <[EMAIL PROTECTED]> wrote:
>
> > In sci.stat.edu Radford Neal <[EMAIL PROTECTED]> wrote:
> >
> >
> > : You don't know what you are talking about. There are many, many
> > : situations in which data is analysed when there are more variables
> > : than observations.
> >
> > but if you know anything about statistics, you don't analyze them as
> > variables but condense them based on your knowledge to many fewer
> > variables than observations
> >
> >
> > : The absurdity of saying you can't do anything with more variables than
> > : observations is well illustrated by the case of spectroscopic data,
> > : where the number of variables is just the number of frequencies (or
> > : that you have to throw away the extra data from the better instrument
> > : before analysing it.
> > see above
> >
> > : PCA isn't necessarily the best way of analysing such data, but it
> > : isn't senseless.
> >
> > It's senseless
>
> When I saw a PCA on power-spectral data, the first components
> were - neatly - the overall power, the frequency (linear trend),
> the quadratic, and so on. The result wasn't senseless.
> Maybe it was best to look at it as confirmation, or as a source
> of coefficients. In fact, I still wonder how much use it would
> have been, if the "sense" had not been obvious.
>
> For the same data, (I'm not sure, but) I think would be
> a mistake to use *all* the components if you are comparing
> to new data. The fit that was achieved was necessarily,
> arbitrarily perfect.
>
> On the other hand, for the data from genetic micro-arrays,
> and other bio-assays, I have been assuming that PCA
> would give little help. I guess, when I wonder some more,
> I can accept the possibility, if the samples are big enough.
> But I think they are stuck with a lot of separate assays.
>
> Also, p-levels of statistical tests are misleading when the
> observed proportions have a huge range: The experiment
> has practically no test-power for a gene that is seldom seen.
> I have figured that they do a lot of tabulation of "perfect-but-rare
> -prediction" in order to get candidates. Eventually, with
> tons of data in hand, they will have to do a heck-of-a-lot
> of Bonferroni correction.
>
> --
> Rich Ulrich, [EMAIL PROTECTED]
> http://www.pitt.edu/~wpilib/index.html

=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
                  http://jse.stat.ncsu.edu/
=================================================================
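
For readers who have not met the Bonferroni correction mentioned in the quoted message: it divides the overall significance level by the number of tests, so each individual test has to clear a much stricter bar. The following is only a minimal sketch of that arithmetic. The function names and the p-values are made up for illustration and do not come from the thread or from any real assay.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate <= alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def bonferroni_select(p_values, alpha=0.05):
    """Return the indices of tests whose p-value clears the Bonferroni threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [i for i, p in enumerate(p_values) if p <= threshold]


# Made-up p-values for five hypothetical genes (illustration only).
example_p_values = [0.001, 0.04, 0.03, 0.2, 0.008]

# With five tests and alpha = 0.05, each test must reach p <= 0.01.
selected = bonferroni_select(example_p_values, alpha=0.05)
print(selected)
```

With thousands of genes tested at once, the per-test threshold becomes very small, which is one way to see why a gene that is seldom observed, and so has little test power, is unlikely to survive the correction.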
