On 21 Nov 2003 12:44:23 -0800, [EMAIL PROTECTED] (Chih-Mao Hsieh) wrote:

> Dear Edstat-listers,
>
> I have 8 variables per observation, all count data
> (integers>0), and I want to be able to run an R factor
> analysis to obtain factor scores. The data have the
> following attributes:
>
> (1) Hundreds of thousands of observations at my disposal,
>     from which I can sample if nec.
> (2) Significantly non-normal, apparently not very amenable
>     to transformations
> (3) Significant portions of the observations have zeros
>     "across the board"

I want to discuss your (3).

For the data that I use (symptoms, etc.), there is an ordinary
first principal component on which everything is positively
correlated. I am factoring data on patients, and none of them
are all-zeros. If I had a sub-sample with zeros across the
board, I think nobody would mind if I dropped them, without
much further justification.

Now, zeros are a special concern. It happens, at times, that
the gap between 0 and 1 can be considered much larger than the
gap between any other pair of adjacent counts: number of prior
heart attacks, number of pregnancies, and so on. It *might* be
sensible to consider transforming all 8 of your variables to
0/1, and looking at the associations among those.

I'm certainly not saying that this would be your only analysis,
but I can imagine data where those crosstabulations would be
the most interesting way to look at the data.

[ snip, rest]

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
"Taxes are the price we pay for civilization."
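
As a rough illustration of the 0/1 recoding suggested in the reply above, here is a minimal Python sketch. It is not from the original post. The function names, the toy counts, and the choice of the phi coefficient as the measure of association are all assumptions made for the example. It drops the all-zero observations, recodes each count as 0 or 1, and prints the 2x2 crosstabulation for every pair of variables.

```python
from itertools import combinations
from math import sqrt


def drop_all_zero_rows(rows):
    """Remove observations that are zero on every variable."""
    return [row for row in rows if any(v != 0 for v in row)]


def dichotomize(rows):
    """Recode each count as 0 (none) or 1 (one or more)."""
    return [[1 if v > 0 else 0 for v in row] for row in rows]


def crosstab(rows, i, j):
    """2x2 table for 0/1 data: table[x][y] counts rows with var i == x, var j == y."""
    table = [[0, 0], [0, 0]]
    for row in rows:
        table[row[i]][row[j]] += 1
    return table


def phi(table):
    """Phi coefficient of a 2x2 table, or None when a margin is empty."""
    (a, b), (c, d) = table
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return None
    return (a * d - b * c) / sqrt(denom)


# Made-up counts with three variables (the post has eight); illustration only.
counts = [
    [0, 0, 0],
    [2, 1, 0],
    [0, 0, 0],
    [5, 3, 1],
    [1, 0, 0],
    [0, 4, 2],
]

kept = drop_all_zero_rows(counts)   # observations with at least one nonzero count
binary = dichotomize(kept)          # 0/1 version of the remaining data

for i, j in combinations(range(len(binary[0])), 2):
    table = crosstab(binary, i, j)
    print(i, j, table, phi(table))
```

Whether to drop the all-zero rows before or after recoding is a judgment call. In this sketch they are dropped first, so the crosstabulations describe only observations with at least one nonzero count.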
