On 21 Nov 2003 12:44:23 -0800, [EMAIL PROTECTED] (Chih-Mao Hsieh) wrote:

> Dear Edstat-listers,
> 
> I have 8 variables per observation, all count data 
> (integers>0), and I want to be able to run an R factor 
> analysis to obtain factor scores.  The data have the 
> following attributes:
> 
> (1) Hundreds of thousands of observations at my disposal, from which I can sample if nec.
> (2) Significantly non-normal, apparently not very amenable to transformations
> (3) Significant portions of the observations have zeros "across the board"

I want to discuss your (3).  For the data that I use (symptoms, etc.),
there is an ordinary first principal component, since everything is
positively correlated.  I'm factoring data on patients, none of whom
are all-zeros.  If I had a sub-sample with zeros across the board,
I think nobody would mind if I dropped them without much further
justification.
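As a rough sketch of that screening step (Python, made-up data; the helper name `drop_all_zero_rows` is only for illustration), one can drop the all-zero observations and count how many go.  With a sample this large, that count is worth reporting, since it says how much of the sample the factoring no longer describes.

```python
from typing import List

Row = List[int]


def drop_all_zero_rows(rows: List[Row]) -> List[Row]:
    """Keep only observations with at least one nonzero count."""
    return [row for row in rows if any(v != 0 for v in row)]


# Hypothetical example: 4 observations on 8 count variables.
data = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 2, 0, 0, 3, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 4, 0, 0, 1, 0, 0, 0],
]
kept = drop_all_zero_rows(data)
dropped = len(data) - len(kept)
print(f"{dropped} of {len(data)} observations were all-zero and were dropped")
```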

Now, zeros are a special concern.  At times, the gap between 0 and 1
can be considered much larger than the gap between any other pair of
adjacent counts: number of prior heart attacks, number of pregnancies,
and so on.  It *might* be sensible to consider transforming all 8 of
your variables to 0/1 and examining the associations among those.
I'm certainly not saying that this would be your only analysis, but
I can imagine data where those crosstabulations would be the most
interesting way to look at the data.
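A minimal sketch of the 0/1 idea, in Python with made-up data: each count is recoded as 1 if it is positive, else 0, and every pair of variables gets a 2x2 table.  As a one-number summary of each table I have used the phi coefficient (the Pearson correlation of two 0/1 variables); that choice and the helper names are my own assumptions, not a prescribed method.

```python
import math
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Tuple

Table = Tuple[int, int, int, int]  # (n11, n10, n01, n00)


def dichotomize(rows: List[List[int]]) -> List[List[int]]:
    """Recode every count as 1 if it is positive, else 0."""
    return [[1 if v > 0 else 0 for v in row] for row in rows]


def crosstab_2x2(x: Sequence[int], y: Sequence[int]) -> Table:
    """Cell counts of the 2x2 table for two 0/1 columns."""
    pairs = list(zip(x, y))
    return (
        pairs.count((1, 1)),
        pairs.count((1, 0)),
        pairs.count((0, 1)),
        pairs.count((0, 0)),
    )


def phi(table: Table) -> Optional[float]:
    """Phi coefficient; None if either variable is constant (undefined)."""
    n11, n10, n01, n00 = table
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return None
    return (n11 * n00 - n10 * n01) / denom


def pairwise_tables(rows: List[List[int]]) -> Dict[Tuple[int, int], Table]:
    """2x2 crosstab for every pair of variables after dichotomizing."""
    cols = list(zip(*dichotomize(rows)))
    return {(i, j): crosstab_2x2(cols[i], cols[j])
            for i, j in combinations(range(len(cols)), 2)}


# Hypothetical data: 5 observations on 3 count variables.
data = [[2, 0, 1], [0, 0, 0], [3, 1, 0], [0, 5, 0], [1, 0, 4]]
tables = pairwise_tables(data)
for (i, j), t in tables.items():
    print(f"var{i} x var{j}: (n11, n10, n01, n00) = {t}, phi = {phi(t)}")
```

With 8 variables this gives 28 tables, few enough to read one by one; the tables themselves, not just phi, are what would show whether "any versus none" carries the structure.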


[ snip, rest]

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
"Taxes are the price we pay for civilization." 
=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
.                  http://jse.stat.ncsu.edu/                    .
=================================================================