Whether an approach is appropriate or not depends in part
on what you are trying to do. If all you want to do is to "group the variables",
then there is nothing wrong with computing some measure of similarity among the
variables and then applying some form of cluster analysis. On the other hand, if
you have some model, then you have to be more careful.
Factor analysis is a more ambitious approach that
attempts to find underlying factors that can be used to help you interpret the
pattern of covariation shared among the variables. Factor
analysis methods are often used to group variables, but that is not the
purpose for which the methods were developed. One uses an oblique
solution if one does not wish to constrain solutions to those in which
the estimated factors are uncorrelated. At least in biology, it is
difficult to justify restrictions to only orthogonal
factors. Discrimination is a different problem.
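A minimal sketch of the "group the variables" route, in plain Python with only the standard library. The similarity measure (absolute Pearson correlation, so 1 - |r| is the dissimilarity), the average-linkage merging rule, the function names and the toy data are all assumptions for illustration, not anything specified in this thread.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cluster_variables(columns, k):
    """Average-linkage clustering of variables on the dissimilarity 1 - |r|.

    columns: dict mapping a variable name to its list of observed values.
    k: number of groups at which to stop merging.
    """
    names = list(columns)
    dissim = {
        frozenset((a, b)): 1 - abs(pearson(columns[a], columns[b]))
        for a, b in combinations(names, 2)
    }
    clusters = [[name] for name in names]
    while len(clusters) > k:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Average linkage: mean dissimilarity over all cross-cluster pairs.
            pairs = [dissim[frozenset((a, b))]
                     for a in clusters[i] for b in clusters[j]]
            score = sum(pairs) / len(pairs)
            if best is None or score < best[0]:
                best = (score, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]                          # j > i, so index i is unaffected
    return clusters


# Hypothetical data: "a" and "b" move together, as do "c" and "d".
columns = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 11],
    "c": [1, 0, 1, 0, 1],
    "d": [2, 0, 2, 0, 3],
}
print(cluster_variables(columns, 2))  # [['a', 'b'], ['c', 'd']]
```

Using |r| puts strongly negatively correlated variables in the same group; using 1 - r instead would keep them apart. Neither choice involves a model for the covariation, which is the distinction drawn above.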
For the original request for an oblique factor analysis that
penalizes non-zero loadings, you might wish to look at: Katz,
J. O. and F. J. Rohlf. 1975. Primary product functionplane, an oblique rotation
to simple structure. Multivariate Behavioral Research, 10:219-232. Software for
it is not generally available, but it will be included along with the better-known
factor analytic methods in the next version of
NTSYSpc.
Jim
----------------------- F. James Rohlf State University of
New York, Stony Brook, NY 11794-5245 www: http://life.bio.sunysb.edu/ee/rohlf
It has been many years since I was current on the factor analysis
literature. I have retired and no longer have access to databases of abstracts
like DIALOG, ORBIT, or PsycINFO. If you have a friend at a
university or a government agency, they might be able to do a search for
you.
Since most clustering grew up around grouping cases (rows in the
original data matrix), how is transposing the data matrix and using the same
algorithms problematic in clustering variables (columns)? Quite the
opposite: one of the oldest methods of clustering cases was to
standardize, then transpose the data matrix and factor it. (This approach
was big in the 1960s and 1970s.)
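The older case-clustering route can be sketched in plain Python as below. It assumes "standardize" means z-scoring each variable (column), and it stops at the case-by-case correlation matrix, which is what would then be factored; the factoring step itself is left out. Helper names and data are hypothetical.

```python
import math


def standardize(values):
    """z-scores using the population standard deviation."""
    n = len(values)
    m = sum(values) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / n)
    return [(v - m) / sd for v in values]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def case_correlations(data):
    """Correlation matrix among the cases (rows) of a cases x variables matrix."""
    # Standardize each variable so no variable dominates through its units.
    z_by_variable = [standardize(col) for col in transpose(data)]
    # z_by_variable is the transposed, standardized matrix (one row per
    # variable, one column per case); correlating its columns correlates cases.
    cases = transpose(z_by_variable)
    return [[pearson(a, b) for b in cases] for a in cases]


data = [[1, 5, 2], [2, 3, 4], [3, 4, 1], [4, 1, 3]]  # 4 cases, 3 variables
for row in case_correlations(data):
    print([round(r, 3) for r in row])
```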
I have a gut feeling (not a
thought out opinion) that an oblique solution means that you end up with
measures that do not have discriminant validity.
SPSS has had many
varieties of factor analysis for many years. It has used 2 kinds of
data, 7 kinds of extraction, and 4 kinds of rotation. (56 different
"methods"!) Maybe some of those combinations would meet your needs. [For
those of us who use methods that others create, it sure would be nice if
someone were to use this framework and produce a document advising on when to
use the options.]
To get details such as algorithms and literature citations, go to
http://support.spss.com/, log in as "guest" with password "guest", and select
<statistics> <algorithms>, then <catpca> <catreg> <cluster> <discriminant>
<factor> <overals> <proximities> <quick cluster> <twostep cluster>.
The ANSWERTREE add-on and new TREE procedure in the
base module may also be relevant.
Kinds of data: SPSS can work on a
correlation matrix or a covariance matrix. In psychology, the metrics of
variables (their means and standard deviations) are usually arbitrary, so correlations are more common. However,
much of the development of factor analysis came from psychology and education. Perhaps the
mathematical psychology list would have people who are more current: Society for Mathematical
Psychology, MPSYCH Listserv.
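To show why correlations are preferred when units are arbitrary, the plain-Python sketch below rescales one variable, as a change of units would. The covariance changes by the scale factor, while the correlation, which is the covariance of the standardized variables, does not. The function names and numbers are hypothetical.

```python
import math


def covariance(x, y):
    """Sample covariance (n - 1 denominator)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def correlation(x, y):
    """Covariance divided by both standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))


x = [1, 2, 3, 4]
y = [2, 1, 4, 3]
y_new_units = [10 * v + 50 for v in y]  # same variable, arbitrary new metric

print(covariance(x, y), covariance(x, y_new_units))  # 1.0 10.0
print(round(correlation(x, y), 6), round(correlation(x, y_new_units), 6))  # 0.6 0.6
```

A factor analysis of the covariance matrix would therefore change when a variable's units change; one of the correlation matrix would not.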
Quote from SPSS about the extractions
available: "Available methods are principal components, unweighted least
squares, generalized least squares, maximum likelihood, principal axis
factoring, alpha factoring, and image factoring." End quote. There are
more details in the <help>.
Quote from the SPSS <help> about
the rotations available:
- Varimax Method. An orthogonal rotation method that minimizes the
number of variables that have high loadings on each factor. It simplifies
the interpretation of the factors.
- Direct Oblimin Method. A method for oblique (nonorthogonal)
rotation. When delta equals 0 (the default), solutions are most oblique. As
delta becomes more negative, the factors become less oblique. To override
the default delta of 0, enter a number less than or equal to 0.8.
- Quartimax Method. A rotation method that minimizes the number of
factors needed to explain each variable. It simplifies the interpretation of
the observed variables.
- Equamax Method. A rotation method that is a combination of the
varimax method, which simplifies the factors, and the quartimax method,
which simplifies the variables. The number of variables that load highly on
a factor and the number of factors needed to explain a variable are
minimized.
- Promax Rotation. An oblique rotation, which allows factors to be
correlated. It can be calculated more quickly than a direct oblimin
rotation, so it is useful for large datasets.
End quote.
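For the orthogonal case, the varimax criterion has a closed-form answer when just two factors are rotated. The plain-Python sketch below is limited to two factors by assumption; for more factors, Kaiser's procedure typically applies the same angle formula to each pair of factors in turn until the criterion stops improving. Kaiser (row) normalization is an option, and the function name and example loadings are hypothetical. This is not SPSS's code.

```python
import math


def varimax_two_factors(loadings, normalize=True):
    """Varimax rotation of a p x 2 loading matrix (list of (f1, f2) rows).

    Closed-form angle for one pair of factors:
        tan(4*phi) = (D - 2*A*B/p) / (C - (A**2 - B**2)/p)
    with u = f1**2 - f2**2, v = 2*f1*f2, A = sum(u), B = sum(v),
    C = sum(u**2 - v**2), D = 2*sum(u*v).
    normalize=True applies Kaiser normalization (rows scaled to unit length
    before rotating, scaled back afterwards); rows must then be nonzero.
    """
    p = len(loadings)
    h = [math.hypot(f1, f2) if normalize else 1.0 for f1, f2 in loadings]
    rows = [(f1 / hi, f2 / hi) for (f1, f2), hi in zip(loadings, h)]
    u = [f1 * f1 - f2 * f2 for f1, f2 in rows]
    v = [2 * f1 * f2 for f1, f2 in rows]
    A, B = sum(u), sum(v)
    C = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
    D = 2 * sum(ui * vi for ui, vi in zip(u, v))
    phi = math.atan2(D - 2 * A * B / p, C - (A * A - B * B) / p) / 4
    c, s = math.cos(phi), math.sin(phi)
    # Rotate every row by the same angle (orthogonal), then undo the scaling.
    return [((f1 * c + f2 * s) * hi, (-f1 * s + f2 * c) * hi)
            for (f1, f2), hi in zip(rows, h)]


# Hypothetical example: hide a simple structure by rotating it 30 degrees,
# then let varimax recover it.
simple = [(0.8, 0.0), (0.7, 0.0), (0.0, 0.9), (0.0, 0.6)]
t = math.radians(30)
unrotated = [(a * math.cos(t) - b * math.sin(t), a * math.sin(t) + b * math.cos(t))
             for a, b in simple]
for row in varimax_two_factors(unrotated):
    print([round(x, 3) for x in row])
```

Because the rotation is orthogonal, the recovered factors stay uncorrelated; the oblique methods in the list (direct oblimin, promax) drop that constraint and need different algorithms.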
Art [EMAIL PROTECTED] Social Research
Consultants University Park, MD USA (301)
864-5570
Wolfgang M. Hartmann wrote:
Thank you for the nice response.
I know that in practice transposing the
matrix is common, but I do not think
of it as a very valid approach. (Higher-order) factor analysis with oblique rotation
and restrictions
penalizing nonzero loadings would sound good to me. Would
you know of any references for such an
approach?
Wolfgang
In SPSS, all of the few dozen proximity (similarity) measures can be
applied to variables (after the data are transformed and
transposed). The proximity matrix can then be read into the
various cluster procedures. Or the transposed data can be read
directly into the CLUSTER or QUICK CLUSTER procedure. I see no
reason (given that you want to cluster variables) that the TWOSTEP cluster
procedure could not read a transposed data matrix.
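The transform-then-transpose step can be sketched in plain Python. Each variable is z-scored, the matrix is transposed so that variables become rows, and a distance written for cases is reused unchanged. Euclidean distance, the helper names and the data are assumptions for illustration; SPSS offers many other proximity measures.

```python
import math


def standardize(values):
    """z-scores using the population standard deviation."""
    n = len(values)
    m = sum(values) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / n)
    return [(v - m) / sd for v in values]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def euclidean(a, b):
    """An ordinary case-oriented distance between two rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def variable_distances(data):
    """Distance matrix among the variables (columns) of a cases x variables matrix."""
    # Transform (z-score each variable), then transpose so variables are rows.
    variables_as_rows = [standardize(col) for col in transpose(data)]
    return [[euclidean(a, b) for b in variables_as_rows] for a in variables_as_rows]


data = [[1, 2], [2, 4], [3, 6], [4, 8], [5, 11]]  # 5 cases, 2 variables
print(variable_distances(data))
```

With population-SD z-scores, the squared distance between two variables works out to 2n(1 - r) for n cases, since each variable's squared z-scores sum to n and their cross-products sum to n times r. On standardized data, a distance-based clustering of variables therefore orders pairs exactly as their correlations do.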
Of course there are all of the varieties of factor analysis which
are more commonly used to group variables. The CATPCA procedure
factors categorical variables.
When the variables are used to
classify or differentiate a categorical variable, there are procedures
like DISCRIMINANT or the various