Hi everyone,
I have a couple of questions on hierarchical cluster analysis (HCA) and multidimensional
scaling (MDS). As part of my research, I collected data using a method called 'similarity
rating' on a scale of 1 to 9. There are 30 variables (30 concepts from physics to be
exact). I want to find out how people organise these concepts. The software I am using
is SPSS 11, because SPSS is the only one I know how to use and one of the two
statistical packages available on the university computers (I think the other one is SAS).
I should add that I am not very familiar with the theoretical background of these
analyses, though I am trying my best to get as much information as I can/need. For example,
I have been reading a lot lately on MDS and HCA, but I still do not know what the
basic assumptions are for MDS and HCA. I need to find a good book which explains
things conceptually, with little mathematical notation.
Now my real problem: when I enter the data in SPSS, I use the subjects' ratings of the
pairwise similarities for the 30 concepts. I want to know which of these analyses is
appropriate for my data. I am confused by the
metric/non-metric distinction. I think my data are non-metric. Can I use HCA with
non-metric data? If I can, and if HCA is appropriate, what is the best method? Ward's?
Between-groups linkage? Within-groups linkage? Since my original data is
already a proximity matrix (or at least I think it is), what HCA is doing seems to be
wrong: it tries to create a proximity matrix again. Is this OK? When I run the analysis
as it is, it seems fine, but when I change the syntax so that it uses the original data
matrix with /MATRIX IN ('filename.sav'), a totally different clustering is produced.
Which one is correct? Is there a clearly written book on multivariate analysis using
SPSS?
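
The two HCA runs described above can be sketched outside SPSS. This is only a minimal illustration in Python with SciPy, not SPSS syntax: the 6 x 6 ratings are made up, and the conversion "9 minus rating", the Euclidean metric and average linkage are assumptions chosen purely for the example. Route A uses the ratings directly as proximities; Route B treats each row as a case and computes distances between rows again.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
n = 6  # hypothetical: 6 concepts instead of 30

# Made-up symmetric similarity ratings on the 1-9 scale (9 = most similar).
upper = np.triu(rng.integers(1, 10, size=(n, n)), 1)
sim = upper + upper.T
np.fill_diagonal(sim, 9)

# Route A: use the ratings themselves as the proximity matrix.
# Assumption: dissimilarity = 9 - similarity, so the diagonal becomes 0.
dissim = 9 - sim
condensed = squareform(dissim)  # square matrix -> condensed vector
Z_direct = linkage(condensed, method="average")

# Route B: treat each row as an observation and compute
# Euclidean distances between rows, i.e. a second proximity matrix.
Z_recomputed = linkage(pdist(sim.astype(float), metric="euclidean"),
                       method="average")

# Cut both trees into two clusters for a side-by-side look.
labels_direct = fcluster(Z_direct, t=2, criterion="maxclust")
labels_recomputed = fcluster(Z_recomputed, t=2, criterion="maxclust")
print(labels_direct)
print(labels_recomputed)
```

The two routes start from different distances, so there is no reason for the resulting trees to agree.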
For MDS, I have a similar problem. What do I need to do to get a clear
picture of how people organise these 30 concepts? Because the stress value with few
dimensions is quite low, I have to increase the number of dimensions. By the way, in the
SPSS results there are a lot of stress values: normalized raw stress, Stress-I, Stress-II
and S-Stress. Which of these should I use to interpret my results? Also, what are
"Dispersion Accounted For (D.A.F.)" and "Tucker's Coefficient of Congruence" used for?
What is the difference between Simplex and Torgerson in the initial configuration options?
I know this is a lot, but as I mentioned earlier there isn't any book on multivariate
statistics using SPSS as far as I know. Many books on multivariate statistics explain
things in a way that makes life more difficult. If you could help me, I would be very happy.
Thank you very much in advance for your interest and help.
Sincerely,
Ufuk YILDIRIM