Help on HCA and MDS

Ufuk Yildirim Tue, 10 Jun 2003 04:06:19 -0700

Hi everyone,
I have a couple of questions on (hierarchical) cluster analysis (HCA) and multidimensional
scaling (MDS). As part of my research, I collected data using a method called 'similarity
rating' on a scale of 1 to 9. There are 30 variables (30 physics concepts, to be exact). I
want to find out how people organise these concepts. The software I am using is SPSS 11,
because SPSS is the only package I know how to use and one of the two statistical packages
available on the university computers (I think the other one is SAS).
I should add that I am not very familiar with the theoretical background of these analyses,
though I am trying my best to get as much information as I can. For example, I have been
reading a lot lately on MDS and HCA, but I still do not know what the basic assumptions of
MDS and HCA are. I need to find a good book that explains things conceptually, with little
mathematical notation.
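
Before either analysis, the individual 1-9 ratings have to become one matrix of
dissimilarities. A minimal sketch, assuming 9 means "very similar" and that each subject's
ratings are stored as a dict keyed by concept pair (both assumptions; the function name is
hypothetical): average each pair over subjects, then subtract from 10 so that similar
concepts end up close together.

```python
# Minimal sketch: turn 1-9 similarity ratings into one averaged dissimilarity matrix.
# Assumptions (hypothetical layout): each subject is a dict mapping a concept pair
# (i, j) with i < j to a rating, and 9 means "very similar".


def ratings_to_dissimilarities(subject_ratings, n_concepts, max_rating=9):
    """Average each pair's rating over subjects and flip it into a dissimilarity."""
    matrix = [[0.0] * n_concepts for _ in range(n_concepts)]
    for i in range(n_concepts):
        for j in range(i + 1, n_concepts):
            values = [r[(i, j)] for r in subject_ratings if (i, j) in r]
            if not values:
                raise ValueError(f"no ratings for pair {(i, j)}")
            mean_similarity = sum(values) / len(values)
            # High similarity -> small dissimilarity; the diagonal stays 0.
            d = (max_rating + 1) - mean_similarity
            matrix[i][j] = matrix[j][i] = d
    return matrix


if __name__ == "__main__":
    # Two hypothetical subjects rating three hypothetical concepts.
    subjects = [
        {(0, 1): 8, (0, 2): 2, (1, 2): 3},
        {(0, 1): 9, (0, 2): 1, (1, 2): 4},
    ]
    for row in ratings_to_dissimilarities(subjects, 3):
        print(row)
```

Subtracting from 10 is just one simple conversion; for a non-metric (ordinal) MDS only the
rank order of the values matters, so the exact choice matters less there.
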
Now my real problem. When I enter the data in SPSS, I use the subjects' ratings of the
pairwise similarities for the 30 concepts. I want to know which of these analyses is
appropriate for my data. I am confused by the metric/non-metric distinction; I think my
data are non-metric. Can I use HCA with non-metric data? If I can, and if HCA is
appropriate, what is the best method: Ward's? Between-groups linkage? Within-groups
linkage? Since my original data are already a proximity matrix (or at least I think they
are), what HCA is doing seems wrong: it tries to create a proximity matrix again. Is this
OK? When I run the analysis as it is, the result seems fine, but when I change the syntax
so that it reads the original data matrix with /MATRIX IN ('filename.sav'), a totally
different clustering is produced. Which one is correct? Is there a clearly written book on
multivariate analysis using SPSS?
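
To illustrate why two such runs can disagree, here is a minimal stdlib-only Python sketch
(the function names and the four-concept matrix are hypothetical, not SPSS's). The function
`average_linkage` clusters a dissimilarity matrix directly with average linkage (UPGMA),
assumed here to correspond to SPSS's "between-groups linkage". The function
`rows_as_profiles` treats the same matrix as ordinary data instead: each row becomes a
profile and new Euclidean distances are computed between rows. The assumption is that a run
without /MATRIX IN does something of this second kind, clustering "distances between
similarity profiles" rather than the similarities themselves.

```python
import math


def average_linkage(dist):
    """Agglomerative clustering with average linkage (UPGMA) on a precomputed,
    symmetric dissimilarity matrix.

    Returns the merge history as (cluster_a, cluster_b, height) tuples, where each
    cluster is a sorted tuple of item indices and height is the mean dissimilarity
    over all pairs with one item in each cluster.
    """
    clusters = [(i,) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [dist[i][j] for i in clusters[a] for j in clusters[b]]
                height = sum(pairs) / len(pairs)
                if best is None or height < best[0]:
                    best = (height, a, b)
        height, a, b = best
        merges.append((clusters[a], clusters[b], height))
        merged = tuple(sorted(clusters[a] + clusters[b]))
        # Drop the two merged clusters and append their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges


def rows_as_profiles(dist):
    """Euclidean distances between the rows of a matrix, i.e. what you get when the
    dissimilarity matrix is treated as ordinary data (one 'variable' per column)."""
    n = len(dist)
    return [
        [math.sqrt(sum((dist[i][k] - dist[j][k]) ** 2 for k in range(n))) for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Hypothetical averaged dissimilarities for four concepts.
    D = [
        [0, 2, 6, 10],
        [2, 0, 5, 9],
        [6, 5, 0, 4],
        [10, 9, 4, 0],
    ]
    print("clustering the proximities directly:")
    for a, b, h in average_linkage(D):
        print(f"  merge {a} + {b} at {h:.3f}")
    print("clustering distances between rows (proximities recomputed):")
    for a, b, h in average_linkage(rows_as_profiles(D)):
        print(f"  merge {a} + {b} at {h:.3f}")
```

On the choice of method: Ward's method is usually motivated by (squared) Euclidean
distances, which averaged rating-based dissimilarities need not be. That is one common
reason to prefer average linkage for data like these, though it remains a judgment call.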


For MDS, I have a similar problem. What do I need to do to get a clear picture of how
people organise these 30 concepts? Because the stress value is quite high with few
dimensions, I have to increase the number of dimensions. By the way, the SPSS results
contain a lot of stress values: normalized raw stress, Stress-I, Stress-II and S-Stress.
Which of these should I use to interpret my results? Also, what are "Dispersion Accounted
For (D.A.F.)" and "Tucker's Coefficient of Congruence" used for? What is the difference
between the Simplex and Torgerson initial configuration options?
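
The fit measures listed above can all be computed from two lists: the disparities
(transformed proximities) and the distances between the same pairs in the fitted
configuration. The sketch below uses common textbook definitions (normalized raw stress,
Kruskal's Stress-1 and Stress-2, an ALSCAL-style S-stress); SPSS may normalize some of them
differently, so its numbers need not match exactly. The function name is hypothetical.
Lower stress is better; D.A.F. and Tucker's congruence are goodness-of-fit values where 1
means a perfect fit.

```python
import math


def fit_measures(disparities, distances):
    """Common MDS fit measures for one solution (textbook forms, not SPSS's code).

    disparities: transformed proximities (d-hat), one per object pair
    distances:   distances between the same pairs in the fitted configuration
    """
    if len(disparities) != len(distances) or not distances:
        raise ValueError("need two non-empty lists of equal length")
    n = len(distances)
    residual_ss = sum((dh - d) ** 2 for dh, d in zip(disparities, distances))
    sum_dh2 = sum(dh ** 2 for dh in disparities)
    sum_d2 = sum(d ** 2 for d in distances)
    mean_d = sum(distances) / n
    spread_d = sum((d - mean_d) ** 2 for d in distances)

    normalized_raw_stress = residual_ss / sum_dh2
    return {
        "normalized_raw_stress": normalized_raw_stress,
        # Kruskal's Stress-1: residuals relative to the size of the distances.
        "stress_1": math.sqrt(residual_ss / sum_d2),
        # Kruskal's Stress-2: residuals relative to the variation of the distances.
        "stress_2": math.sqrt(residual_ss / spread_d),
        # S-stress compares squared distances (one common ALSCAL-style normalization).
        "s_stress": math.sqrt(
            sum((dh ** 2 - d ** 2) ** 2 for dh, d in zip(disparities, distances))
            / sum(dh ** 4 for dh in disparities)
        ),
        # Dispersion accounted for: the complement of normalized raw stress.
        "daf": 1.0 - normalized_raw_stress,
        # Tucker's coefficient of congruence: an uncentred correlation.
        "tucker_congruence": sum(dh * d for dh, d in zip(disparities, distances))
        / math.sqrt(sum_dh2 * sum_d2),
    }


if __name__ == "__main__":
    # Hypothetical disparities and configuration distances for six object pairs.
    dhat = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    dist = [1.2, 1.8, 3.1, 4.3, 4.6, 6.2]
    # Rescale the configuration by the least-squares factor before measuring fit.
    b = sum(x * y for x, y in zip(dhat, dist)) / sum(y * y for y in dist)
    dist = [b * y for y in dist]
    for name, value in fit_measures(dhat, dist).items():
        print(f"{name:>22}: {value:.4f}")
```

Once the configuration is optimally rescaled as in the example, D.A.F. equals the square of
Tucker's congruence, so those two carry the same information; Stress-1 is the value most
often reported alongside them.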

I know this is a lot, but as I mentioned earlier, as far as I know there isn't any book on
multivariate statistics using SPSS. Many books on multivariate statistics explain things in
a way that makes life more difficult. If you could help me, I would be very happy.

Thank you very much for your interest and help in advance.

Sincerely,

Ufuk YILDIRIM
