Re: Help on HCA and MDS

leo horseman Mon, 16 Jun 2003 12:03:12 -0700

No, you do not have a metric.  You have no idea how each subject has
mentally scaled the values between a 1 and a 9.  You cannot construct a
Euclidean distance measure from these values.  Whatever you decide to do,
you may be disappointed in the results.  It is possible for a subject to
rate Concepts 1 and 2 as very similar; Concepts 2 and 3 as very similar; and
Concepts 1 and 3 as not very similar, which can violate the triangle
inequality that any metric must satisfy.
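
A minimal sketch of how such inconsistencies could be detected in one
subject's ratings. It assumes the ratings are coded as dissimilarities
(1 = very similar, 9 = not at all similar); the function name and the
sample ratings are hypothetical, not taken from the actual data.

```python
from itertools import combinations

def triangle_violations(d, items):
    """Return (i, j, k) triples where d(i,k) > d(i,j) + d(j,k)."""
    def dist(a, b):
        # Ratings are stored once per unordered pair.
        return d[(a, b)] if (a, b) in d else d[(b, a)]

    bad = []
    for i, k in combinations(items, 2):
        for j in items:
            if j in (i, k):
                continue
            if dist(i, k) > dist(i, j) + dist(j, k):
                bad.append((i, j, k))
    return bad

# Hypothetical ratings from one subject, coded as dissimilarities:
# Concepts 1-2 and 2-3 very similar, Concepts 1-3 not similar at all.
ratings = {(1, 2): 1, (2, 3): 1, (1, 3): 9}
violations = triangle_violations(ratings, [1, 2, 3])
print(violations)
```

Each reported triple is a route through concept j that is "shorter" than
the direct rating between i and k, which no Euclidean configuration can
reproduce exactly.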

If you construct a frequency table for each of your k(k-1)/2=435 possible
pairings, with the values 1-9 as one dimension and the 435 pairs as the
other, with entries the number of responses for each value 1-9 for each
pair, you can quickly visually scan and find strange pairings.
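
The table described above could be built roughly as follows. This is only
a sketch: the data layout (one dict of pair -> rating per subject) and the
function name are assumptions, and the sample uses k = 4 concepts and
three made-up subjects to keep the output short.

```python
from collections import Counter  # not required, shown for familiarity
from itertools import combinations

def frequency_table(responses, k):
    """Map each concept pair to a list of 9 counts, one per rating 1..9."""
    pairs = list(combinations(range(1, k + 1), 2))
    table = {pair: [0] * 9 for pair in pairs}
    for subject_ratings in responses:
        for pair, rating in subject_ratings.items():
            table[pair][rating - 1] += 1
    return table

# Hypothetical data: three subjects, four concepts (6 pairs, not 435).
responses = [
    {(1, 2): 1, (1, 3): 9, (1, 4): 5, (2, 3): 2, (2, 4): 5, (3, 4): 7},
    {(1, 2): 2, (1, 3): 8, (1, 4): 5, (2, 3): 9, (2, 4): 4, (3, 4): 7},
    {(1, 2): 1, (1, 3): 9, (1, 4): 6, (2, 3): 1, (2, 4): 5, (3, 4): 8},
]
table = frequency_table(responses, k=4)

print("pair    " + " ".join(str(v) for v in range(1, 10)))
for pair, counts in table.items():
    print(f"{pair}  " + " ".join(str(c) for c in counts))
```

In this made-up sample the row for pair (2, 3) has counts at both ends of
the scale, which is the kind of strange pairing a visual scan would catch.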

You may wish either to rethink your methodology or to consider some
non-Euclidean form of multidimensional scaling (see Kruskal's work).

You are in the U.K.  You should have easy access to the Clustan clustering
programs; also see the online StatSoft textbook.  I am not familiar with SPSS or
SAS implementations.

Be careful in defining your proximities as "similarities" (9=most similar
and 1=least similar) or "dissimilarities" (9=least similar and 1=most
similar).  No, it is not O.K. to allow your computer program to construct
another similarity matrix.  Your subjects have already done so.  Your
proximity matrix should be a kxk concepts matrix, not an NxN subjects matrix
(you are clustering the concepts, not the people; the people are the
"variables" in this instance).
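
One way the kxk concepts matrix might be assembled from the subjects'
ratings is sketched below. Several things here are assumptions rather
than part of the advice above: pooling subjects with the median (chosen
because the ratings are ordinal), the flag name coded_as_similarity, the
"10 - rating" flip for a 1-9 scale, and the sample data with k = 3.

```python
from statistics import median

def concept_matrix(responses, k, coded_as_similarity):
    """Build a symmetric k x k dissimilarity matrix from per-subject ratings."""
    matrix = [[0.0] * k for _ in range(k)]
    for i in range(1, k + 1):
        for j in range(i + 1, k + 1):
            # Pool the subjects' ratings of this pair (assumed: median).
            value = median(r[(i, j)] for r in responses)
            if coded_as_similarity:
                # Flip a 1..9 scale so that larger means less alike.
                value = 10 - value
            matrix[i - 1][j - 1] = matrix[j - 1][i - 1] = float(value)
    return matrix

# Hypothetical ratings from three subjects, where a larger rating
# means the two concepts were judged more alike.
responses = [
    {(1, 2): 9, (1, 3): 2, (2, 3): 3},
    {(1, 2): 8, (1, 3): 1, (2, 3): 3},
    {(1, 2): 9, (1, 3): 3, (2, 3): 5},
]
d = concept_matrix(responses, k=3, coded_as_similarity=True)
for row in d:
    print(row)
```

The result has one row and one column per concept and a zero diagonal, so
it can be handed to a clustering or scaling routine as a ready-made
proximity matrix instead of letting the program compute a new one.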

From: Ufuk Yildirim <[EMAIL PROTECTED]>
Reply-To: "Classification, clustering, and phylogeny estimation"
  <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject: Help on HCA and MDS
Date: Tue, 10 Jun 2003 11:57:57 +0100

Hi everyone,
I have a couple of questions on (hierarchical) cluster analysis and
Multidimensional scaling. As part of my research, I collected data using a
method called 'similarity rating' on a scale of 1 to 9. There are 30
variables (30 concepts from physics to be exact). I want to find out how
people organise these concepts. The software I am using is SPSS 11, because
SPSS is the only one I know how to use and one of the two statistical
packages available on university computers (I think the other one is SAS).
I should add that I am not very familiar with the theoretical background of
these analyses, though trying my best to get as much information as I
can/need. For example, I have been reading a lot lately on MDS and HCA, but
I still do not know what the basic assumptions are for MDS and HCA. I need
to find a good book which explains things conceptually, with little
mathematical notation.
Now my real problem: as I enter the data in SPSS, I use the subjects'
ratings of the pairwise similarities for the 30 concepts. I want to know
which of these is the appropriate statistical analysis for my data. I
am confused with the metric/non-metric distinction. My data is non-metric I
think. Can I use HCA with non-metric data? If I can, and if HCA is
appropriate, what is the best method? Ward's? Between-groups linkage? or
within-groups linkage? etc. Since my original data is already a proximity
matrix (or at least I think it is), what HCA is doing seems to be wrong. It
tries to create proximity matrix again. Is this ok? When I run the analysis
as it is, it seems fine, but when I change the syntax so that it uses the
original data matrix in /MATRIX IN ('filename.sav'), a totally different
clustering is produced. Which one is correct? Is there a clearly written
book on multivariate analysis using SPSS?

For MDS, I have a similar problem. What are the things I need to do to get a
clear picture of how people organise these 30 concepts? Because the stress
value in low dimensions is quite low, I have to increase the number of
dimensions. By the way, in the SPSS results there are a lot of stress values:
normalized raw stress, Stress-I, Stress-II and S-Stress. Which of these
should I use to interpret my results? Also, what are "Dispersion Accounted
For (D.A.F.)" and "Tucker's Coefficient of Congruence" used for? What is
the difference between Simplex and Torgerson in initial configuration
options?

I know this is a lot, but as I mentioned earlier there isn't any book on
multivariate statistics using SPSS as far as I know. Many books on
multivariate statistics explain things to make life more difficult. If you
could help me, I would be very happy.

Thank you very much for your interest and help in advance.

Sincerely,

Ufuk YILDIRIM


