Re: cluster analysis validation technique

Art Kendall Thu, 04 Sep 2008 05:53:14 -0700

If you have SPSS here are some ways to do this.

The squared Euclidean distance is the sum of the squared differences on each dimension.
If you have 10 z variables, try something like this *untested* syntax, which will find the distance of each case from each centroid.
Create 60 variables for the centroids in a file with 1 "case", with a variable called constant set to 1, and 6 sets of 10:
cen1z1 to cen1z10 cen2z1 to cen2z10 ... cen6z1 to cen6z10


In your main file:

compute constant = 1.
match files file=main /table=centroids /by constant.

* z1 to z10 and cen1z1 to cen6z10 must each be adjacent in the file.
vector z = z1 to z10.
vector cen = cen1z1 to cen6z10.
vector distance(6).

* distance(i) = squared Euclidean distance of the case from centroid i.
loop #i = 1 to 6.
  compute distance(#i) = 0.
  loop #j = 1 to 10.
    compute distance(#i) = distance(#i) + (cen((#i - 1) * 10 + #j) - z(#j))**2.
  end loop.
end loop.
execute.
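
For anyone who wants to check the arithmetic outside SPSS, the two steps in question (distance of each case from each centroid, then assignment to the closest centroid) can be sketched in plain Python. This is only an illustration, not part of the SPSS syntax: the function names squared_distance and assign_to_nearest and the toy data are made up for the example.

```python
def squared_distance(case, centroid):
    """Sum of squared differences across the dimensions."""
    return sum((x - c) ** 2 for x, c in zip(case, centroid))


def assign_to_nearest(cases, centroids):
    """Return (distances, labels); labels are 1-based centroid numbers."""
    all_distances = []
    labels = []
    for case in cases:
        dists = [squared_distance(case, cen) for cen in centroids]
        all_distances.append(dists)
        # Position of the smallest distance, shifted to start at 1
        labels.append(dists.index(min(dists)) + 1)
    return all_distances, labels


# Toy data (hypothetical): 3 cases, 2 z variables, 2 centroids
centroids = [[0.0, 0.0], [2.0, 2.0]]
cases = [[0.5, 0.0], [2.0, 1.0], [1.5, 2.5]]
distances, labels = assign_to_nearest(cases, centroids)
```

Because the square root is monotone increasing, ranking the centroids by squared or by plain Euclidean distance picks the same nearest centroid for every case; only the distance values themselves differ.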

If you do not have a huge number of cases and have a fairly powerful machine, a solution with less effort on your part but a lot of computation for the machine might be this.
Just add 6 cases to the main file, each representing a centroid, at the top of the file, and do PROXIMITIES on the large matrix and then delete the columns you do not want.

Another way to look at the agreement between two solutions is to do the clusterings with filtered cases, saving the memberships.
Then do two DISCRIMINANTs, each time treating the other set of cases as unclustered in the classification phase, saving the assignments and probabilities of membership on each pass.
Then CROSSTAB the assignments on the DFA with those from the original clustering.
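
The final cross-tabulation step can also be illustrated outside SPSS. Here is a minimal Python sketch, assuming two lists of cluster labels for the same cases; the function name crosstab, the variable names original and dfa, and the toy labels are all hypothetical.

```python
from collections import Counter


def crosstab(rows, cols):
    """Count cases for each (row label, column label) pair."""
    if len(rows) != len(cols):
        raise ValueError("both label lists must cover the same cases")
    counts = Counter(zip(rows, cols))
    row_levels = sorted(set(rows))
    col_levels = sorted(set(cols))
    table = [[counts[(r, c)] for c in col_levels] for r in row_levels]
    return row_levels, col_levels, table


# Toy labels (hypothetical): original clustering vs. DFA assignment
original = [1, 1, 2, 2, 3, 3, 3]
dfa = [2, 2, 1, 1, 3, 3, 1]
row_levels, col_levels, table = crosstab(original, dfa)
for level, row in zip(row_levels, table):
    print(level, row)
```

Cluster numbers are arbitrary labels, so agreement shows up as each row having most of its cases in a single column, not necessarily on the diagonal.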


Art Kendall
Social Research Consultants




Liza Rovniak wrote:

Hi,
I am hoping someone here can help me with a "how to" question on running McIntyre and Blashfield's (1980) nearest-centroid evaluation procedure to validate the stability of my cluster analysis solution. I am a newbie to cluster analysis, so this is my first time running this procedure.
I have a sample of about 900 observations and have randomly split the sample in two (Sample A and Sample B). I conducted hierarchical cluster analysis and then calculated the centroid vectors for a 3-cluster solution on each of these two subsamples (i.e., steps 1 through 4 of McIntyre and Blashfield's evaluation technique).
Step 5 of McIntyre and Blashfield's technique is to calculate "the squared Euclidean distance for each of Sample B's objects from each of the centroids of Sample A," and Step 6 is to assign "each object in Sample B to the closest centroid vector." At this point, I am not sure what buttons to press in SPSS to complete the analysis. One possibility I tried is to use K-means cluster analysis to achieve these two steps, but K-means uses simple Euclidean distance (not squared Euclidean distance as recommended by McIntyre and Blashfield) to assign the observations to clusters. Is this okay? (Someone told me it was, but I just want to double-check.) I would greatly appreciate any guidance on what buttons to press in SPSS/appropriate syntax to complete steps 5 and 6 of this analysis.
Thank you.
Liza Rovniak
Liza S. Rovniak, PhD, MPH

Adjunct Assistant Professor

Center for Behavioral Epidemiology & Community Health

Graduate School of Public Health, San Diego State University

San Diego, CA 92123

Phone: 858-505-4770, ext. 152; Fax: 858-505-8614

Email: [EMAIL PROTECTED]
----------------------------------------------
CLASS-L list.
Instructions: http://www.classification-society.org/csna/lists.html#class-l

