I would base the within-case vector of variables on the set of possible responses, i.e. a set of multiple dichotomies with one yes/no variable per response category.
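As a rough illustration of what a multiple-dichotomy layout could look like, here is a minimal Python sketch. The reason codes, the person and time-point identifiers, and the helper name `to_dichotomies` are all made up for the example; the real code frame would be whatever the coders produced.

```python
# Hypothetical reason codes (assumption); the real list comes from the coders.
CODES = ["stress", "habit", "social", "enjoyment"]


def to_dichotomies(responses, codes=CODES):
    """Turn a case's 1-3 coded responses into a 0/1 vector, one entry per code."""
    given = set(responses)
    return [1 if code in given else 0 for code in codes]


# One case = one person at one time point (illustrative data only).
cases = {
    ("p1", "baseline"): ["stress", "habit"],
    ("p1", "t1"): ["habit"],
    ("p2", "baseline"): ["social", "enjoyment", "stress"],
}

# Each case becomes a fixed-length vector regardless of how many
# responses (1, 2 or 3) the person gave at that time point.
vectors = {key: to_dichotomies(resp) for key, resp in cases.items()}

for key, vec in vectors.items():
    print(key, vec)
```

With this layout the order in which a person gave their responses no longer matters, and every case has the same number of variables.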
To start, I would try leaving each time point as a separate case for the cluster analyses. I would then also explore (cluster) each subset of cases on its own and crosstab the results to look for consistency.
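The consistency check could be sketched roughly as follows, in plain Python. This is only an illustration under several assumptions: the 0/1 vectors are invented, the subsets are taken to be the time points, two clusters are requested, and the Jaccard distance with average linkage is used simply because those were the defaults mentioned in the question. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |both| / |either| for two 0/1 vectors; 0.0 if both are all zeros."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


def average_linkage(vectors, k):
    """Agglomerative clustering: repeatedly merge the two clusters with the
    smallest mean pairwise Jaccard distance until k clusters remain.
    Returns one cluster label per input vector."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best = None
        for (i, ci), (j, cj) in combinations(enumerate(clusters), 2):
            total = sum(jaccard_distance(vectors[a], vectors[b])
                        for a in ci for b in cj)
            d = total / (len(ci) * len(cj))
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    labels = [None] * len(vectors)
    for label, members in enumerate(clusters):
        for m in members:
            labels[m] = label
    return labels


def crosstab(labels_a, labels_b):
    """Count how often each pair of cluster labels occurs together."""
    return Counter(zip(labels_a, labels_b))


# Illustrative cases only: (time point, multiple-dichotomy vector).
cases = [
    ("baseline", [1, 1, 0, 0]), ("baseline", [1, 1, 0, 0]),
    ("baseline", [0, 0, 1, 1]), ("baseline", [0, 0, 1, 0]),
    ("t1", [1, 0, 0, 0]), ("t1", [1, 1, 0, 0]),
    ("t1", [0, 0, 1, 1]), ("t1", [0, 0, 0, 1]),
]

# Pooled solution: every person-by-time-point record is a separate case.
pooled = average_linkage([vec for _, vec in cases], k=2)

# Cluster each time point on its own and crosstab against the pooled labels.
tables = {}
for time_point in ("baseline", "t1"):
    idx = [i for i, (t, _) in enumerate(cases) if t == time_point]
    subset = average_linkage([cases[i][1] for i in idx], k=2)
    tables[time_point] = crosstab([pooled[i] for i in idx], subset)
    print(time_point, dict(tables[time_point]))
```

Cluster numbers are arbitrary, so consistency shows up as each pooled cluster mapping mostly onto a single subset cluster, not as matching label values.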
Hope this helps.
Art [EMAIL PROTECTED] Social Research Consultants University Park, MD USA (301) 864-5570
Bob Green wrote:
I am interested in whether pooling data from the same individuals into a single variable, which would violate the assumption of independence of observations in multiple regression, is also problematic in cluster analysis.
Briefly, I have data collected at baseline and 4 time points asking whether someone smoked and the reasons why. Any individual might give 1-3 responses, which could range from a single word to a sentence. These open-ended responses have been coded by coders. There are therefore 5 time periods x potentially 3 responses.
I have received advice that it is acceptable to pool this data into 1 variable. I have run the analysis using the cluster option in a content analysis software program, and the results were both interpretable and made sense (the analysis was performed using the default options of a similarity matrix, average linkage and the Jaccard coefficient). However, my readings and enquiries to date have not been of much assistance in providing substantive support for this approach. Any advice or references in relation to this question would be appreciated.
regards
Bob Green
