Clustering has a lot of associated problems.  The first is that of cluster
validity--most algorithms will find as many clusters as the user demands,
whether or not those clusters really exist.  Another very important problem
is homogeneity of variance: Euclidean distance is dominated by whichever
variables have the largest variance.  So a Z transformation is not a bad idea
whether or not the variables are normal.  Quasi-normality is about all you
have to assume--the absence of intersample polymodality and the approximate
agreement of the mean and the mode.  However, to my knowledge, there is no
satisfying "theory" associated with cluster analysis--only rules of thumb.
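
To illustrate the variance point, here is a minimal sketch in plain Python.
The data values and the helper names z_transform and euclidean are made up
for the example; they do not come from any particular package.  It only
shows how a Z transformation can change which points look "close" under
Euclidean distance, and it makes no normality assumption.

```python
import math
import statistics


def z_transform(rows):
    """Standardize each column to mean 0 and sample standard deviation 1."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def euclidean(a, b):
    """Ordinary Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical observations: (height in metres, income in dollars).
data = [
    [1.60, 30000.0],  # A
    [1.95, 30400.0],  # B: very different height from A, similar income
    [1.61, 31000.0],  # C: almost the same height as A, somewhat higher income
    [1.75, 34000.0],  # D
]

z = z_transform(data)

# Raw scale: income swamps height, so B looks nearer to A than C does.
raw_ab = euclidean(data[0], data[1])
raw_ac = euclidean(data[0], data[2])

# Standardized scale: both variables count, and C is now nearer to A.
z_ab = euclidean(z[0], z[1])
z_ac = euclidean(z[0], z[2])

print("raw:          d(A,B) < d(A,C)?", raw_ab < raw_ac)
print("standardized: d(A,B) < d(A,C)?", z_ab < z_ac)
```

The standardization here uses the sample standard deviation; any other
sensible scaling (range, robust spread) would serve the same purpose.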

Beng Hai Chea wrote:

> Here is a statistical issue that I have been pondering for a few days now,
> and I am hoping someone can shed some light or even help set me straight.
>
> Would like to know if we need to assume multivariate normality for the data
> whenever we use the Euclidean distance based clustering?
>
> Or it is good to have but not necessary?
>
> The argument I used was that since we need to standardize the raw data for
> this type of clustering, thus we need to assume normality or at least try to
> make sure that the data is normally distributed.
>
> Would like to hear the opinions from this mailing list.
>
> Thanks in advance!
> Beng Hai
>
> =================================================================
> Instructions for joining and leaving this list and remarks about
> the problem of INAPPROPRIATE MESSAGES are available at
>                   http://jse.stat.ncsu.edu/
> =================================================================
