Graham Jones <[EMAIL PROTECTED]> wrote:
: I think there are two situations where clustering is sensible.
: 1. You have some data sets which have been clustered (perhaps by a
: person) and which are considered correctly clustered. You can then look
: for an automatic way of achieving similar results on these data sets by
: trying different algorithms, then cross your fingers and hope the chosen
: algorithm does well on new but 'similar' data sets.
: 2. You have some end-goal, some reason for clustering, which you can
: express in terms of a function (of a clustering) which is to be minimised.
: For example you might want to compress some data by replacing points by
: their nearest cluster centres, minimising the size of the data and some
: measure of how badly the centres approximate the points. Or you might
: aim to improve the accuracy of a classifier by clustering within
: individual classes.
: I am not sure if I am agreeing or disagreeing with your opinion. I think
: you (and probably the OP) are talking about using clustering as some
: kind of data exploration or visualisation tool. In that context, I agree
: with you.
: I would be interested to know if anyone thinks there is a good reason to
: use a clustering algorithm besides the two above.
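To make the compression example in situation 2 concrete, here is a minimal sketch in plain Python. It assumes squared Euclidean distance as the "measure of how badly the centres approximate the points" and uses Lloyd's algorithm (k-means) with a fixed number of centres. The function names, the toy points and the choice of starting centres are all made up for illustration, and the "size of the data" part of the objective is left out (it would add a term that grows with the number of centres).

```python
from math import dist  # Python 3.8+


def assign(points, centres):
    """Return, for each point, the index of its nearest centre."""
    return [min(range(len(centres)), key=lambda j: dist(p, centres[j]))
            for p in points]


def distortion(points, centres, labels):
    """Mean squared distance between each point and its assigned centre."""
    total = sum(dist(p, centres[l]) ** 2 for p, l in zip(points, labels))
    return total / len(points)


def lloyd(points, centres, iterations=20):
    """Alternate nearest-centre assignment and centre update."""
    centres = [tuple(c) for c in centres]
    for _ in range(iterations):
        labels = assign(points, centres)
        for j in range(len(centres)):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an empty cluster keeps its old centre
                centres[j] = tuple(sum(axis) / len(members)
                                   for axis in zip(*members))
    return centres, assign(points, centres)


# Hypothetical toy data: two well-separated groups of three points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
centres, labels = lloyd(points, centres=[points[0], points[3]])
cost = distortion(points, centres, labels)
```

The compressed form is `centres` plus `labels`: two centres and six small integers instead of six points, with `cost` measuring what was lost.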
More examples of why clustering is useful, although you
can put these into category 2 as well:
1. Image segmentation can be done by clustering image pixels. Of course,
to achieve a good segmentation result, it is important to find a good
way to determine the similarity of the pixels, which should consider
color information, texture information, and the spatial locations
of the pixels.
2. Building "superior" classifiers in the presence of sub-classes.
For example, there are two major ways to write the digit one.
It is easier to build a classifier for each type of "one" than to
build a classifier that can recognize both types of "one".
3. One can perform regression by first clustering the data and then
operating on the cluster labels. This may be viewed as one type of
information compression.
4. In information retrieval, one can also cluster the bag-of-words
representations to discover different categories of text documents.
5. (This is not strictly clustering, but ....)
One can estimate a mixture of Gaussians to represent the
class-conditional densities in a supervised classification problem,
instead of assuming a single Gaussian class-conditional density.
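
As a rough illustration of example 2 (sub-classes), the sketch below clusters within each class and then classifies by nearest prototype. Everything in it is an assumption made for the illustration: the 2-D toy features, the class names, the plain k-means seeding, and the nearest-prototype rule itself, which stands in for whatever classifier one would really build per sub-class.

```python
from math import dist  # Python 3.8+


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm, seeded with k evenly spaced input points
    (an arbitrary choice made only to keep the sketch deterministic)."""
    step = max(1, len(points) // k)
    centres = [points[i * step] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: dist(p, centres[j]))
            groups[nearest].append(p)
        # An empty group keeps its previous centre.
        centres = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else c
            for g, c in zip(groups, centres)
        ]
    return centres


def fit(data, prototypes_per_class):
    """data maps label -> list of points; returns (prototype, label) pairs."""
    model = []
    for label, points in data.items():
        for centre in kmeans(points, prototypes_per_class[label]):
            model.append((centre, label))
    return model


def predict(model, x):
    """Label of the prototype nearest to x."""
    return min(model, key=lambda pair: dist(x, pair[0]))[1]


# Hypothetical features: "one" is written in two styles (two sub-classes),
# and the "other" class sits between them.
data = {
    "one": [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 0.0), (10.0, 1.0), (11.0, 0.0)],
    "other": [(4.0, 0.0), (4.0, 1.0), (5.0, 0.0)],
}
with_subclasses = fit(data, {"one": 2, "other": 1})
single_prototype = fit(data, {"one": 1, "other": 1})
```

On this toy data, `with_subclasses` labels both (0.5, 0.5) and (10.5, 0.5) as "one", while `single_prototype` puts its only "one" prototype in the middle and labels (0.5, 0.5) as "other".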
=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
. http://jse.stat.ncsu.edu/ .
=================================================================