Re: Statistics Tool For Classification/Clustering

Mark Harrison Wed, 27 Feb 2002 21:13:54 -0800

Good places to start:

Optimal feature extractors, such as Fisher's linear discriminant: these are
better than PCA here because they whiten the within-class scatter and so put
all inter-class comparisons on the same level. The good thing is they will
also reduce your feature vector dimensionality to at most c-1 (where c is the
number of classes). PCA will not do this, since it ignores the class labels.
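
A minimal numpy sketch of the idea follows. It assumes that "optimal feature extractor" means Fisher's linear discriminant. The code whitens the pooled within-class scatter S_W, then keeps the top c-1 eigenvectors of the between-class scatter S_B in the whitened space. The function name `fisher_lda` is hypothetical. The sketch also assumes S_W is non-singular. With ~500 correlated variables and relatively few pieces of music that may not hold, so you might need to drop redundant variables or run PCA first.

```python
import numpy as np


def fisher_lda(X, y):
    """Project X (n_samples x n_features) onto at most c-1 Fisher discriminant axes."""
    classes = np.unique(y)
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    S_W = np.zeros((d, d))  # pooled within-class scatter
    S_B = np.zeros((d, d))  # between-class scatter
    for k in classes:
        Xk = X[y == k]
        mk = Xk.mean(axis=0)
        S_W += (Xk - mk).T @ (Xk - mk)
        diff = (mk - mean_all)[:, None]
        S_B += len(Xk) * (diff @ diff.T)

    # Whitening transform for S_W: with S_W = V diag(e) V^T, use V diag(e^-1/2),
    # so that whiten.T @ S_W @ whiten is the identity (assumes S_W is non-singular).
    evals, evecs = np.linalg.eigh(S_W)
    whiten = evecs / np.sqrt(evals)

    # In the whitened space, keep the c-1 directions of largest between-class scatter.
    B = whiten.T @ S_B @ whiten
    b_evals, b_evecs = np.linalg.eigh(B)
    top = np.argsort(b_evals)[::-1][: len(classes) - 1]
    W = whiten @ b_evecs[:, top]
    return X @ W, W


# Example: three classes with different means in a 5-dimensional feature space.
rng = np.random.default_rng(0)
means = [np.zeros(5), np.array([3.0, 0, 0, 0, 0]), np.array([0, 3.0, 0, 0, 0])]
X = np.vstack([rng.normal(m, 1.0, size=(50, 5)) for m in means])
y = np.repeat([0, 1, 2], 50)
Z, W = fisher_lda(X, y)
print(Z.shape)  # (150, 2)
```

The c-1 limit comes from S_B. It is a sum of c rank-one terms n_k (m_k - m)(m_k - m)^T whose weighted deviations sum to zero, so its rank is at most c-1. Beyond that, extra directions carry no between-class information.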


Check the statistics of each class: is it Gaussian, or does it follow some
other known pdf? If so, apply a parametric classifier.
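
If the classes do look Gaussian, one simple parametric classifier is Bayes' rule with one full-covariance Gaussian per class, i.e. quadratic discriminant analysis. The sketch below assumes that choice, and its function names are hypothetical. The normality check itself is not shown. Per-feature tests such as `scipy.stats.normaltest` are one option.

```python
import numpy as np


def fit_gaussian_classes(X, y):
    """Sample mean, covariance and prior for each class."""
    params = {}
    for k in np.unique(y):
        Xk = X[y == k]
        cov = np.atleast_2d(np.cov(Xk, rowvar=False))  # also works with 1 feature
        params[k] = (Xk.mean(axis=0), cov, len(Xk) / len(X))
    return params


def log_score(x, mean, cov, prior):
    """log(prior) + log N(x; mean, cov), dropping the shared -d/2 log(2 pi) term."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return np.log(prior) - 0.5 * logdet - 0.5 * diff @ np.linalg.solve(cov, diff)


def classify(X, params):
    """Assign each row of X to the class with the highest log score."""
    labels = list(params)
    scores = np.array([[log_score(x, *params[k]) for k in labels] for x in X])
    return np.array(labels)[scores.argmax(axis=1)]


# Example: two well-separated 2-D Gaussian classes.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0, 0], 1.0, size=(100, 2)),
               rng.normal([6, 6], 1.0, size=(100, 2))])
y = np.repeat([0, 1], 100)
params = fit_gaussian_classes(X, y)
pred = classify(X, params)
print((pred == y).mean())  # training accuracy
```

Each class needs comfortably more samples than features for its covariance matrix to be invertible. With hundreds of variables this is another reason to reduce dimensionality first.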

However, you will be lucky if you get good classification at this stage, so
you will probably need nonlinear, nonparametric classifiers. Try k-nearest
neighbour, but with a large training set that might take the age of the
Universe, so use a condensing algorithm first to get a smaller representative
set.
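
Here is a rough sketch of brute-force k-NN combined with Hart's condensed nearest neighbour rule. Hart's rule is one common condensing algorithm; the post does not name a specific one. Function names are hypothetical. Labels are assumed to be integers 0..c-1. The all-pairs distance matrix is only suitable for small data sets.

```python
import numpy as np


def knn_predict(train_X, train_y, X, k=1):
    """Majority vote among the k nearest training points (Euclidean distance).

    Labels are assumed to be non-negative integers (0, 1, ..., c-1).
    """
    # Squared distances, shape (n_query, n_train); fine for small data sets only.
    d2 = ((X[:, None, :] - train_X[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    votes = train_y[nearest]
    # Ties go to the smallest label.
    return np.array([np.bincount(row).argmax() for row in votes])


def condense(X, y):
    """Hart's condensed nearest neighbour rule.

    Returns indices of a subset ("store") such that 1-NN on the store
    classifies every training point correctly (assuming no duplicate
    points with different labels).
    """
    in_store = np.zeros(len(X), dtype=bool)
    in_store[0] = True  # seed the store with the first sample
    changed = True
    while changed:  # repeat passes until a full pass adds nothing
        changed = False
        for i in range(len(X)):
            if in_store[i]:
                continue
            pred = knn_predict(X[in_store], y[in_store], X[i:i + 1], k=1)[0]
            if pred != y[i]:  # misclassified by the current store: absorb it
                in_store[i] = True
                changed = True
    return np.flatnonzero(in_store)


# Example: two overlapping 2-D classes.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal([0, 0], 1.0, size=(100, 2)),
               rng.normal([4, 0], 1.0, size=(100, 2))])
y = np.repeat([0, 1], 100)
keep = condense(X, y)
print(len(keep), "of", len(X), "points kept")
pred = knn_predict(X[keep], y[keep], X, k=1)
```

Only the kept points need to be searched at classification time. The condensed set depends on the order in which samples are visited, so it may be worth shuffling first and comparing a few runs.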

I use Matlab for coding; there are a lot of free toolboxes around, though
mostly I write my own.

Best wishes

Andrew


"Rishabh Gupta" <[EMAIL PROTECTED]> wrote in message
news:a4eje9$ip8$[EMAIL PROTECTED].;
> Hi All,
>     I'm a research student at the Department Of Electronics, University Of
> York, UK. I'm working on a project related to music analysis and
> classification. I am at the stage where I perform some analysis on music
> files (currently only in MIDI format) and extract about 500 variables that
> are related to music properties like pitch, rhythm, polyphony and volume.
> I am performing basic analysis like mean and standard deviation, but I
> also perform more elaborate analysis like measuring the complexity of
> melody and rhythm.
>
> The aim is that the variables obtained can be used to perform a number of
> different operations.
>     - The variables can be used to classify / categorise each piece of
> music, on its own, in terms of some meta classifier (e.g. rock, pop,
> classical).
>     - The variables can be used to perform comparison between two files. A
> variable from one music file can be compared to the equivalent variable in
> the other music file. By comparing all the variables in one file with the
> equivalent variable in the other file, an overall similarity measurement
> can be obtained.
>
> The next stage is to test the ability of the variables obtained to
> perform the classification / comparison. I need to identify variables that
> are redundant (redundant in the sense of 'they do not provide any
> information' and 'they provide the same information as other
> variables') so that they can be removed, and I need to identify variables
> that are distinguishing (provide the most information).
>
> My Basic Questions Are:
>     - What are the best statistical techniques / methods to apply
> here? E.g. I have looked at Principal Component Analysis; this would
> be a good method to remove the redundant variables and hence reduce the
> amount of data that needs to be processed. Can anyone suggest any other
> sensible statistical analysis methods?
>     - What are the ideal tools / software to perform the clustering /
> classification? I have access to SPSS software but I have never used it
> before and am not really sure how to apply it or whether it is any good
> when dealing with hundreds of variables.
>
> So far I have been analysing each variable on its own 'by eye' by plotting
> the mean and sd for all music files. However, this approach is not feasible
> in the long term since I am dealing with such a large number of variables.
> In addition, by looking at each variable on its own, I do not find
> clusters / patterns that are only visible through multivariate analysis.
> If anyone can recommend a better approach I would be very grateful.
>
> Any help or suggestion that can be offered will be greatly appreciated.
>
> Many Thanks!
>
> Rishabh Gupta
>
>




=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at
                  http://jse.stat.ncsu.edu/
=================================================================
