[R] Clustering Large Applications..sort of

2011-08-10 Thread Ken Hutchison
Hello all, I am using the clustering functions in R in order to work with large masses of binary time series data, however the clustering functions do not seem able to fit this size of practical problem. Library 'hclust' is good (though it may be sub par for this size of problem, thus doubly

Re: [R] Clustering Large Applications..sort of

2011-08-10 Thread Thomas Lumley
Try the flow cytometry clustering functions in Bioconductor. -thomas On Thu, Aug 11, 2011 at 7:07 AM, Ken Hutchison vicvoncas...@gmail.com wrote: Hello all,   I am using the clustering functions in R in order to work with large masses of binary time series data, however the clustering

Re: [R] Clustering Large Applications..sort of

2011-08-10 Thread Peter Langfelder
On Wed, Aug 10, 2011 at 12:07 PM, Ken Hutchison vicvoncas...@gmail.com wrote: Hello all,   I am using the clustering functions in R in order to work with large masses of binary time series data, however the clustering functions do not seem able to fit this size of practical problem. Library

Re: [R] Clustering Large Applications..sort of

2011-08-10 Thread Christian Hennig
There is a number of methods in the literature to decide the number of clusters for k-means. Probably the most popular one is the Calinski and Harabasz index, implemented as calinhara in package fpc. A distance based version (and several other indexes to do this) is in function cluster.stats

Re: [R] Clustering Large Applications..sort of

2011-08-10 Thread Christian Hennig
PS to my previous posting: Also have a look at kmeansruns in fpc. This runs kmeans for several numbers of clusters and decides the number of clusters by either CalinskiHarabasz or Average Silhouette Width. Christian On Wed, 10 Aug 2011, Ken Hutchison wrote: Hello all, I am using the