>>>>> "Markus" == Markus Preisetanz <[EMAIL PROTECTED]>
>>>>>     on Thu, 26 Jan 2006 20:48:29 +0100 writes:

    Markus> Dear R Specialists,
    Markus> when trying to cluster a data.frame with about 80.000
    Markus> rows and 25 columns I get the above error message. I
    Markus> tried hclust (using dist), agnes (entering the
    Markus> data.frame directly) and pam (entering the data.frame
    Markus> directly). What I actually do not want to do is
    Markus> generate a random sample from the data.

Currently all of the above-mentioned cluster methods work with full
distance / dissimilarity objects, even if only internally, i.e. they
store all d_{i,j} for 1 <= i < j <= n, i.e. n(n-1)/2 values, each of
them in double precision, i.e. 8 bytes. For n = 80'000 that is about
3.2 * 10^9 values, or roughly 25.6 GB.
So: no chance with the above functions and n = 80'000.

    Markus> The machine I run R on is a Windows 2000 Server
    Markus> (Pentium 4) with 2 GB of RAM.

If you ran a machine with a 64-bit version of OS and R {typical case
today: Linux on AMD Opteron}, you could go up quite a bit higher than
on your Windoze box {I vaguely remember I could do 'n = a few
thousand' on our dual Opteron with 16 GBytes}, but 80'000 is
definitely too large.

OTOH, there is clara() in the cluster package, which has been designed
for such situations, CLARA := [C]lustering [LAR]ge [A]pplications.
It is similar in spirit to pam() and *does* cluster all 80'000
observations, but does so by taking subsamples to construct the
medoids (and you can ask it to take many medium-sized subsamples,
instead of just 5 small ones as it does by default).

Martin Maechler, ETH Zurich
maintainer of "cluster" package.
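
To make the above concrete, here is a minimal sketch in R. It first redoes the memory arithmetic for a full dissimilarity object and then calls clara() with more and larger subsamples than the defaults. The matrix `x` is a simulated stand-in for the real data, and the values `k = 4`, `samples = 50` and `sampsize = 500` are illustrative assumptions only, not recommendations for this data set.

```r
library(cluster)

## Memory for a full dissimilarity object: n(n-1)/2 doubles, 8 bytes each
n       <- 80000
n_pairs <- n * (n - 1) / 2      # number of d_{i,j} with i < j
gib     <- n_pairs * 8 / 2^30   # size in GiB, far beyond 2 GB of RAM

## Hypothetical stand-in for the real data: 80'000 rows, 25 numeric columns
set.seed(1)
x <- matrix(rnorm(n * 25), nrow = n, ncol = 25)

## clara() draws 'samples' subsamples of size 'sampsize', runs the
## pam algorithm on each, and then assigns *all* n observations to the
## medoids of the best subsample.
## k, samples and sampsize below are assumed values for illustration.
cl <- clara(x, k = 4, samples = 50, sampsize = 500)

table(cl$clustering)   # cluster sizes over all n observations
cl$medoids             # one representative row per cluster
```

Only dissimilarities within each subsample are ever computed in full (here 500 * 499 / 2 per subsample), which is why this fits comfortably into memory. Larger `sampsize` and `samples` usually give more stable medoids at the cost of run time.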