On Thu, Jun 08, 2017 at 05:58:07PM +0200, Patrick Marais wrote:
> Hi Ryan,
>
> Thanks for the quick reply and suggestions.
>
> I suspect you're right about the neighbor query getting too many points; I
> tried reducing it massively, and now it runs for about 15 minutes
> and then seg faults. See below - the number of points is a bit higher than
> I stated: 542326, but the input is an arma::mat with this number of columns
> and rows=63. I'm fairly sure I have checked everything to remove NaNs and
> so on. Could it be possible that the size of the data set is causing
> something to fail? The memory usage was not maxed out at this point (only
> about 2.5GB out of 8GB).
>
> I just ran it through gdb, which doesn't seem to have all the debug
> information, so besides the place at which it crashed, I can't say much
> else.
>
> Not sure what to try next. If I remove the call to DBSCAN and use k-means,
> everything works (although the clusters are not what I'd really like).

Hmm, not sure exactly what the issue is here.  You may have uncovered a
bug.  Is there any chance I can get the dataset to try and reproduce the
failure?

Another sanity check would be to try an even smaller epsilon; if it's
still taking 15 minutes, then it still may be finding very many points
for each range search.

Clustering is a hard problem, and there's definitely a big tradeoff
between "fast and bad" (k-means lives here) and "slow but good" (maybe
you could say this about DBSCAN, but even DBSCAN is faster than some
things like spectral clustering methods).

-- 
Ryan Curtin    | "Bye-bye, goofy woman.  I enjoyed repeatedly
[email protected] | throwing you to the ground." - Ben Jabituya
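
P.S. One way to run the smaller-epsilon sanity check without waiting on a
full DBSCAN run is to count, for a small random sample of points, how many
other points fall within epsilon.  The sketch below is only an illustration
in plain Python with brute-force distances; it does not use mlpack's range
search, and the names neighbor_counts and summarize are made up for this
example.  On a dataset of your size you would want to keep sample_size
small, since each sampled point is compared against every other point.

```python
import math
import random


def neighbor_counts(points, epsilon, sample_size=100, seed=0):
    """For a random sample of points, count the other points within epsilon.

    Brute force (hypothetical helper, not part of mlpack): each sampled
    point is compared against every point in the dataset.
    """
    rng = random.Random(seed)
    n = len(points)
    sample = rng.sample(range(n), min(sample_size, n))
    counts = []
    for i in sample:
        p = points[i]
        count = sum(
            1
            for j, q in enumerate(points)
            if j != i and math.dist(p, q) <= epsilon
        )
        counts.append(count)
    return counts


def summarize(points, epsilons, sample_size=100, seed=0):
    """Return {epsilon: (min, mean, max)} neighbor counts over the sample."""
    summary = {}
    for eps in epsilons:
        counts = neighbor_counts(points, eps, sample_size, seed)
        summary[eps] = (min(counts), sum(counts) / len(counts), max(counts))
    return summary


if __name__ == "__main__":
    # Synthetic stand-in data (an assumption): 500 Gaussian points in 5-D.
    rng = random.Random(42)
    data = [tuple(rng.gauss(0.0, 1.0) for _ in range(5)) for _ in range(500)]
    for eps, (lo, mean, hi) in summarize(data, [0.5, 1.0, 2.0]).items():
        print(f"epsilon={eps}: min={lo} mean={mean:.1f} max={hi}")
```

If the mean count is still a large fraction of the dataset at the epsilon
you are using, each range search is probably still returning very many
points, and a smaller epsilon would be worth trying.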
