We already support sparse vectors and matrices. That should be pretty much all you need.
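To make that concrete, here is a minimal sketch in plain Python. It does not use Mahout's Java API, and the helper names `encode` and `cosine_distance` and the feature strings are hypothetical. It encodes relational information about an instance as a sparse vector and measures the distance between two such vectors. Vectors like these could feed a distance-based method such as canopy clustering or k-means without needing a new Vector type.

```python
import math

def encode(features, vocab):
    """Map (hypothetical) relational feature names to a sparse vector.

    The sparse vector is a dict {index: weight}; only non-zero entries are
    stored. Unseen feature names get the next free index in `vocab`.
    """
    vec = {}
    for name in features:
        idx = vocab.setdefault(name, len(vocab))
        vec[idx] = vec.get(idx, 0.0) + 1.0
    return vec

def cosine_distance(a, b):
    """Return 1 - cosine similarity, touching only non-zero entries."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector for the dot product
    dot = sum(w * b.get(i, 0.0) for i, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat an empty vector as maximally distant
    return 1.0 - dot / (norm_a * norm_b)

vocab = {}
x = encode(["author=alice", "cites=paper42", "venue=kdd"], vocab)
y = encode(["author=alice", "cites=paper42", "venue=icml"], vocab)
d = cosine_distance(x, y)  # about 1/3: two of three features are shared
```

Cosine distance is just one choice here. Any measure that iterates over non-zero entries keeps the cost proportional to the number of relations an instance actually has, not to the size of the vocabulary.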
There is emerging support for SVM and on-line logistic regression. Less mature is support for very large-scale SVD, which would give you a reasonable basis for clustering or categorization.

On Wed, May 5, 2010 at 6:29 AM, Pedro Oliveira <cpdom...@gmail.com> wrote:

> From a quick look at the code, a straightforward solution would be to
> define a new type of Vector (it wouldn't be a vector in the mathematical
> sense, just a way to save relational information about an instance), and
> some DistanceMeasures to work with that vector. Then we could use
> distance-based techniques, such as canopy clustering and k-means.
> Are there any plans to implement more distance-based (or kernel-based)
> algorithms, such as SVMs and KNN?