Artem Barger created MATH-1330:
----------------------------------

             Summary: KMeans clustering algorithm doesn't support clustering 
of sparse input data.
                 Key: MATH-1330
                 URL: https://issues.apache.org/jira/browse/MATH-1330
             Project: Commons Math
          Issue Type: Improvement
            Reporter: Artem Barger


Currently the `KMeansPlusPlusClusterer` class requires its generic parameter `T` 
to extend the `Clusterable` interface, which is defined as:
```
public interface Clusterable {

    /**
     * Gets the n-dimensional point.
     *
     * @return the point array
     */
    double[] getPoint();
}
```
That is, `getPoint()` returns a dense representation of the clusterable data, 
which makes it impossible to efficiently compute k-means clustering on 
high-dimensional but very sparse data. I think it would be much better if the 
`Clusterable` interface returned a `Vector`, allowing `SparseVector`s to be used 
when clustering the data. Of course, the `KMeansPlusPlusClusterer` 
implementation, and I assume the other clustering implementations as well, 
would need to be refactored accordingly to support this.
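To make the proposal more concrete, here is a minimal, self-contained Java sketch of why sparse input pays off in k-means. It assumes a hypothetical `SparseClusterable` interface (not part of Commons Math) that exposes only the non-zero coordinates of a point; `SparsePoint`, `squaredDistance` and `assign` are likewise illustrative names. Using the identity ||p - c||^2 = ||c||^2 + sum over non-zero i of (p_i^2 - 2 p_i c_i), the distance from a sparse point to a dense centroid costs O(nnz(p)) once each centroid's squared norm is precomputed, instead of O(d) for a dense `double[]`. Only the assignment step is shown, not centroid updates or k-means++ seeding.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SparseKMeansSketch {

    /** Hypothetical sparse counterpart of Clusterable: exposes only non-zero coordinates. */
    interface SparseClusterable {
        int dimension();

        /** Non-zero coordinates keyed by index; absent indices are implicitly 0.0. */
        Map<Integer, Double> nonZeroEntries();
    }

    /** Hypothetical point type that stores only its non-zero entries. */
    static final class SparsePoint implements SparseClusterable {
        private final int dimension;
        private final Map<Integer, Double> entries = new HashMap<>();

        SparsePoint(int dimension, int[] indices, double[] values) {
            if (indices.length != values.length) {
                throw new IllegalArgumentException("indices and values differ in length");
            }
            this.dimension = dimension;
            for (int i = 0; i < indices.length; i++) {
                if (values[i] != 0.0) {
                    entries.put(indices[i], values[i]); // zeros are simply not stored
                }
            }
        }

        @Override public int dimension() { return dimension; }
        @Override public Map<Integer, Double> nonZeroEntries() { return entries; }
    }

    static double squaredNorm(double[] v) {
        double s = 0.0;
        for (double x : v) {
            s += x * x;
        }
        return s;
    }

    /** ||p - c||^2 = ||c||^2 + sum over non-zero i of (p_i^2 - 2 p_i c_i); O(nnz(p)). */
    static double squaredDistance(SparseClusterable p, double[] centroid, double centroidNormSq) {
        if (centroid.length != p.dimension()) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double d = centroidNormSq;
        for (Map.Entry<Integer, Double> e : p.nonZeroEntries().entrySet()) {
            double pi = e.getValue();
            double ci = centroid[e.getKey()];
            d += pi * pi - 2.0 * pi * ci;
        }
        return Math.max(d, 0.0); // clamp tiny negative values caused by rounding
    }

    /** k-means assignment step: index of the nearest centroid for each point. */
    static int[] assign(List<? extends SparseClusterable> points, double[][] centroids) {
        double[] norms = new double[centroids.length];
        for (int k = 0; k < centroids.length; k++) {
            norms[k] = squaredNorm(centroids[k]); // once per centroid, not once per point
        }
        int[] labels = new int[points.size()];
        for (int i = 0; i < points.size(); i++) {
            double best = Double.POSITIVE_INFINITY;
            for (int k = 0; k < centroids.length; k++) {
                double d = squaredDistance(points.get(i), centroids[k], norms[k]);
                if (d < best) {
                    best = d;
                    labels[i] = k;
                }
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        int dim = 10_000; // high-dimensional, but each point has at most two non-zeros
        List<SparsePoint> points = List.of(
                new SparsePoint(dim, new int[]{0, 9_999}, new double[]{1.0, 1.0}),
                new SparsePoint(dim, new int[]{0, 9_999}, new double[]{1.2, 0.9}),
                new SparsePoint(dim, new int[]{5_000}, new double[]{4.0}));
        double[][] centroids = new double[2][dim];
        centroids[0][0] = 1.0;
        centroids[0][9_999] = 1.0;
        centroids[1][5_000] = 4.0;
        System.out.println(Arrays.toString(assign(points, centroids))); // [0, 0, 1]
    }
}
```

Storing entries in a `Map<Integer, Double>` keeps the sketch short but boxes every value; a real implementation would more likely use parallel `int[]`/`double[]` arrays or an existing sparse vector class. Note that centroids of sparse data are generally dense, which is why they stay `double[]` here.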



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
