[
https://issues.apache.org/jira/browse/MATH-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Artem Barger updated MATH-1330:
-------------------------------
Description:
Currently, the *KMeansPlusPlusClusterer* class requires its generic parameter *T*
to extend the *Clusterable* interface, which is:
bq. public interface Clusterable {
double[] getPoint();
}
That is, it returns a dense representation of the clusterable data, which makes it
impossible to efficiently compute k-means clustering on high-dimensional but
very sparse data. I think it would be much better if the *Clusterable* interface
returned a *Vector*, allowing the use of *SparseVector*s while clustering the
data. Of course, the *KMeansPlusPlusClusterer* implementation, and I assume the other
clustering implementations, would have to be refactored accordingly to support this.
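To illustrate the idea, here is a minimal, self-contained sketch of a sparse point abstraction and a Euclidean distance that only visits non-zero entries. It is not Commons Math API: *SparseClusterable*, *SparsePoint* and *distance* are hypothetical names made up for this example, and the representation (two parallel arrays with strictly increasing indices) is just one assumed design.

```java
/** Hypothetical sketch; none of these types exist in Commons Math. */
final class SparseClusteringSketch {

    /** Hypothetical sparse counterpart of Clusterable#getPoint(). */
    interface SparseClusterable {
        int[] indices();   // strictly increasing positions of the non-zero entries
        double[] values(); // values()[i] is the entry at position indices()[i]
    }

    /** Minimal immutable implementation backed by two parallel arrays. */
    static final class SparsePoint implements SparseClusterable {
        private final int[] indices;
        private final double[] values;

        SparsePoint(int[] indices, double[] values) {
            if (indices.length != values.length) {
                throw new IllegalArgumentException("indices and values must have equal length");
            }
            this.indices = indices.clone();
            this.values = values.clone();
        }

        @Override public int[] indices() { return indices.clone(); }
        @Override public double[] values() { return values.clone(); }
    }

    /** Euclidean distance via a two-pointer merge over the sorted indices. */
    static double distance(SparseClusterable a, SparseClusterable b) {
        int[] ai = a.indices(), bi = b.indices();
        double[] av = a.values(), bv = b.values();
        double sum = 0.0;
        int i = 0, j = 0;
        while (i < ai.length || j < bi.length) {
            double diff;
            if (j == bi.length || (i < ai.length && ai[i] < bi[j])) {
                diff = av[i++];            // entry present only in a
            } else if (i == ai.length || bi[j] < ai[i]) {
                diff = bv[j++];            // entry present only in b (sign is irrelevant once squared)
            } else {
                diff = av[i++] - bv[j++];  // entry present in both
            }
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }
}
```

With this layout, two points whose non-zero entries sit at positions {0, 1000000} and {5, 1000000} are compared in three loop iterations, whereas a dense double[] would need more than a million entries per point. Cluster centers tend to become dense after averaging, so a real refactoring would probably still need a dense or hybrid representation for them.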
> KMeans clustering algorithm doesn't support clustering of sparse input data.
> -----------------------------------------------------------------------------
>
> Key: MATH-1330
> URL: https://issues.apache.org/jira/browse/MATH-1330
> Project: Commons Math
> Issue Type: Improvement
> Reporter: Artem Barger