[
https://issues.apache.org/jira/browse/MATH-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583284#comment-13583284
]
Thomas Neidhart commented on MATH-917:
--------------------------------------
Sorry, it took me some time to review this contribution.
The existing API for clustering algorithms is not very flexible: the actual
distance calculation is performed in the data point class itself.
With Java and its primitive types, it is difficult to come up with a fully
generic version, so I think a Distance interface that targets double would be
a good start and sufficient for most use cases:
{noformat}
public interface Distance {
    // Returns the distance between two data points as a double.
    double distance(DataPoint a, DataPoint b);
}
{noformat}
The DataPoint class does not exist yet; we only have the Clusterable interface,
which is not very useful imho.
The actual Distance implementation may then be provided as input to the
clustering algorithm.
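To illustrate the idea, here is a minimal sketch of two such measures and of
how a measure could be handed to clustering code. DataPoint is a hypothetical
stand-in (a single getPoint() method returning double[]), and
ManhattanDistance, ChebyshevDistance and NearestCenter are illustrative names
only; none of them exist in Commons Math as written here.

```java
import java.util.List;

// Hypothetical stand-in for the proposed DataPoint type (an assumption).
interface DataPoint {
    double[] getPoint();
}

// The interface proposed above, repeated so the sketch compiles on its own.
interface Distance {
    double distance(DataPoint a, DataPoint b);
}

// L1 distance: sum of absolute coordinate differences.
final class ManhattanDistance implements Distance {
    @Override
    public double distance(DataPoint a, DataPoint b) {
        double[] p = a.getPoint();
        double[] q = b.getPoint();
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            sum += Math.abs(p[i] - q[i]);
        }
        return sum;
    }
}

// L-infinity distance: largest absolute coordinate difference.
final class ChebyshevDistance implements Distance {
    @Override
    public double distance(DataPoint a, DataPoint b) {
        double[] p = a.getPoint();
        double[] q = b.getPoint();
        double max = 0.0;
        for (int i = 0; i < p.length; i++) {
            max = Math.max(max, Math.abs(p[i] - q[i]));
        }
        return max;
    }
}

// Illustrative helper: the measure is an argument, not baked into the point.
final class NearestCenter {
    static int indexOfNearest(DataPoint p, List<DataPoint> centers, Distance measure) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centers.size(); i++) {
            double d = measure.distance(p, centers.get(i));
            if (d < bestDist) {
                bestDist = d;
                best = i;
            }
        }
        return best;
    }
}
```

With the point (0, 0) and the centers (5, 0) and (3, 3), the Manhattan
measure picks the first center (5 vs. 6) while the Chebyshev measure picks the
second (5 vs. 3), so swapping the measure changes the assignment without
touching the point class.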
But this is only one aspect, as the whole clustering package should be
refactored:
* have a base ClusteringAlgorithm interface
* pluggable Distance measures
* flexible DataPoint implementation
* flexible Cluster data structure for result (hierarchical vs. flat, centroid
based vs. other)
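One possible shape for these four pieces is sketched below. Every name and
signature here is an assumption made for illustration, not an agreed design,
and ThresholdClustering is a deliberately trivial greedy scheme that is not
one of the algorithms in Commons Math. Everything is nested in one class only
to keep the sketch in a single file.

```java
import java.util.ArrayList;
import java.util.List;

final class ClusteringSketch {

    // Flexible data point: anything that can expose its coordinates.
    interface DataPoint {
        double[] getPoint();
    }

    // Pluggable distance measure.
    interface Distance {
        double distance(DataPoint a, DataPoint b);
    }

    // Flat cluster for results; a hierarchical variant could hold
    // child clusters instead of (or in addition to) points.
    static class Cluster<T extends DataPoint> {
        private final List<T> points = new ArrayList<>();

        void add(T point) {
            points.add(point);
        }

        List<T> getPoints() {
            return points;
        }
    }

    // Base interface shared by all algorithms.
    interface ClusteringAlgorithm<T extends DataPoint> {
        List<Cluster<T>> cluster(List<T> points);
    }

    // Toy algorithm: a point joins the first cluster whose first member
    // lies within 'radius'; otherwise it starts a new cluster.
    static final class ThresholdClustering<T extends DataPoint>
            implements ClusteringAlgorithm<T> {
        private final Distance measure;
        private final double radius;

        ThresholdClustering(Distance measure, double radius) {
            this.measure = measure;
            this.radius = radius;
        }

        @Override
        public List<Cluster<T>> cluster(List<T> points) {
            List<Cluster<T>> clusters = new ArrayList<>();
            for (T p : points) {
                Cluster<T> home = null;
                for (Cluster<T> c : clusters) {
                    if (measure.distance(p, c.getPoints().get(0)) <= radius) {
                        home = c;
                        break;
                    }
                }
                if (home == null) {
                    home = new Cluster<>();
                    clusters.add(home);
                }
                home.add(p);
            }
            return clusters;
        }
    }
}
```

The algorithm only sees the Distance and DataPoint abstractions, so a new
measure or a new point type can be introduced without changing it.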
> More distance measurements are needed in o.a.c.m.stat.clustering.
> -----------------------------------------------------------------
>
> Key: MATH-917
> URL: https://issues.apache.org/jira/browse/MATH-917
> Project: Commons Math
> Issue Type: Improvement
> Reporter: Reid Hochstedler
> Fix For: 3.2
>
>
> Currently only Euclidean distance is used for distance measurement. It would
> be easy to quickly add Manhattan and Chebyshev distance, among others.