[
https://issues.apache.org/jira/browse/MATH-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gilles Sadowski updated MATH-1367:
----------------------------------
Fix Version/s: 4.X
(was: 4.0)
> DBSCAN Implementation does not count the seed point itself as part of its
> neighbors count
> -----------------------------------------------------------------------------------------
>
> Key: MATH-1367
> URL: https://issues.apache.org/jira/browse/MATH-1367
> Project: Commons Math
> Issue Type: Bug
> Affects Versions: 3.6.1
> Reporter: Amol Singh
> Priority: Major
> Fix For: 4.X
>
>
> The DSCAN paper describes the eps-neighborhood of a point as
> https://www.aaai.org/Papers/KDD/1996/KDD96-037.pdf (Page 2)
> Definition 1: (Eps-neighborhood of a point) The Eps-neighborhood of a point
> p, denoted by NEps(p), is defined by NEps(p) = {q ∈ D | dist(p,q)< Eps}
> in other words for all q points that are a member of database D whose
> distance from p is less that Eps should be classified as a neighbor. This
> should include the point itself.
> The implementation however has a reference check to the point itself and does
> not add it to its neighbors list.
> private List<T> getNeighbors(final T point, final Collection<T> points) {
> final List<T> neighbors = new ArrayList<T>();
> for (final T neighbor : points) {
> if (point != neighbor && distance(neighbor, point) <= eps) {
> neighbors.add(neighbor);
> }
> }
> return neighbors;
> }
> "point != neighbor " check should be removed here. Keeping this check
> effectively is raising the minPts count by 1. Other third party QuadTree
> backed DBSCAN implementations consider the center point in its neighbor count
> E.g. bmw-carit library.
> If this is infact by design, the check should use value equality instead of
> reference equality. T extends Clusterable<T> , the client should be able to
> define this behavior.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)