[ 
https://issues.apache.org/jira/browse/MATH-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amol Singh updated MATH-1367:
-----------------------------
    Description: 
The DBSCAN paper defines the Eps-neighborhood of a point as follows:

https://www.aaai.org/Papers/KDD/1996/KDD96-037.pdf (Page 2)
Definition 1: (Eps-neighborhood of a point) The Eps-neighborhood of a point p, 
denoted by NEps(p), is defined by NEps(p) = {q ∈ D | dist(p,q) ≤ Eps}

In other words, every point q in database D whose distance from p is at most 
Eps should be classified as a neighbor of p. Since dist(p,p) = 0, this 
includes the point itself.

The implementation, however, performs a reference check against the point 
itself and does not add it to its own neighbors list.

    private List<T> getNeighbors(final T point, final Collection<T> points) {
        final List<T> neighbors = new ArrayList<T>();
        for (final T neighbor : points) {
            if (point != neighbor && distance(neighbor, point) <= eps) {
                neighbors.add(neighbor);
            }
        }
        return neighbors;
    }

The "point != neighbor" check should be removed here. Shouldn't the cluster 
include the point itself? Keeping this check effectively raises the minPts 
threshold by 1. Other third-party QuadTree-backed DBSCAN implementations, 
e.g. the bmw-carit library, count the center point among its neighbors.
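
To make the off-by-one effect concrete, here is a minimal standalone sketch. 
It is not the Commons Math code: the class name, the includeSelf flag, the 
double[] point representation and the "size >= minPts" core test are all 
assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class NeighborCountSketch {

    /** Euclidean distance between two points of equal dimension. */
    public static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            final double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Collects every point within eps of the query point.
     * With includeSelf == false the query point is skipped by reference,
     * mirroring the check quoted above; with includeSelf == true the
     * neighborhood follows Definition 1 and contains the point itself.
     */
    public static List<double[]> getNeighbors(double[] point, List<double[]> points,
                                              double eps, boolean includeSelf) {
        final List<double[]> neighbors = new ArrayList<>();
        for (final double[] candidate : points) {
            if (!includeSelf && candidate == point) {
                continue;
            }
            if (distance(candidate, point) <= eps) {
                neighbors.add(candidate);
            }
        }
        return neighbors;
    }

    public static void main(String[] args) {
        final double[] p = {0.0, 0.0};
        final List<double[]> points = new ArrayList<>();
        points.add(p);
        points.add(new double[] {0.5, 0.0});
        points.add(new double[] {0.0, 0.5});
        points.add(new double[] {5.0, 5.0}); // far away, never a neighbor of p

        final double eps = 1.0;
        final int minPts = 3; // assumed core test: neighbors.size() >= minPts

        final int without = getNeighbors(p, points, eps, false).size();
        final int with = getNeighbors(p, points, eps, true).size();

        System.out.println("excluding self: " + without + ", core = " + (without >= minPts));
        System.out.println("including self: " + with + ", core = " + (with >= minPts));
    }
}
```

With these four points, p has two neighbors when it is excluded and three 
when it is included, so with minPts = 3 it only qualifies as a core point 
under the definition that counts the point itself.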

If this is in fact by design, the check should use value equality instead of 
reference equality. Since T extends Clusterable<T>, the client should be able 
to define this behavior.
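
As a rough illustration of the difference between the two kinds of self 
check, the sketch below uses a hypothetical Point class with value-based 
equals(). None of these names come from Commons Math; the byValue flag exists 
only to compare the two behaviors side by side.

```java
import java.util.Arrays;
import java.util.List;

public class SelfCheckEqualitySketch {

    /** Hypothetical point type whose equality is defined by its coordinates. */
    public static final class Point {
        final double[] coords;

        public Point(double... coords) {
            this.coords = coords.clone();
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Point && Arrays.equals(coords, ((Point) o).coords);
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(coords);
        }
    }

    /** Euclidean distance between two points of equal dimension. */
    public static double distance(Point a, Point b) {
        double sum = 0.0;
        for (int i = 0; i < a.coords.length; i++) {
            final double d = a.coords[i] - b.coords[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Counts the points within eps of the query point, skipping the query
     * point itself. The self check uses equals() when byValue is true and
     * reference comparison (==) otherwise.
     */
    public static int countNeighbors(Point point, List<Point> points,
                                     double eps, boolean byValue) {
        int count = 0;
        for (final Point candidate : points) {
            final boolean isSelf = byValue ? point.equals(candidate) : point == candidate;
            if (!isSelf && distance(candidate, point) <= eps) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // The query is an equal copy of a stored point, not the same instance.
        final Point query = new Point(0.0, 0.0);
        final List<Point> stored = Arrays.asList(new Point(0.0, 0.0), new Point(0.5, 0.0));

        System.out.println("reference check: " + countNeighbors(query, stored, 1.0, false));
        System.out.println("value check:     " + countNeighbors(query, stored, 1.0, true));
    }
}
```

One trade-off to keep in mind: a value-based check also drops distinct data 
points that happen to share the query point's coordinates, which may or may 
not be what a client wants. That is an argument for letting the client decide.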


  was:
The DBSCAN paper defines the Eps-neighborhood of a point as follows:

https://www.aaai.org/Papers/KDD/1996/KDD96-037.pdf (Page 2)
Definition 1: (Eps-neighborhood of a point) The Eps-neighborhood of a point p, 
denoted by NEps(p), is defined by NEps(p) = {q ∈ D | dist(p,q) ≤ Eps}.

In other words, every point q in database D whose distance from p is at most 
Eps should be classified as a neighbor of p. Since dist(p,p) = 0, this 
includes the point itself.

The implementation, however, performs a reference check against the point 
itself and does not add it to its own neighbors list.

    private List<T> getNeighbors(final T point, final Collection<T> points) {
        final List<T> neighbors = new ArrayList<T>();
        for (final T neighbor : points) {
            if (point != neighbor && distance(neighbor, point) <= eps) {
                neighbors.add(neighbor);
            }
        }
        return neighbors;
    }

The "point != neighbor" check should be removed here. Shouldn't the cluster 
include the point itself? Keeping this check effectively raises the minPts 
threshold by 1. Other third-party QuadTree-backed DBSCAN implementations, 
e.g. the bmw-carit library, count the center point among its neighbors.

If this is in fact by design, the check should use value equality instead of 
reference equality. Since T extends Clusterable<T>, the client should be able 
to define this behavior.



> DBSCAN Implementation does not count the seed point itself as part of its 
> neighbors count
> -----------------------------------------------------------------------------------------
>
>                 Key: MATH-1367
>                 URL: https://issues.apache.org/jira/browse/MATH-1367
>             Project: Commons Math
>          Issue Type: Bug
>    Affects Versions: 3.6.1
>            Reporter: Amol Singh
>             Fix For: 4.0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
