Here is an idea, I'm not going to code this one:) It's still not an ideal solution because it has to make assumptions about your data set. Execute the algorithm I outlined previously with a very small r value, if you didn't find the number of points you are looking for, increase r and modify the query slightly so it doesn't return any of the points the first query returned .... something like AND `PointID` NOT in ('34', '56', '67', . . .). At every step along the way insert the point id of the points inside of r along with the distance they are from the test point, once you have over 100 records in this table stop increasing r and query the temp table sorted by distance with a limit of 100. Of course you have to have some knowledge of your data set to get a reasonable start value for r and a reasonable method for determining how much to increase it each time.

On the other hand a minor modification seems better. By inserting all the points in the cube along with their distance in the temp table, a query like SELECT count(*) FROM temp WHERE `Distance` <= r Would be a good way to see if you need to continue to the next round. Also doing it that way, instead of using the NOT IN syntax, which I understand can be slow, you can modify the where condition to find points that are inside the current cube of size r but are outside the previous cube.

Chris W

Werner Van Belle wrote:
Hello Chris,

The use case I'
m talking about is actually a typical usecase for GIS applications: give
me the x closest points to this one. E.g: give me the 10 points closest
to (1,2,79) or in my case: give me the 100 points closest to
(x1,....x96). A query like yours might be possible and might be a good
solution if we would know the radius in which we are looking for the
points, but this is not really the case: we merely want a list returned
ordered by distance. Solving this with your solution is possible but is
quite slow. There exists nice datastructures to deal with this type of
problem as said and these are used in the GIS implementation in MySql.

Chris W wrote:
I'm not sure why, but it seems that some people, I don't mean to imply
that you are one of them, think there is some magic MySQL can preform
to find points with in a given radius using the GIS extension.  There
is no magic.  They simply use the well known math required to
determine what points are inside the circle.
GIS extenstions are also not only about distances: the above query is
better solved with specialized datastructures.
I could be wrong but I doubt there is any way to create an index that
can directly indicate points with a a certain distance of other points
unless that index included the distance from every point to every
other point.  That is obviously not practical since with a set of only
14 points the index would have over 6 billion entries.
Partitioning of the space such as done in 3D render engines do solve
this problem more efficiently than having a list of all pairtwise
distances.  So the question is not whether such algorithms exist, it is
rather whether they are available in/through MySql.

lets call each of your dimensions d1, d2, d3 .... d96. If you create
an index on d1, d2, .... d69, you can then create a simple query that
will quickly find all points that will find all points that are with
in a bounding box.  Since this query is going to get a bit large with
96 dimensions, I would use code to create the query. I will use php. Let's start with the desired radius being r and the test point
dimensions being in an array TestPointD[1] = x, TestPointD[2] = . . .

$select = 'SELECT `PointID`, ';
$where = 'WHERE ';
foreach($TestPointD as $i => $d){
 $di = 'd' . "$i";
 $select .= "`$di`, "
 $MinD = $d - $r;
 $MaxD = $d + $r;
 $where .= "`$di` >= '$MinD' AND `$di` <= '$MaxD' AND ";
}
$select = substr($select, 0, -2);  //trim of the trailing comma and space
$where = substr($where, 0, -4);  //trim off the trailing 'AND '

$query = "$select FROM `points` $where";

Thanks for the nice illustration. In this case with the proper indices
this will indeed split the space in sections; nevertheless this approach
has great difficulties returning an ordered list of distances and
prefereably only the 100 closest ones at that.

Wkr,


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql?unsub=arch...@jab.org

Reply via email to