On Mon, Sep 29, 2014 at 6:16 AM, Bruce Momjian <[email protected]> wrote:
> On Fri, Sep 26, 2014 at 10:49:42AM +0400, Alexander Korotkov wrote: > > Does this also fix the identical PostGIS problem or is there > something > > PostGIS needs to do? > > > > > > This patch provides general infrastructure for recheck in KNN-GiST. > PostGIS > > need corresponding change in its GiST opclass. Since PostGIS already > define <-> > > and <#> operators as distance to bounding box border and bounding box > center, > > it can't change their behaviour. > > it has to support new operator "exact distance" in opclass. > > Ah, OK, so they just need something that can be used for the recheck. I > think they currently use ST_Distance() for that. Does it have to be an > operator? If they defined an operator for ST_Distance(), would > ST_Distance() work too for KNN-GiST? > Currently, ST_Distance is a function, but it's no problem to make it an operator with KNN-GiST support. > In summary, you still create a normal GiST index on the column: > > > http://shisaa.jp/postset/postgis-postgresqls-spatial-partner-part-3.html > > CREATE INDEX planet_osm_line_ref_index ON planet_osm_line(ref); > > which indexes by the bounding box. The new code will allow ordered > index hits to be filtered by something like ST_Distance(), rather than > having to a LIMIT 50 in a CTE, then call ST_Distance(), like this: > > EXPLAIN ANALYZE WITH distance AS ( > SELECT way AS road, ref AS route > FROM planet_osm_line > WHERE highway = 'secondary' > ORDER BY ST_GeomFromText('POLYGON((14239931.42 > 3054117.72,14239990.49 3054224.25,14240230.15 3054091.38,14240171.08 > 3053984.84,14239931.42 3054117.72))', 900913) <#> way > LIMIT 50 > ) > SELECT ST_Distance(ST_GeomFromText('POLYGON((14239931.42 > 3054117.72,14239990.49 3054224.25,14240230.15 3054091.38,14240171.08 > 3053984.84,14239931.42 3054117.72))', 900913), road) AS true_distance, route > FROM distance > ORDER BY true_distance > LIMIT 1; > Yeah. It this query 50 is pure empirical value. It could be both too low or too high. Too low value can cause wrong query answers. Too high value can cause lower performance. With patch simple KNN query will work like this query with always right value in LIMIT clause. > Notice the CTE uses <#> (bounding box center), and then the outer query > uses ST_Distance and LIMIT 1 to find the closest item. > > Excellent! Thanks. The main question now is design of this patch. Currently, it does all the work inside access method. We already have some discussion of pro and cons of this method. I would like to clarify alternatives now. I can see following way: 1. Implement new executor node which performs sorting by priority queue. Let's call it "Priority queue". I think it should be separate node from "Sort" node. Despite "Priority queue" and "Sort" are essentially similar from user view, they would be completely different in implementation. 2. Implement some interface to transfer distance values from access method to "Priority queue" node. 3. Somehow tell the planner that it could use "Priority queue" in corresponding cases. I see two ways of doing this: - Add flag to operator in opclass indicating that index can only order by lower bound of "col op value", not by "col op value" itself. - Define new relation between operators. Value of one operator could be lower bound for value of another operator. So, planner can put "Priority queue" node when lower bound ordering is possible from index. Also "ALTER OPERATOR" command would be reasonable, so extensions could upgrade. Besides overhead, this way makes significant infrastructural changes. So, it may be over-engineering. However, it's probably more clean and beautiful solution. I would like to get some feedback from people familiar with KNN-GiST like Heikki or Tom. What do you think about this? Any other ideas? ------ With best regards, Alexander Korotkov.
