Hi,
In a GeoServer pull request we have an implementation of a NearestVisitor
whose objective is to find the value closest to a given reference value in
the domain of a feature attribute whose value is a Number, a Date or a
String:
https://github.com/ilkkarinne/geoserver/blob/master/src/wms/src/main/java/org/geoserver/wms/dimension/NearestVisitorFactory.java

In essence, this is similar to a max/min visitor, and much less similar to
a KNN search (find the N closest features to a given point), in that it
searches for the closest value, not for the closest features.
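To make the comparison with a max/min visitor concrete, here is a minimal
in-memory sketch, assuming only Number and Date values: like a max/min
visitor, it looks at each attribute value once and keeps a running "best"
value, except that the criterion is the distance from the reference. The
names (NearestValueSketch, visit, getNearest) are hypothetical and are not
the GeoTools NearestVisitor API.

```java
import java.util.Date;
import java.util.List;

/**
 * Hypothetical sketch (not the GeoTools NearestVisitor API): a single pass
 * over attribute values that keeps the value closest to a reference, much
 * like a min/max visitor keeps the running minimum or maximum.
 */
public class NearestValueSketch {
    private final Object reference;
    private Object nearest; // best value seen so far
    private double bestDistance = Double.POSITIVE_INFINITY;

    public NearestValueSketch(Object reference) {
        this.reference = reference;
    }

    /** Distance between two Numbers or two Dates; other types are rejected. */
    private static double distance(Object a, Object b) {
        if (a instanceof Number && b instanceof Number) {
            return Math.abs(((Number) a).doubleValue() - ((Number) b).doubleValue());
        }
        if (a instanceof Date && b instanceof Date) {
            return Math.abs(((Date) a).getTime() - ((Date) b).getTime());
        }
        throw new IllegalArgumentException("Unsupported types: " + a + ", " + b);
    }

    /** Called once per feature with its attribute value. */
    public void visit(Object value) {
        if (value == null) {
            return; // skip missing attribute values
        }
        double d = distance(value, reference);
        if (d < bestDistance) { // strict: on ties the first value seen wins
            bestDistance = d;
            nearest = value;
        }
    }

    public Object getNearest() {
        return nearest;
    }

    public static void main(String[] args) {
        NearestValueSketch visitor = new NearestValueSketch(7);
        for (Object v : List.of(1, 5, 10, 12)) {
            visitor.visit(v);
        }
        System.out.println(visitor.getNearest()); // prints 5
    }
}
```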

I've started backporting that code to GeoTools, adding unit tests and a
JDBC-optimized implementation, and then moved on to make it more general,
so that it can work against Geometry and, generally speaking, Comparable
objects as well. There is a small twist there: if we are dealing with a
Comparable that's not a Number or a Date, we end up with the two closest
values (unless the reference is above or below the whole domain), and then
we basically pick one arbitrarily.
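The Comparable generalization can be sketched as follows, assuming a plain
list of values rather than a feature collection. It tracks the largest
value at or below the reference and the smallest value at or above it; when
a numeric distance is available it returns the closer one, otherwise it
falls back to an arbitrary but deterministic choice (here, the lower
neighbor). Dates are not handled in this sketch (they could be compared via
getTime()), and ComparableNearestSketch is a hypothetical name, not code
from the pull request.

```java
import java.util.List;

/**
 * Hypothetical sketch: nearest value in a generic Comparable domain.
 * "below" is the max among values <= reference, "above" the min among
 * values >= reference.
 */
public class ComparableNearestSketch {

    public static <T extends Comparable<? super T>> T nearest(List<T> domain, T reference) {
        T below = null;
        T above = null;
        for (T value : domain) {
            if (value == null) continue;
            int cmp = value.compareTo(reference);
            if (cmp <= 0 && (below == null || value.compareTo(below) > 0)) {
                below = value; // new max among values <= reference
            }
            if (cmp >= 0 && (above == null || value.compareTo(above) < 0)) {
                above = value; // new min among values >= reference
            }
        }
        if (below == null) return above; // reference is below the whole domain
        if (above == null) return below; // reference is above the whole domain
        if (below instanceof Number && above instanceof Number && reference instanceof Number) {
            double ref = ((Number) reference).doubleValue();
            double dBelow = ref - ((Number) below).doubleValue();
            double dAbove = ((Number) above).doubleValue() - ref;
            return dBelow <= dAbove ? below : above;
        }
        // No meaningful distance for this type: arbitrarily pick the lower neighbor
        return below;
    }

    public static void main(String[] args) {
        System.out.println(nearest(List.of(1, 5, 10), 8)); // prints 10
        System.out.println(nearest(List.of("apple", "kiwi", "pear"), "melon")); // prints kiwi
    }
}
```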

I did not add a String-based implementation, but if we had one, it would
probably have to use some reliable string distance (for example, the
Levenshtein distance, http://en.wikipedia.org/wiki/Levenshtein_distance).
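For reference, a minimal Java sketch of the Levenshtein distance (the
classic two-row dynamic programming formulation, with insertion, deletion
and substitution all costing 1), plus a hypothetical nearest() helper that
picks the domain string with the smallest edit distance. This is not part
of the pull request; it only illustrates what a String-based variant might
look like.

```java
import java.util.List;

public class Levenshtein {

    /** Edit distance between a and b using two rolling rows of the DP table. */
    public static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b's first j chars takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a's first i chars into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // the current row becomes the previous one
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Hypothetical helper: the domain value closest to reference (first wins on ties). */
    public static String nearest(Iterable<String> domain, String reference) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String candidate : domain) {
            int d = distance(candidate, reference);
            if (d < bestDistance) {
                bestDistance = d;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // prints 3
        System.out.println(nearest(List.of("apple", "grape", "melon"), "grap")); // prints grape
    }
}
```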

During the implementation I also noticed that we are searching for a value
in the attribute domain, which is not necessarily of the same type as the
reference value (e.g., a domain made of integers, with the reference being
a double), so I made sure the code handles that.

The overall visitor code is here:
https://github.com/aaime/geotools/blob/nearest/modules/library/main/src/main/java/org/geotools/feature/visitor/NearestVisitor.java
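As an illustration of the mixed-type issue, one possible approach (an
assumption, not necessarily what NearestVisitor does) is to normalize both
the domain value and the reference to BigDecimal before computing
distances, so that an Integer domain can be searched with a double
reference without a ClassCastException from compareTo and without
truncating the reference. MixedNumberSketch and its methods are
hypothetical names.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.List;

public class MixedNumberSketch {

    /** Converts common Number subclasses to BigDecimal without losing precision. */
    static BigDecimal toBigDecimal(Number n) {
        if (n instanceof BigDecimal) return (BigDecimal) n;
        if (n instanceof BigInteger) return new BigDecimal((BigInteger) n);
        if (n instanceof Byte || n instanceof Short || n instanceof Integer || n instanceof Long) {
            return BigDecimal.valueOf(n.longValue());
        }
        // Float/Double: exact binary value; NaN and Infinity throw NumberFormatException
        return new BigDecimal(n.doubleValue());
    }

    /** Distance between a domain value and a reference of a possibly different Number type. */
    static BigDecimal distance(Number domainValue, Number reference) {
        return toBigDecimal(domainValue).subtract(toBigDecimal(reference)).abs();
    }

    /** Returns the domain value (keeping its original type) closest to the reference. */
    static <N extends Number> N nearest(List<N> domain, Number reference) {
        N best = null;
        BigDecimal bestDistance = null;
        for (N value : domain) {
            BigDecimal d = distance(value, reference);
            if (bestDistance == null || d.compareTo(bestDistance) < 0) {
                bestDistance = d;
                best = value;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Integer domain, double reference
        System.out.println(nearest(List.of(3, 4, 5), 4.4)); // prints 4
    }
}
```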

The JDBC-optimized implementation turns the nearest visit into two visits:
it finds the closest value to the reference in the half of the domain below
the reference (the max in that interval), then the closest value in the
half above it (the min in that interval), and picks the closer of the two.
If the attribute is indexed this should be quite a bit faster than scanning
the whole domain; if not, it's still going to be faster than loading all
the features from the database into the in-memory visitor:
https://github.com/geotools/geotools/pull/386/files#diff-5e45fc0686088e1fbeea65365775a1fbL678
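A minimal sketch of the two-query split, assuming hypothetical table and
column names and leaving out the visitor's own filter and the SQL dialect's
identifier escaping: the first query returns the max among values at or
below the reference, the second the min among values at or above it, and a
small helper picks the closer of the two. Either aggregate comes back as
SQL NULL when its half of the domain is empty, which the helper models as a
Java null.

```java
/**
 * Hypothetical sketch of the "two visits" JDBC strategy. Table and column
 * names are assumed to be trusted or already escaped by the SQL dialect;
 * the reference value is bound to the "?" placeholder.
 */
public class NearestSqlSketch {

    /** Closest candidate from below: the max among values <= reference. */
    static String belowQuery(String table, String column) {
        return "SELECT MAX(" + column + ") FROM " + table + " WHERE " + column + " <= ?";
    }

    /** Closest candidate from above: the min among values >= reference. */
    static String aboveQuery(String table, String column) {
        return "SELECT MIN(" + column + ") FROM " + table + " WHERE " + column + " >= ?";
    }

    /**
     * Picks the closer of the two aggregate results. A null means that half
     * of the domain was empty (MAX/MIN over no rows yields SQL NULL).
     */
    static Double pickClosest(Double below, Double above, double reference) {
        if (below == null) return above;
        if (above == null) return below;
        return (reference - below) <= (above - reference) ? below : above;
    }

    public static void main(String[] args) {
        System.out.println(belowQuery("observations", "elevation"));
        // prints: SELECT MAX(elevation) FROM observations WHERE elevation <= ?
        System.out.println(aboveQuery("observations", "elevation"));
        // prints: SELECT MIN(elevation) FROM observations WHERE elevation >= ?
        System.out.println(pickClosest(5.0, 10.0, 8.0)); // prints 10.0
    }
}
```

With an index on the column, most databases can answer a MIN/MAX over a
range condition by walking the index rather than scanning the table, which
is where the expected speedup would come from.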

The pull request is here, comments welcome:
https://github.com/geotools/geotools/pull/386

Cheers
Andrea

-- 
== Our support, Your Success! Visit http://opensdi.geo-solutions.it for more information ==

Ing. Andrea Aime
@geowolf
Technical Lead

GeoSolutions S.A.S.
Via Poggio alle Viti 1187
55054  Massarosa (LU)
Italy
phone: +39 0584 962313
fax: +39 0584 1660272
mob: +39  339 8844549

http://www.geo-solutions.it
http://twitter.com/geosolutions_it

_______________________________________________
GeoTools-Devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/geotools-devel
