Like I said, there are other users with far more knowledge in this area... ;-)
Steve

>>> On 9/17/2008 at 2:22 PM, in message <[EMAIL PROTECTED]>, Stephen Woodbridge <[EMAIL PROTECTED]> wrote:
> This is one approach to the problem, but it does not deal with the real
> problems of matching user-entered addresses with addresses encoded on
> street segments.
>
> For example: matching AL 44, Alabama 44, AL-44, Alabama Highway 44,
> Highway 44, State Highway 44, Rt 44, and various other abbreviations for
> Highway; simple typos; adding N, N., North, S, S., South, etc.
> designations to the highway; adding Alt., Bus., Byp., etc.; and on it
> goes. You also need to deal with accented characters, which are
> sometimes entered without accents.
>
> In a geocoder, you typically have a standardizer that sorts out all that
> craziness. Then, when you load the geocoder, you standardize the vendor
> data and store it in a standard form. When you get a geocode request,
> you standardize the incoming request and then try to match its standard
> form against the vendor data, which is also in standard form.
>
> You can also use techniques like Metaphone/Soundex codes to do fuzzy
> searching, and then use Levenshtein distance to score the possible
> matches by how close they are to the request.
>
> You need to be prepared to handle multiple results to a query; for
> example, you search for Oak St. but only find North Oak Street and South
> Oak Street.
>
> Also, what are you going to search? Your whole dataset, or are you also
> going to want to filter it by city, state, postal code, or country? In
> that case you also need to be able to parse the full address into all
> these additional terms and restrict your search to records in that
> limited region.
>
> It makes much more sense to load the appropriate data records into a
> relational database and make the queries in SQL.
> If you do not want to use a full-blown database like PostgreSQL or
> MySQL, then look at SQLite, which is a wonderful embedded database with
> zero management and has bindings for C, Perl, Python, PHP, Tcl, etc.
>
> My two cents, from someone who has his head and fingers in too many
> geocoders.
>
> -Steve
> http://imaptools.com/
>
> Steve Lime wrote:
>> I think we'd need a fuzzy match operator, probably one specific to
>> address matching. This would involve adding one or more C functions to
>> compare two address strings and then tweaking the MapServer yacc
>> grammar to recognize the new operator. The trick would obviously be
>> writing the C function, and there are folks on the list with
>> considerable experience with that problem.
>>
>> If you HAD that operator, then presumably you could write different
>> filters depending on your data, e.g.:
>>
>> ('user entered address' addreq '[address column]') or
>> ('user entered address' addreq '[prefix] [name] [type] [suffix]')
>>
>> That would be faster than trying to manipulate the current operators.
>> You could also do a very generic query, like a case-insensitive lookup
>> on the street name, and then operate on that result set in your
>> application to deal with data differences.
>>
>> Steve
>>
>>>>> "Emerson, Gabe" <[EMAIL PROTECTED]> 09/17/08 10:16 AM >>>
>> Hi All,
>>
>> I have an interesting mini-project which some of you might have dealt
>> with before; I'd be interested in any suggestions.
>>
>> I'd like to run a query (presented to users as an address search)
>> across multiple layers. For example, after an address is entered, the
>> system first searches an in-house dataset; if there are no matches, it
>> searches a county parcel dataset; and if both fail, it tries to map the
>> address via a geocoding API.
>>
>> The issue I'm running into is that each of the layers stores addresses
>> a little differently.
>> The in-house set tends to be sloppy about punctuation and things like
>> directions and street types ('N' vs 'North', 'St' vs 'ST.' vs 'Street',
>> etc.). The county is more standardized but breaks everything up into
>> street prefix, name, type, suffix, etc. (Minnesota Met Council, for
>> those of you familiar with it). In addition, users tend not to enter
>> addresses the same way twice, and they often leave out things like the
>> street type and direction.
>>
>> I'm wondering if there's a way to relax the query matches so that
>> something like "100 James" will return a match from a DBF containing
>> "100 South James Ave", or from a set of columns like "100" "S" "James"
>> "Ave". Something along the lines of... The Geocoding API is flexible in
>> this way, so one solution I considered is to use it as an address
>> parser and then use the returned X,Y data for an itemquery on each of
>> the layers. The problems with that are slower performance and possible
>> API unavailability.
>>
>> Currently I'm using MapServer in CGI mode with some JavaScript for
>> front-end logic and custom tools. I developed the application this way
>> for various reasons, but am considering moving to PHP MapScript for
>> better performance. If something like this is possible with the CGI
>> approach, I'd love to hear about it, but I'd also be interested in
>> MapScript ideas or examples.
>>
>> Thanks!
>>
>> -Gabe
>>
>> Gabe Emerson
>> Research Department
>> Welsh Companies
>> 4350 Baker Road, Suite 400
>> Minnetonka, MN 55343-8695
>> 952-897-7700, ext. 1306
>> [EMAIL PROTECTED]
>>
>> _______________________________________________
>> mapserver-users mailing list
>> [email protected]
>> http://lists.osgeo.org/mailman/listinfo/mapserver-users
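For illustration, here is a minimal Python sketch of the standardize-then-fuzzy-match approach Stephen Woodbridge describes: standardize the vendor records once at load time, standardize the incoming request, use Soundex codes of the street name to pick candidates, and rank those candidates by Levenshtein distance. This is application-side code (the "generic query, then work on the result set in your application" route), not a MapServer feature and not an implementation of the proposed addreq operator. The abbreviation tables, the function names (standardize, load, geocode, and so on), and the sample records are all hypothetical; a real standardizer would need much larger tables covering highway designations, Alt./Bus./Byp. qualifiers, and the like. Fields the user leaves out are treated as wildcards, so "100 James" returns both the North and South James Ave records, which is the multiple-results case from the thread.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny abbreviation tables; a real standardizer needs far more.
DIRECTIONS = {"n": "N", "north": "N", "s": "S", "south": "S",
              "e": "E", "east": "E", "w": "W", "west": "W"}
STREET_TYPES = {"st": "ST", "street": "ST", "ave": "AVE", "av": "AVE",
                "avenue": "AVE", "rd": "RD", "road": "RD"}
SOUNDEX_CODES = {c: d for d, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                                         "4": "l", "5": "mn", "6": "r"}.items()
                 for c in letters}


def strip_accents(text):
    # Decompose accented characters and drop the combining marks ("É" -> "E").
    return "".join(c for c in unicodedata.normalize("NFKD", text)
                   if not unicodedata.combining(c))


def standardize(address):
    """Split an address into (number, prefix, name, type, suffix) in standard form."""
    tokens = re.findall(r"[a-z0-9]+", strip_accents(address).lower())  # drops punctuation
    number = tokens.pop(0) if tokens and tokens[0].isdigit() else ""
    # Only treat a token as prefix/suffix/type if at least one token remains for the name.
    prefix = DIRECTIONS[tokens.pop(0)] if len(tokens) > 1 and tokens[0] in DIRECTIONS else ""
    suffix = DIRECTIONS[tokens.pop()] if len(tokens) > 1 and tokens[-1] in DIRECTIONS else ""
    stype = STREET_TYPES[tokens.pop()] if len(tokens) > 1 and tokens[-1] in STREET_TYPES else ""
    return number, prefix, " ".join(tokens).upper(), stype, suffix


def soundex(word):
    """American Soundex: first letter plus three digits (e.g. "Robert" -> "R163")."""
    word = "".join(c for c in word.lower() if c.isalpha())
    if not word:
        return ""
    result, prev = word[0].upper(), SOUNDEX_CODES.get(word[0], "")
    for c in word[1:]:
        code = SOUNDEX_CODES.get(c, "")
        if code and code != prev:
            result += code
        if c not in "hw":  # h and w do not separate letters with the same code
            prev = code
    return (result + "000")[:4]


def name_key(name):
    # Candidate-selection key: one Soundex code per word of the street name.
    return " ".join(soundex(w) for w in name.split())


def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions), two-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def load(records):
    """Standardize vendor records once, at load time, keeping the original text."""
    return [(standardize(r), r) for r in records]


def geocode(request, index):
    """Return (score, record) pairs, best (lowest) score first."""
    num, pre, name, typ, suf = standardize(request)
    key = name_key(name)
    results = []
    for (c_num, c_pre, c_name, c_typ, c_suf), record in index:
        if (num and c_num != num) or name_key(c_name) != key:
            continue  # different house number, or street name does not sound alike
        score = levenshtein(name, c_name)
        # Penalize only the fields the user actually supplied; missing ones are wildcards.
        score += sum(1 for want, have in ((pre, c_pre), (typ, c_typ), (suf, c_suf))
                     if want and want != have)
        results.append((score, record))
    return sorted(results)


index = load(["100 South James Ave", "100 North James Ave", "100 Jamestown Rd",
              "200 S James Ave", "100 Jmaes Ave"])
print(geocode("100 James", index))
# [(0, '100 North James Ave'), (0, '100 South James Ave'), (2, '100 Jmaes Ave')]
```

For the county data, which splits addresses into prefix, name, type, and suffix columns, one option is to join those columns into a single string before calling load(), much like the '[prefix] [name] [type] [suffix]' filter above. Soundex is only one choice of candidate key: it misses typos that change the first letter, and Metaphone or a looser key would trade precision for recall.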
