Hi, thanks for the clarification!
Meanwhile I have also read a good deal about the libpostal project, and it sounds really cool.

> The interesting question is how well that works when the search terms
> in Nominatim have not been normalized with the same algorithm.

I came to exactly the same question. So, do I understand correctly that the geocoding process would basically be:

1. Preparation: *.osm/*.pbf -> osm2pgsql -> address normalization with libpostal into a separate "OSM-Address-Table"
2. Geocoding: AddressToGeocode.csv -> libpostal -> simple lookup in the "OSM-Address-Table"

I'll do some tests on that...

Regards,
Tom

On 29.11.2016 at 23:39, Sarah Hoffmann <[email protected]> wrote:

Hi,

On Tue, Nov 29, 2016 at 12:51:11PM +0100, Tom wrote:
> But right now I'm doing some tests with pg_trgm. And Sarah, so far I cannot
> confirm your comment
>
> > "Trigrams only work with misspellings of a letter or two, they fail
> > completely when trying to match up abbreviations."
>
> To me the opposite seems true, as you can see in the following examples.
> Let's take this address, both as I want to look it up and as OSM has it
> stored and spelled:
>
>             (asked address)   (OSM address)
> - street:   Верещагина ул     улица Верещагина
> - town:     Ханская ст-ца     Ханская
> - city:     Майкоп г          городской округ Майкоп
> - region:   Адыгея Респ       Адыгея
>
> The Nominatim standard query is basically this (here for the town):
>
>   select word_id, word_token, word
>   from word
>   where word_token = make_standard_name('Ханская ст-ца')
>
> ...and it does not return anything.

Nominatim's query matching is actually a bit more complex. For each place name in the database, it saves the full name as well as the partial terms (the space-separated words) that make up the name. For example, for 'улица Верещагина' it stores the full term 'улица Верещагина' and the partials 'улица' and 'Верещагина'. In addition, 'улица' is abbreviated to 'ул', so that a search can later match either the full word or the abbreviation.
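The full/partial term scheme described above can be illustrated with a toy Python model. It is a sketch under stated assumptions, not Nominatim's implementation. `ABBREVIATIONS` is a hypothetical one-entry table. `normalize()` is a crude placeholder for Nominatim's real normalization. The matching rule (try the full term first, then require every query word as a partial, never dropping a word) is a simplification.

```python
# Toy model of the full/partial term scheme described in this thread.
# NOT Nominatim's code: ABBREVIATIONS is a hypothetical one-entry table and
# normalize() is a crude placeholder for Nominatim's real name normalization.
ABBREVIATIONS = {"улица": "ул"}


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (placeholder normalization)."""
    return " ".join(text.lower().split())


def index_name(name: str) -> tuple[str, set[str]]:
    """Return the full term and the set of partial terms stored for a name."""
    full = normalize(name)
    partials = set(full.split())
    # Also store abbreviated variants, so 'улица' and 'ул' both match later.
    partials |= {ABBREVIATIONS[w] for w in partials if w in ABBREVIATIONS}
    return full, partials


def matches(query: str, name: str) -> bool:
    """Try the full term first, then require every query word as a partial.

    No query word is ever dropped (house-number dropping is not modelled),
    so a single unknown word is enough to make the match fail.
    """
    full, partials = index_name(name)
    q = normalize(query)
    if not q:
        return False
    if q == full:
        return True
    return all(word in partials for word in q.split())


if __name__ == "__main__":
    print(matches("Верещагина ул", "улица Верещагина"))  # True
    print(matches("Ханская ст-ца", "Ханская"))  # False
```

In this model 'Верещагина ул' matches because both of its words are stored as partials. 'Ханская ст-ца' fails because nothing is stored for 'ст-ца'.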
When searching, Nominatim does a similar thing: it matches first against full words and then against partial terms. So, while you won't find 'Верещагина ул' in the word table, Nominatim will still match it correctly, because it finds 'Верещагина' and 'ул' and a database entry (in search_name) that contains both words as partials.

The real problems start with 'Ханская ст-ца'. Nominatim only has 'Ханская' as a name, but no partial for 'ст-ца' or 'станица'. And as the search algorithm never drops terms from the search query(*), it won't return any result.

It's true that trigram search can still return a result. The problem is that the similarity is already lower than that of many false positives with a similar spelling. Similarity is simply not a good indicator for distinguishing between superfluous words and spelling differences.

(*) Not completely true. It may drop house numbers, but only those.

That's where libpostal comes in. It is supposed to normalize your address to something that is compatible with the names used in OpenStreetMap. That includes removing the odd prefix or suffix (like ст-ца), normalizing numbers, etc. The interesting question is how well that works when the search terms in Nominatim have not been normalized with the same algorithm.

Sarah

_______________________________________________
dev mailing list
[email protected]
https://lists.openstreetmap.org/listinfo/dev
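The point about trigram similarity can be made concrete with a rough Python approximation of PostgreSQL's pg_trgm `similarity()`. Each word is lowercased and padded with two leading spaces and one trailing space, then split into trigrams. The score is the number of shared trigrams divided by the number of distinct trigrams across both strings. This sketch ignores locale and other details of the real extension. 'Ханскя' (a one-letter misspelling) and 'Ханский' (a different but similarly spelled name) are hypothetical inputs chosen only for illustration.

```python
# Rough approximation of pg_trgm's similarity(); NOT the extension's code.
# Lowercase, treat non-alphanumeric characters as word separators, pad each
# word with two leading spaces and one trailing space, then compare the
# trigram sets as |shared| / |union|. Locale handling is ignored.


def trigrams(text: str) -> set[str]:
    """Return the set of padded trigrams for all words in text."""
    cleaned = "".join(c if c.isalnum() else " " for c in text.lower())
    grams: set[str] = set()
    for word in cleaned.split():
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by distinct trigrams in both strings."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


if __name__ == "__main__":
    target = "Ханская"  # the town name as stored in OSM
    # Hypothetical inputs: superfluous word, misspelling, different name.
    for query in ("Ханская ст-ца", "Ханскя", "Ханский"):
        print(f"{query!r:>16} vs {target!r}: {similarity(query, target):.3f}")
```

With this approximation the three scores come out at about 0.571, 0.500 and 0.455. The correct place queried with the superfluous 'ст-ца' scores only slightly higher than a misspelling or a different name. That illustrates why a similarity threshold alone has trouble telling those cases apart.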

