On 27/03/2008 09:07, Stefan Baebler wrote:
> On Thu, Mar 27, 2008 at 12:55 AM, David Earl <[EMAIL PROTECTED]> wrote:
>> On 26/03/2008 19:31, OJ W wrote:
>> > Is it doing anything with the multilingual names in OSM (name:de=... and
>> > similar)?
>> Yes, that should have been on my list in the previous message.
>> This always was included. e.g. try searching for Cologne and Köln (or Koln)
>
> Venice in Italy has name:sl=Benetke
> Italy has name:sl=Italija
>
> Searching either for "benetke" or "italija" works as expected.
> searching for "benetke, italija" (sl, sl) fails miserably - returning
> empty page.
I suspect the empty page was because the search timed out. It should have said "no matches", I guess.

> However searching for "benetke, italy" (sl, en) or "benetke, italia"
> (sl, it) works flawlessly.
>
> how exactly is the context determined?
> a) is_in tag on the node, thus requiring additional (imo redundant)
> is_in:sl="Italija, Evropa" tag on Venice
> b) closeness of the nodes (might be ok for cities, but countries vary
> a lot in size)
> c) inclusion in a context polygon (country, city)

The name finder wiki entry outlines the method:
http://wiki.openstreetmap.org/index.php/Name_finder

There are two different kinds of search: unqualified and qualified (the latter with a comma or "near"). In the first case, the name is just looked up, with variations. In the second, it looks for the qualifying place (place=city, town, small_town, village, suburb or hamlet) and then searches for the bit before the comma "close to" the place or places found.

Searches can also be further qualified using is_in (NOTE: only is_in; I don't recognize is_in:lang, as it's the first I've heard of it). For a full search that's two commas: "Hinton Road, Fulbourn, UK". If the is_in-qualified search fails, though, it tries again without the qualifier (because so many is_in tags are missing or inaccurate).

The reason "benetke, italy" or "benetke, italia" work, though, is that they aren't quite working "flawlessly". What these are actually doing is looking for "benetke" near a place called "italy", which doesn't exist as a place (a settlement, as above), so it then tries the search again without qualification. So what you're seeing is the same as if you had simply searched for "benetke".
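The qualified-search-with-fallback behaviour described above can be sketched roughly as follows. This is a hypothetical in-memory illustration, not the Name Finder's actual code: the `Node` class, the 20 km `RADIUS_KM` standing in for "close to", the accent folding used as the "variations", applying the is_in qualifier to the place, and the sample nodes (coordinates approximate) are all assumptions.

```python
import math
import re
import unicodedata
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Settlement types that can act as the qualifying place (per the wiki description).
PLACE_TYPES = {"city", "town", "small_town", "village", "suburb", "hamlet"}
RADIUS_KM = 20.0  # hypothetical meaning of "close to"; not the real Name Finder value


@dataclass
class Node:
    name: str
    lat: float
    lon: float
    place: Optional[str] = None      # value of place=*, if any
    is_in: str = ""                  # raw is_in=* value
    alt_names: Tuple[str, ...] = ()  # name:xx=* values


def fold(s: str) -> str:
    """Lower-case and strip accents, so "Koln" matches "Köln"."""
    decomposed = unicodedata.normalize("NFKD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower().strip()


def matches(node: Node, term: str) -> bool:
    t = fold(term)
    return any(fold(n) == t for n in (node.name, *node.alt_names))


def distance_km(a: Node, b: Node) -> float:
    """Great-circle (haversine) distance in kilometres."""
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dp, dl = p2 - p1, math.radians(b.lon - a.lon)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def unqualified(nodes: List[Node], term: str) -> List[Node]:
    return [n for n in nodes if matches(n, term)]


def search(nodes: List[Node], query: str) -> List[Node]:
    # "a, b" and "a near b" are qualified searches; a third part is an is_in qualifier.
    parts = [p.strip() for p in re.split(r",|\s+near\s+", query, flags=re.IGNORECASE) if p.strip()]
    if not parts:
        return []
    if len(parts) == 1:
        return unqualified(nodes, parts[0])
    target, place_name = parts[0], parts[1]
    places = [n for n in nodes if n.place in PLACE_TYPES and matches(n, place_name)]
    if len(parts) > 2:
        wanted = fold(parts[2])
        qualified = [p for p in places if wanted in [fold(x) for x in p.is_in.split(",")]]
        # Many is_in tags are missing or wrong, so drop the qualifier if nothing survives it.
        places = qualified or places
    if not places:
        # The qualifier is not a settlement (e.g. a country): search for the target alone.
        return unqualified(nodes, target)
    return [n for n in nodes
            if matches(n, target) and any(distance_km(n, p) <= RADIUS_KM for p in places)]


# Illustrative data only; coordinates are approximate.
VENICE = Node("Venezia", 45.4408, 12.3155, place="city", alt_names=("Venice", "Benetke"))
ITALY = Node("Italia", 41.87, 12.57, place="country", alt_names=("Italy", "Italija"))
FULBOURN = Node("Fulbourn", 52.18, 0.22, place="village", is_in="Cambridgeshire, UK")
HINTON_ROAD = Node("Hinton Road", 52.185, 0.225)
NODES = [VENICE, ITALY, FULBOURN, HINTON_ROAD]
```

With this sketch, `search(NODES, "benetke, italy")` returns the same result as `search(NODES, "benetke")`: Italy carries place=country, not a settlement type, so the qualified search finds no place and falls back to the unqualified lookup, which is the behaviour explained above.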
> Similar problem is with "dunaj, avstrija" (sl, sl) = "vienna, austria"
> (en, en) = "wien, osterreich" (de, de)
>
> Perhaps matches in the same language should be ranked higher than
> matches from mixed languages
>
> Bigger entities should also be ranked higher (continent > country >
> city > town > village > street ...) if no difference in context is
> found

Indeed, and they are. But countries and regions are not places (settlements) for the purposes of searching. You are searching "near to" a place, not "within" a country; as you say, a country is too big for this to work at all reliably.

Having said all that, I think I can improve this, in particular with respect to language variations in country names (though the information needs to come from somewhere if it isn't in alternate is_in forms), and by interpreting the form "a, b" as 'place is_in country' as well as 'object near place'.

> Try searching for "europe" or "austria". For the latter unique
> "avstrija" (sl) gives far better results :)

That's because, though I don't use the country as a qualifier, it is still a node with a name that gets put in the general search index. You'll presumably get back "Austria Street" and "National Gallery of Austria" in the same search as well, but Austria will come first because it is an exact match while the others have additional words. "Osterreich" will also work, I presume, because the node has name=Österreich and name:en=Austria.

> Performance is much better than yesterday!

You were probably using it yesterday while it was still constructing the index update, which went on until about 10am. Today it failed (the machine ran out of memory) and I restarted it at about 9am; I expect it will continue for some time. It's at 20% at present, though the early stages often take the longest (because there are so many "1st street" entries and the like).
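The ordering described for the "austria" search (exact match first, then names with additional words) might look something like this minimal sketch. It assumes each name, including each name:xx variant, is indexed as a separate string; the accent folding and the tie-break by number of extra words are hypothetical choices, not the Name Finder's documented behaviour.

```python
import unicodedata
from typing import List


def fold(s: str) -> str:
    """Lower-case and strip accents, so "Osterreich" matches "Österreich"."""
    decomposed = unicodedata.normalize("NFKD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


def rank(names: List[str], query: str) -> List[str]:
    """Keep names containing every query word; exact matches first, then fewest extra words."""
    q = fold(query).split()

    def key(name: str):
        words = fold(name).split()
        # False sorts before True, so exact matches come first.
        return (words != q, len(words) - len(q))

    hits = [n for n in names if all(w in fold(n).split() for w in q)]
    return sorted(hits, key=key)
```

For example, `rank(["National Gallery of Austria", "Austria Street", "Austria", "Vienna"], "austria")` puts "Austria" ahead of the two names with additional words and drops "Vienna".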
David

_______________________________________________
talk mailing list
talk@openstreetmap.org
http://lists.openstreetmap.org/cgi-bin/mailman/listinfo/talk