Re: [OSM-dev] Querying for non-native characters in name field

Roland Olbricht Tue, 31 Jan 2017 08:42:07 -0800

> I want to be able to do an overpass query for Iceland where name= field
> contains non-Icelandic characters. These could be for example Chinese,
> Cyrillic or even other European characters (such as âà for example). I'm
> guessing it could be difficult for the latin characters but hopeful it
> would be easier for non-latin alphabets.
>
> Is there a magic formula for achieving this?


I suggest, as a refinement of Ilya's query, this one:
http://overpass-turbo.eu/s/lCk

As it may help for other languages, I explain how I got to this:

1. Start with

area["name:en"="Iceland"];
node(area)[name];
out count;

This basically selects all nodes in Iceland that have a name. The
important part is the "out count". It ensures that you are not flooded
with results. For the same reason it is enough to start with nodes: we
do not want a final result yet, we want to build a sensitive search
term. For this reason, we will narrow this down to just a subset of
the nodes in a moment.


2. Clamp down to

area["name:en"="Iceland"];
node(area)[name~"[^a-zA-Z]"];
out count;

These are all nodes whose name contains at least one character other
than an unaccented Latin letter. That is still a lot of nodes.
Therefore:


3. Get examples with

area["name:en"="Iceland"];
node(area)[name~"[^a-zA-Z]"];
out 100;

This prints a sample of 100 results (in fact, the 100 matches with the
lowest node ids). Now we can look at the name fields and get an idea of
what we would like to exclude in addition.


4. Start to narrow down with

area["name:en"="Iceland"];
node(area)[name~"[^a-zA-Z0-9 ]"];
out 100;

Spaces and digits are fine, so we accept them even before we start to
accept the special characters of Icelandic.

This process is now repeated until the sample contains no more false
positives. Finally, we expand this to all three types of OSM elements,
in the expectation that not many false positives will appear.
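
As a rough sketch of where this process might end up (not necessarily
the query behind the link above): the list of Icelandic letters in the
character class is an assumption and would probably still need
punctuation and similar characters added after looking at samples. The
set name ".iceland" is made up for this example; it is needed because
each statement inside the union overwrites the default set, so the area
has to be kept under its own name. It also assumes that the server
handles UTF-8 characters inside bracket expressions.

```overpassql
// Sketch only: the allow-list of Icelandic letters is an assumption
// and will likely need more entries after inspecting samples.
area["name:en"="Iceland"]->.iceland;  // keep the area in a named set
(
  node(area.iceland)[name~"[^a-zA-Z0-9 áðéíóúýþæöÁÐÉÍÓÚÝÞÆÖ]"];
  way(area.iceland)[name~"[^a-zA-Z0-9 áðéíóúýþæöÁÐÉÍÓÚÝÞÆÖ]"];
  relation(area.iceland)[name~"[^a-zA-Z0-9 áðéíóúýþæöÁÐÉÍÓÚÝÞÆÖ]"];
);
out count;
```

Replacing "out count" with "out 100" gives a sample to check for
remaining false positives before asking for the full result.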


Cheers,

Roland


_______________________________________________
dev mailing list
dev@openstreetmap.org
https://lists.openstreetmap.org/listinfo/dev
