I want to be able to do an Overpass query for Iceland where the name= field
contains non-Icelandic characters. These could be, for example, Chinese,
Cyrillic or even other European characters (such as âà). I'm
guessing it could be difficult for the Latin characters but am hopeful it
would be easier for non-Latin alphabets.

Is there a magic formula for achieving this?

I suggest, as a refinement of Ilya's query, this one:
http://overpass-turbo.eu/s/lCk

As it may help for other languages, here is how I arrived at it:

1. Start with

area["name:en"="Iceland"];
node(area)[name];
out count;

This basically selects all nodes in Iceland that have a name. The important part is the "out count": it ensures that you are not flooded with results. For the same reason it is enough to start with nodes: we do not want a final result yet, but rather to build a selective search term. For this reason, we will even narrow things down to just a subset of all nodes in a moment.

2. Restrict it to

area["name:en"="Iceland"];
node(area)[name~"[^a-zA-Z]"];
out count;

These are all nodes whose name contains at least one character outside the ranges a-z and A-Z. There are still many of them, so the next step is to look at examples.
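To see why this count is still large, the step-2 filter can be mimicked locally with Python's `re` module. This is only an approximation of what the Overpass server does (its regex engine is not Python's), and the sample names are made up for illustration, not taken from OSM data.

```python
import re

# Step-2 filter: flag a name if it contains any character outside a-z/A-Z.
STEP2 = re.compile(r"[^a-zA-Z]")

# Hypothetical sample names, invented for illustration only.
names = ["Akureyri", "Reykjavík", "Route 1", "Москва", "Hôtel", "北京"]

flagged = [n for n in names if STEP2.search(n)]
print(flagged)  # ['Reykjavík', 'Route 1', 'Москва', 'Hôtel', '北京']
```

Note that an ordinary Icelandic name and a name containing a space and a digit are flagged alongside the Cyrillic, French and Chinese ones: these are the false positives the following steps try to remove.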

3. Get examples with

area["name:en"="Iceland"];
node(area)[name~"[^a-zA-Z]"];
out 100;

This prints 100 results that look random but are not (in fact, they are the 100 matches with the lowest node IDs). Now we can look at the name fields and get an idea of what else we would like to exclude.
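The parenthetical about which 100 results come back can be illustrated with a small Python sketch: with the default ascending-ID output order, `out 100;` behaves like sorting the matches by ID and keeping the first 100. The `lowest_id_sample` helper and the sample IDs are hypothetical.

```python
def lowest_id_sample(matches, limit=100):
    """Mimic `out <limit>;` with the default ascending-ID order:
    keep the `limit` elements with the smallest IDs."""
    return sorted(matches, key=lambda m: m["id"])[:limit]

# Hypothetical matches; IDs and names are invented for illustration.
matches = [
    {"id": 907, "name": "Москва"},
    {"id": 12, "name": "Reykjavík"},
    {"id": 345, "name": "Route 1"},
]

sample = lowest_id_sample(matches, limit=2)
print([m["id"] for m in sample])  # [12, 345]
```

Since node IDs are handed out in increasing order as nodes are created, such a sample probably leans towards older data, which is worth keeping in mind when judging whether it is representative.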

4. Start to narrow down with

area["name:en"="Iceland"];
node(area)[name~"[^a-zA-Z0-9 ]"];
out 100;

Spaces and digits are clearly acceptable, so they can be excluded even before we start accepting the special characters of Icelandic.
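The effect of adding digits and the space to the allowed set can be checked locally by comparing the two patterns side by side. As before, Python's `re` only approximates the server's regex engine and the names are invented.

```python
import re

STEP2 = re.compile(r"[^a-zA-Z]")        # step 2: letters only
STEP4 = re.compile(r"[^a-zA-Z0-9 ]")    # step 4: letters, digits and space

# Hypothetical sample names, invented for illustration only.
names = ["Route 1", "Hringbraut 12", "Reykjavík", "Москва"]

for name in names:
    print(f"{name!r}: step 2={bool(STEP2.search(name))}, "
          f"step 4={bool(STEP4.search(name))}")

# Names that the narrower step-4 filter still flags.
still_flagged = [n for n in names if STEP4.search(n)]
print(still_flagged)
```

The street-style names drop out, while the Icelandic name is still flagged because "í" is not yet in the allowed set.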

This process is then repeated until the sample contains no more false positives. Finally, we extend the query to all three types of OSM elements (nodes, ways and relations), expecting that few false positives will appear.
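Once the sample looks clean, the final query can be assembled. The Python sketch below builds it as a string, assuming the acceptable characters end up being the Icelandic letters plus a period and a hyphen. That `extra` set is a hypothetical outcome of the review loop, not the query behind the overpass-turbo link above. The area is stored in a named set (`.iceland`) because each statement overwrites the default set, so a bare `way(area)` after `node(area)` would no longer see the area. The same character class is also checked locally with `re`; whether non-ASCII characters inside a bracket expression behave identically on the Overpass server is an assumption worth testing on a small sample first (decomposed accents, for instance, would not match the precomposed letters used here).

```python
import re

# Icelandic letters beyond plain a-z/A-Z, lower and upper case (precomposed).
ICELANDIC = "áðéíóúýþæöÁÐÉÍÓÚÝÞÆÖ"

def allowed_class(extra=""):
    """Negated bracket expression matching any character NOT in the allowed set.

    `extra` holds further characters judged acceptable while reviewing samples
    (hypothetical here). Put '-' last so it is read as a literal hyphen.
    """
    return "[^a-zA-Z0-9 " + ICELANDIC + extra + "]"

def build_query(extra=""):
    """Overpass QL counting named nodes, ways and relations in Iceland
    whose name contains at least one character outside the allowed set."""
    pattern = allowed_class(extra)
    return (
        'area["name:en"="Iceland"]->.iceland;\n'
        "(\n"
        f'  node(area.iceland)[name~"{pattern}"];\n'
        f'  way(area.iceland)[name~"{pattern}"];\n'
        f'  relation(area.iceland)[name~"{pattern}"];\n'
        ");\n"
        "out count;"
    )

query = build_query(extra=".-")
print(query)

# Local sanity check with Python's re (only an approximation of the server).
final = re.compile(allowed_class(".-"))
# Hypothetical sample names, invented for illustration only.
names = ["Reykjavík", "Þingvellir", "Route 1", "Москва", "Hôtel"]
flagged = [n for n in names if final.search(n)]
print(flagged)  # ['Москва', 'Hôtel']
```

Replacing `out count;` with an `out` statement that prints tags would list the remaining candidates once the count looks manageable.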

Cheers,

Roland


_______________________________________________
dev mailing list
dev@openstreetmap.org
https://lists.openstreetmap.org/listinfo/dev
