On 11 April 2017 08:26:14 IST, Rory McCann <r...@technomancy.org> wrote:
>You could try to run the "name" tag though a language detection
>algorithm and see what comes out. I think Google released one a few
>years ago: cf. https://github.com/Mimino666/langdetect
>
>Ethnologue has some. But I think it would cost a lot to licence.
>https://www.ethnologue.com/ and is probably much more precise than you
>need.

KDE's Sonnet is another library that springs to mind.

Another approach that might be interesting is to look at nearby objects in OSM. Look for objects with a clearly identifiable language (i.e. the name tag has the same value as exactly one of the object's name:xx tags). If 90% of those identify as 'English', for example, then other objects in the same area whose language could not be identified are probably English too.

To get decent performance, split the world into tiles and figure out the dominant clearly-tagged language for each tile. Use that preprocessed data as your language-guessing "shapefile".

--
Vdp
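
For what it's worth, here is a rough Python sketch of the tile idea, not a tested implementation. Everything in it is an assumption made for illustration: the function names, the input format (an iterable of (lat, lon, tags) tuples taken from wherever you read your OSM data), the 1-degree tile size, and the regex used to tell language suffixes apart from keys like name:left. Only the 90% threshold comes from the description above.

```python
import math
import re
from collections import Counter, defaultdict

# Assumption: language suffixes look like "en", "ga", "zh-Hant".
# This deliberately skips keys such as name:left or name:etymology.
LANG_KEY = re.compile(r"name:([a-z]{2,3}(?:-[A-Za-z0-9]+)*)")


def clear_language(tags):
    """Return the language code if 'name' equals exactly one name:xx value."""
    name = tags.get("name")
    if not name:
        return None
    matches = []
    for key, value in tags.items():
        m = LANG_KEY.fullmatch(key)
        if m and value == name:
            matches.append(m.group(1))
    # Zero matches or several matches are both treated as "not clear".
    return matches[0] if len(matches) == 1 else None


def tile_of(lat, lon, size_deg=1.0):
    """Map a coordinate to a simple lat/lon grid cell (hypothetical tiling)."""
    return (math.floor(lat / size_deg), math.floor(lon / size_deg))


def build_tile_index(objects, size_deg=1.0, min_share=0.9):
    """Precompute the dominant clearly-tagged language per tile.

    objects: iterable of (lat, lon, tags) tuples.
    A tile only gets an entry if one language reaches min_share
    of the clearly-tagged objects in it.
    """
    counts = defaultdict(Counter)
    for lat, lon, tags in objects:
        lang = clear_language(tags)
        if lang is not None:
            counts[tile_of(lat, lon, size_deg)][lang] += 1

    index = {}
    for tile, counter in counts.items():
        lang, n = counter.most_common(1)[0]
        if n / sum(counter.values()) >= min_share:
            index[tile] = lang
    return index


def guess_language(lat, lon, tags, index, size_deg=1.0):
    """Use the object's own tags if they are clear, else fall back to its tile."""
    return clear_language(tags) or index.get(tile_of(lat, lon, size_deg))
```

You would run build_tile_index once over an extract, keep the resulting dict around, and then call guess_language for each object whose name has no matching name:xx tag. Tiles with no clear majority simply return None, so you can still fall back to a language detection library there.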