On 11 April 2017 08:26:14 IST, Rory McCann <r...@technomancy.org> wrote:
>You could try to run the "name" tag through a language detection 
>algorithm and see what comes out. I think Google released one a few 
>years ago: cf. https://github.com/Mimino666/langdetect
>
>Ethnologue has some. But I think it would cost a lot to licence.
>https://www.ethnologue.com/ and is probably much more precise than you
>need.

KDE's Sonnet is another library that springs to mind.


Another approach that might be interesting is to look at nearby objects in OSM. 
Look for objects with a clearly identifiable language (i.e. the name tag has the 
same value as exactly one of the object's name:xx tags). If 90% of those 
identify as 'English', for example, then other objects in the same area whose 
language is unidentified are probably English too.
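
A rough sketch of that heuristic in Python, just to make the idea concrete. 
The function names and the plain-dict representation of an object's tags are 
my own assumptions, not an existing OSM library API, and the 90% threshold is 
simply the figure from the example above:

```python
from collections import Counter

def clear_language(tags):
    """Return the xx of the single name:xx tag whose value equals 'name', else None.

    Assumption: every key starting with 'name:' is treated as a language tag.
    Real data also has keys such as name:left, which would need filtering out.
    """
    name = tags.get("name")
    if not name:
        return None
    matches = [key.split(":", 1)[1] for key, value in tags.items()
               if key.startswith("name:") and value == name]
    # Only "clearly identifiable" if exactly one name:xx tag matches.
    return matches[0] if len(matches) == 1 else None

def guess_from_neighbours(neighbour_tags, threshold=0.9):
    """Guess a language from nearby objects, or None if no language dominates."""
    votes = Counter(lang for lang in map(clear_language, neighbour_tags) if lang)
    total = sum(votes.values())
    if total == 0:
        return None
    lang, count = votes.most_common(1)[0]
    return lang if count / total >= threshold else None
```

Finding the "nearby" objects (a bounding-box query or similar) is left out 
here; neighbour_tags is just whatever list of tag dicts that lookup returns.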

To get decent performance, split the world into tiles and work out the dominant 
clearly-tagged language for each tile. Use that preprocessed data as your 
language-guessing "shapefile".
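
Something along these lines, as a minimal sketch. It assumes a simple 
lat/lon grid with a made-up tile size of one degree (TILE_DEG) and input 
samples of the form (lat, lon, lang) for objects whose language is already 
clearly tagged; none of these names come from an existing tool:

```python
import math
from collections import Counter, defaultdict

TILE_DEG = 1.0  # hypothetical tile size in degrees; tune for your data

def tile_key(lat, lon, size=TILE_DEG):
    """Map a coordinate to the integer (row, column) of its grid tile."""
    return (math.floor(lat / size), math.floor(lon / size))

def build_tile_index(samples, size=TILE_DEG):
    """Preprocess (lat, lon, lang) samples into {tile: dominant language}."""
    counts = defaultdict(Counter)
    for lat, lon, lang in samples:
        counts[tile_key(lat, lon, size)][lang] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}

def guess_language(index, lat, lon, size=TILE_DEG):
    """Look up the dominant language for a coordinate, or None if the tile is empty."""
    return index.get(tile_key(lat, lon, size))
```

The index is built once, so each guess afterwards is a single dictionary 
lookup. A stricter version could keep a tile only when its dominant language 
passes some share of the votes.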
-- 
Vdp
Sent from a phone.
