Hi list,

I need some advice on the best way to localize our own OSM server.

Goal: to have one layer per language for Mapnik and Osmarender, and an improved search engine for the planet.osm data.

The facts: we already have
* a dedicated server hosting a planet.osm excerpt of Taiwan, with Mapnik set up.
* localized name tags in the planet.osm data. Almost all data for Taiwan uses English in the 'name' and 'ref' tags and Chinese in 'name:zh' and 'ref:zh', plus some other local languages with the appropriate language tags.

Now for Mapnik and Osmarender: we want to add a layer for each language code on top of the default rendering layer (which uses English and the normal 'name' and 'ref' tags). Each of these layers should use the localized 'name:$lang' and 'ref:$lang' tags, with the plain 'name' and 'ref' tags as fallback. How do we do that?

Then the search engine: currently it only works well for addresses that use spaces, but not for Han and Hangul scripts. The reason is that in Chinese, Japanese and Korean, addresses don't have spaces. Instead the levels (Street, Number, Village, City, County, etc.) are distinguished by a specific character.

For example (Chinese as used in Taiwan): the address Zhongzheng Rd. in Taipei would be written 台北市中正路 as one string without spaces, where 台北市 stands for "Taipei City" (市 being the character for City) and 中正路 stands for Zhongzheng Rd. (路 being the character for Road).

In the planet.osm data we have 'is_in' and 'is_in:zh' tags, where the Chinese version writes the address the same way: the road Zhongzheng Rd. in Taipei has the 'is_in:zh' value 台灣台北市 (meaning Taiwan, Taipei City).

Another, more complex (but not the most complex) example: a search for 桃園縣八德市介壽路二段325巷1弄1衖 should find the alley '介壽路二段325巷1弄1衖' (Alley 1-1, Ln. 325, Jieshou Rd. Sec. 2), which has the 'is_in:zh' tag value 台灣桃園縣八德市 (Bade City, Taoyuan County, Taiwan).
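
To make the idea concrete, here is a rough Python sketch of how such an unspaced address could be cut into levels at the marker characters. It is only an illustration under my own assumptions, not existing OSM search code: the function name split_address, the English level labels and the short marker table are all hypothetical, and a real table would need many more levels (villages, townships, house numbers, and so on).

```python
import re

# Hypothetical, deliberately incomplete table of level-marker characters.
# The characters come from the two examples above; the labels are my own.
LEVEL_MARKERS = {
    "縣": "county",
    "市": "city",
    "路": "road",
    "段": "section",
    "巷": "lane",
    "弄": "alley",
    "衖": "sub-alley",
}

# One segment = a run of non-marker characters followed by one marker.
_SEGMENT = re.compile("([^{0}]+)([{0}])".format("".join(LEVEL_MARKERS)))


def split_address(address):
    """Split an unspaced address into (text, level) pairs.

    Text that is not closed by a known marker is returned with the
    level "unknown" instead of being dropped.
    """
    parts = []
    pos = 0
    for match in _SEGMENT.finditer(address):
        if match.start() > pos:
            # Characters skipped by the regex, e.g. a stray leading marker.
            parts.append((address[pos:match.start()], "unknown"))
        parts.append((match.group(0), LEVEL_MARKERS[match.group(2)]))
        pos = match.end()
    if pos < len(address):
        parts.append((address[pos:], "unknown"))
    return parts


if __name__ == "__main__":
    for text, level in split_address("桃園縣八德市介壽路二段325巷1弄1衖"):
        print(level, text)
```

This naive version breaks whenever a marker character is itself part of a name, so a real implementation would probably have to check candidate splits against the names and 'is_in:zh' values already in the data.
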

So we need to enhance the search engine code to
a) not rely on spaces as delimiters
b) know the correct and possible alternate address schemes for Han and Hangul scripts
c) accept every possible English rendering of an address (for example, 'Jieshou Rd. Sec. 2' could also be written 'Sec. 2 Jieshou Rd.' and in multiple other ways: Sec. 2, Jie-Shou Rd., etc.)
d) accept spelling variants in the English transliteration (for example, the road name Zhongzheng Rd. (中正路) can also be written ZhongZheng Rd., Zhong-Zheng Rd., Jhongjheng Rd., Jungjeng Rd., Chung-cheng Rd., and many more).

Many municipalities in Taiwan use alternate spelling systems, as there is no single standard but many different ways to transliterate Chinese characters into English. People's name cards can also contain spellings which you won't find on street signs anymore (in my old company, my name card had the street name written as Tzu-You Rd., although the city administration had changed the spelling on the road signs to Ziyou Rd.).

It would be great if we could feed those improvements back into the main OSM project, so that the search on the main OSM website also delivers the correct results.

So the question is: where is the code we need to enhance, and how do we coordinate the work? I could provide a list of aliases for road names, a list of the Chinese characters used to classify address levels (County, City, Village, etc.), and an explanation of the possible address patterns.

Cheers
Arne

-- 
Arne Götje (高盛華) <[EMAIL PROTECTED]>
PGP/GnuPG key: 1024D/685D1E8C
Fingerprint: 2056 F6B7 DEA8 B478 311F 1C34 6E9F D06E 685D 1E8C
Key available at wwwkeys.pgp.net.
Encrypted e-mail preferred.
_______________________________________________
dev mailing list
[email protected]
http://lists.openstreetmap.org/cgi-bin/mailman/listinfo/dev

