On 30/3/2013 12:32 PM, Dimitry Sibiryakov wrote:
30.03.2013 11:27, m. Th. wrote:
Do you actually read the source?
    No, I actually learn languages. So, I have no idea how to transliterate Russian "щ" or "ы" or Czech "ř".


The United Nations Group of Experts on Geographical Names (UNGEGN) publishes romanization tables which are quite clear, cover a good number of alphabets, and are easy to implement in code.

See

http://www.eki.ee/wgrs/status.htm

Concretely, for Russian there is http://www.eki.ee/wgrs/rom1_ru.htm
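
To show how little code such a table needs, here is a minimal sketch in Python. The names `RU_TO_LATIN` and `romanize` are made up for this example, the table is deliberately partial, and the Latin values are my reading of the table and should be checked against the page above before anyone relies on them.

```python
# Partial, illustrative romanization table for Russian.
# Assumption: the values follow the UNGEGN table linked above; only a
# handful of letters are listed, and each one should be verified there.
RU_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "д": "d", "и": "i",
    "к": "k", "о": "o", "р": "r", "с": "s", "т": "t",
    "ш": "š", "щ": "šč", "ы": "y",
}


def romanize(text, table=RU_TO_LATIN):
    """Replace every character found in the table; pass the rest through."""
    # Lower-casing first keeps the table small; the phonetic code that is
    # computed afterwards does not depend on letter case anyway.
    return "".join(table.get(ch, ch) for ch in text.lower())
```

A table for another alphabet would plug into the same function through the `table` argument.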

Also, let us not forget that we are talking here about a /phonetic/ engine which outputs an _aggregate_ code. Hence in almost all cases we can ignore (at least to begin with) the subtle differences between sounds (for example, see the use of apostrophes in the romanization of Georgian at http://www.eki.ee/wgrs/rom2_ka.htm).
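
Ignoring those fine distinctions could be as simple as the sketch below, which strips diacritics and apostrophes from already romanized text before it goes to the phonetic coder. The function name `fold_for_phonetics` is hypothetical, and dropping every non-letter is an assumption about what the engine needs, not something taken from the Double Metaphone code.

```python
import unicodedata


def fold_for_phonetics(romanized):
    """Reduce romanized text to plain A-Z letters and spaces."""
    # NFKD splits a letter such as "š" into "s" plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", romanized)
    # Keep ASCII letters and spaces only. This drops combining marks and
    # apostrophes. Caveat: letters with no decomposition (for example "ł")
    # are dropped as well and would need an explicit table entry.
    kept = [ch for ch in decomposed
            if ch == " " or (ch.isascii() and ch.isalpha())]
    return "".join(kept).upper()
```

For an invented token such as "k'ak'a" this yields "KAKA", so the apostrophes no longer influence the resulting code.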

One of the main problems in the design phase is the Latin alphabets which are not already covered by the existing Double Metaphone codebase.

Such alphabets are generally easy to process for our purposes, even if the scholarly treatment of special letters etc. can be somewhat daunting for the uninformed. For example, the Romanian alphabet is somewhat complicated in its full scholarly description (see http://en.wikipedia.org/wiki/Romanian_alphabet ), but for our purpose it is quite easy to build the cases / phonetic conversion table.
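
As a rough illustration of how small the Romanian table is, here is a sketch in Python. The names `RO_TO_ASCII` and `romanian_to_ascii` are invented for the example, and the ASCII spellings chosen on the right-hand side (for instance "sh" and "ts") are my assumption about what would suit a Metaphone-style coder, not an agreed table.

```python
# Romanian letters outside plain ASCII, mapped to rough phonetic spellings.
# Assumption: these spellings are one plausible choice, open to discussion.
RO_TO_ASCII = str.maketrans({
    "\u0103": "a",   # a with breve
    "\u00e2": "i",   # a with circumflex, same sound as i with circumflex
    "\u00ee": "i",   # i with circumflex
    "\u0219": "sh",  # s with comma below
    "\u021b": "ts",  # t with comma below
    "\u015f": "sh",  # s with cedilla, legacy encoding still common in data
    "\u0163": "ts",  # t with cedilla, legacy encoding still common in data
})


def romanian_to_ascii(text):
    """Lower-case the text and replace the special Romanian letters."""
    return text.lower().translate(RO_TO_ASCII)
```

Both the comma-below and the cedilla forms are listed because existing databases tend to contain a mix of the two.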

Hence, the recommended approach is to implement the UN standard where it exists and where it is worth it (for example, I don't know whether it is of high importance to implement the romanization for Tigrinya, Lao or Urdu - no offence intended). For other languages, if someone with knowledge of the language can provide the conversion tables for the most widespread Latin-derived alphabets from...

http://en.wikipedia.org/wiki/Category:Latin_alphabets

...it would be a plus, although we could also use Wikipedia for this. From what I see in the above list and in the Double Metaphone code, it seems that only the Eastern European alphabets are not covered by Metaphone and need work; hence, we are in a rather good position. For the Romanian language, I (or Mariuz) can provide a translation table with ease. I don't know if there are takers for the other EE languages...

Thoughts? Comments?

Ioan Th.

Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel
