On 30/3/2013 12:32 PM, Dimitry Sibiryakov wrote:
30.03.2013 11:27, m. Th. wrote:
Do you actually read the source?
No, I actually learn languages. So, I have no idea how to transliterate Russian
"щ" or
"ы" or Czech "ř".
There is the United Nations Group of Experts on Geographical Names (UNGEGN),
which publishes romanization tables that are quite clear, cover a good number of alphabets, and are
easy to implement in code.
See
http://www.eki.ee/wgrs/status.htm
Concretely, for Russian you have http://www.eki.ee/wgrs/rom1_ru.htm
Also, let us not forget that we're talking here about a /phonetic/ engine which outputs an _aggregate_ code, hence in
almost all cases we can ignore (at least in the beginning) the subtle differences between sounds (e.g. see the use
of apostrophes in the Georgian romanization at http://www.eki.ee/wgrs/rom2_ka.htm).
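
To make the idea concrete, here is a minimal Python sketch of "romanize by table, then ignore the subtle differences".
The table is deliberately partial and only illustrative; its entries, and the names RU_TO_LATIN, romanize and fold,
are my own assumptions and should be checked against the UNGEGN Russian table linked above before any real use.

```python
import unicodedata

# Partial, illustrative Cyrillic-to-Latin table (an assumption, not the
# full UNGEGN table; verify every entry against the published one).
RU_TO_LATIN = {
    "а": "a", "б": "b", "к": "k", "р": "r", "у": "u",
    "щ": "\u0161\u010d",  # s and c, each with a caron
    "ы": "y",
}

def romanize(text, table=RU_TO_LATIN):
    """Replace each character found in the table; pass the rest through."""
    return "".join(table.get(ch, ch) for ch in text.lower())

def fold(text):
    """Drop diacritics and non-letters (apostrophes etc.) before phonetic coding."""
    decomposed = unicodedata.normalize("NFD", text)
    # After NFD, diacritics are separate combining marks, which are not letters.
    return "".join(ch for ch in decomposed if ch.isalpha())

print(fold(romanize("Щука")))  # scuka
print(fold(romanize("рыба")))  # ryba
```

The folded output is plain ASCII, so it could then be fed to the existing phonetic encoder unchanged. Whether folding
is too lossy for a given language would have to be decided per table.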
(One of) the main problem(s) in the design phase is the Latin alphabets which aren't already covered by the existing
Double Metaphone codebase.
Such alphabets, especially in our case, are generally easy to process in order to achieve our goal, even if the
scientific treatment of special letters etc. is sometimes somewhat daunting for the uninitiated. For example, the
Romanian alphabet: while the general, scientific description is somewhat complicated (see
http://en.wikipedia.org/wiki/Romanian_alphabet ), for our purpose it is quite easy to build the cases / phonetic
conversion table.
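
As a rough illustration of how small such a table can be, here is a hypothetical Python sketch for the Romanian
special letters. The mapping (RO_TO_ASCII) and the function name prepare_romanian are assumptions made for this
example, not an agreed table: the letters are rewritten to ASCII sequences that a Metaphone-style encoder would
already know how to handle, and the choice of "i" for â / î (which denote the same sound) is arbitrary.

```python
import unicodedata

# Hypothetical conversion table for the Romanian special letters.
# Both the comma-below and the legacy cedilla forms are listed, since
# both occur in real-world text.
RO_TO_ASCII = {
    "\u0103": "a",   # a with breve
    "\u00e2": "i",   # a with circumflex
    "\u00ee": "i",   # i with circumflex
    "\u0219": "sh",  # s with comma below
    "\u015f": "sh",  # s with cedilla (legacy encoding)
    "\u021b": "ts",  # t with comma below
    "\u0163": "ts",  # t with cedilla (legacy encoding)
}

def prepare_romanian(word):
    """Lowercase, compose to NFC, then rewrite the special letters."""
    composed = unicodedata.normalize("NFC", word.lower())
    return "".join(RO_TO_ASCII.get(ch, ch) for ch in composed)

print(prepare_romanian("\u021a\u0103ran"))   # tsaran
print(prepare_romanian("Bucure\u0219ti"))    # bucureshti
```

The NFC step is there so that input typed as a base letter plus a combining mark still matches the single-character
keys in the table.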
Hence, the recommended approach is to implement the UN standard where it exists and where it is worth it (e.g. I
don't know if it is of high importance to implement the romanization for Tigrinya, Lao or Urdu - no offence intended),
and for other languages, if someone with knowledge of the languages can provide the conversion tables for the most
widespread Latin-derived alphabets from...
http://en.wikipedia.org/wiki/Category:Latin_alphabets
...it would be a plus, even though we could use Wikipedia for this. From what I see in the above list and in the Double
Metaphone code, it seems that only the Eastern European alphabets are not covered by Metaphone and need tables; hence,
we are in a rather good position. For the Romanian language, I (or Mariuz) can provide a translation table with ease.
I don't know if there are takers for the other EE languages...
Thoughts? Comments?
Ioan Th.
Firebird-Devel mailing list, web interface at
https://lists.sourceforge.net/lists/listinfo/firebird-devel