Hi,
first, sorry for cross-posting (some of you will receive multiple messages :-().


I'd like to propose a simple gettext extension that would work at least for Serbian, and I hope for many other languages as well.

*Background:*
Serbian has 7 declension forms (cases) for a word (nouns, pronouns, and similar words); in recent discussions on the gnome-i18n list I found out that Finnish has 15, and so on. This becomes a major problem when translating "composed" strings, as in "move %s", where "%s" might be any of "queen", "king", ...


The usual scenario is this (Serbian Latin transliteration is used for the examples):
msgid "queen"
msgstr "kraljica"


msgid "king"
msgstr "kralj"

msgid "move %s"
msgstr "premesti %s"

msgid "go with %s"
msgstr "idi sa %s"

It's unfortunate (or is it?) that we then get "premesti kraljica", which is incorrect (it ought to be "premesti kraljicu"), or "idi sa kralj" instead of "idi sa kraljem".

The solution is simple, and I guess that it will work for at least all Slavic languages, but probably many more.

*Solution:*
# in the header, 7 is a sample for Serbian
"PO-Number-of-noun-forms: 7\n"

msgid "queen"
msgstr<0> "kraljica"
msgstr<3> "kraljicu"
msgstr<5> "kraljicom"

msgid "king"
msgstr<0> "kralj"
msgstr<3> "kralja"
msgstr<5> "kraljem"

msgid "move %s"
msgstr "premesti %<3>s"

msgid "go with %s"
msgstr "idi sa %<5>s"

Here <i>, where i = 0 .. (PO-Number-of-noun-forms)-1, is the index of the required form, which depends on the sentence construction: it is determined by the verb, or perhaps by words like "with", "whom", ... Any msgstr<i> can be omitted if that form is known not to be used in composition (most are highly unlikely ever to be used in translations, like the "vocative" form in "hey %s").
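To make the lookup concrete, here is a minimal sketch in Python of how a "%<i>s" placeholder could be resolved. Everything in it is an assumption for illustration only: the CATALOG and TEMPLATES dictionaries stand in for a parsed PO file, compose() and noun_form() are hypothetical helpers and not part of gettext, and falling back to form 0 (or to the untranslated msgid) when a form is missing is just one possible policy.

```python
import re

# Hypothetical stand-in for a parsed PO file: msgid -> {form index: msgstr<i>}.
CATALOG = {
    "queen": {0: "kraljica", 3: "kraljicu", 5: "kraljicom"},
    "king": {0: "kralj", 3: "kralja", 5: "kraljem"},
}

# Hypothetical stand-in for the translated format strings.
TEMPLATES = {
    "move %s": "premesti %<3>s",
    "go with %s": "idi sa %<5>s",
}

# Matches "%s" as well as the proposed "%<i>s".
_PLACEHOLDER = re.compile(r"%(?:<(\d+)>)?s")


def noun_form(msgid, index):
    """Return form `index` of msgid; fall back to form 0, then to msgid."""
    forms = CATALOG.get(msgid)
    if forms is None:
        return msgid
    return forms.get(index, forms.get(0, msgid))


def compose(template_id, *arg_ids):
    """Translate the template, then fill each placeholder with the
    requested form of the corresponding (untranslated) argument."""
    template = TEMPLATES.get(template_id, template_id)
    args = iter(arg_ids)

    def substitute(match):
        index = int(match.group(1)) if match.group(1) else 0
        return noun_form(next(args), index)

    return _PLACEHOLDER.sub(substitute, template)


print(compose("move %s", "queen"))
print(compose("go with %s", "king"))
```

Note that in this sketch the substitution happens inside compose(), which receives the untranslated msgid of the argument; how the library would learn which msgid an already-formatted argument came from is left open here.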


The good side of this approach (the syntactic elements are arbitrary, so please don't comment on those) is that programs that use gettext for l10n would need no change: everything would be done on the gettext library side and by translators (in that respect it's even better than plural forms). Of course, care should be taken to also allow combining these with plural forms, as in:
msgid "king"
msgid_plural "kings"
msgstr[0]<0> "kralj"
msgstr[0]<5> "kraljem"
msgstr[2]<0> "kraljevi"
msgstr[2]<5> "kraljevima"
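The combined lookup for the entry above could be sketched like this in Python. The plural rule is the usual three-form Serbian Plural-Forms expression; the KING table and the lookup() helper are hypothetical and only mirror the entry above, and returning None for a missing combination is an assumption, not a proposal for what gettext should do.

```python
def serbian_plural_index(n):
    """Usual Serbian plural rule: 0 for 1, 21, 31, ...; 1 for 2-4, 22-24, ...;
    2 for everything else."""
    if n % 10 == 1 and n % 100 != 11:
        return 0
    if 2 <= n % 10 <= 4 and not 10 <= n % 100 < 20:
        return 1
    return 2


# Hypothetical parsed entry: (plural index, noun-form index) -> translation.
KING = {
    (0, 0): "kralj",
    (0, 5): "kraljem",
    (2, 0): "kraljevi",
    (2, 5): "kraljevima",
}


def lookup(forms, n, form_index):
    """Select the plural form from n first, then the requested noun form.
    Returns None when the translator left that combination out."""
    return forms.get((serbian_plural_index(n), form_index))


print(lookup(KING, 1, 5))
print(lookup(KING, 5, 5))
```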



Before diving into gettext code, it'd be nice to hear if this kind of approach would work for any language other than Serbian (I repeat, I find it likely to work for Slavic languages, and German, those being the languages I'm at least a bit familiar with).


In any case, looking forward to hearing from all of you.


Again, sorry for cross-posting, but I just wanted to reach the widest possible audience, so as to get some *real* insight into the problem.


Cheers,
Danilo

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/


