On Friday 28 March 2003 01:21 pm, Keld Jørn Simonsen wrote:
> On Fri, Mar 28, 2003 at 11:32:21AM -0800, H. Peter Anvin wrote:
> > Followup to: <[EMAIL PROTECTED]>
> > By author: Tomohiro KUBOTA <[EMAIL PROTECTED]>
> > In newsgroup: linux.utf8
> >
> > a) It needs to be easy to write internationalized and
> > multilingualized applications.
> >
> > b) Programmers need to be taught that it is easy, and how to
> > do it.

Hear, hear.

> > When it comes to (a), it pretty much means that the
> > complexity needs to be hidden from the application
> > programmer. Terminal applications, toolkits, and perhaps
> > libraries like readline need to support this, but
> > applications shouldn't need to be affected beyond a few
> > basic guidelines, such as don't assume byte == character.
> > Getting UTF-8 universally deployed will be a huge part of
> > this, because it means that anything other than 7-bit ASCII
> > will have to take this into consideration.

The biggest missing pieces are input and rendering. IIIMF
addresses input. Pango and Graphite address rendering. This
stuff needs to be available everywhere without special
programming effort.

> > We need easy-to-read webpages and easy-to-use libraries how
> > to do this, even for monolingual, American programmers who
> > might not be using characters outside the US-ASCII set on a
> > daily basis.

I'll be happy to contribute some of the things I have written
on these topics.

> > > Of course several Japanese companies are competing in
> > > Input Method area on Windows. These companies are
> > > researching for better input methods -- larger and
> > > better-tuned dictionaries with newly coined words and
> > > phrases, better grammatical and semantic analyzers, and
> > > so on so on. I imagine this area is one of areas where
> > > Open Source people cannot compete with commercial
> > > softwares by full-time developer teams.
> >
> > This seems to call for a plugin architecture. More than
> > anything I suspect we need *standards*.

IIIMF is intended to become part of LSB. When the Pango and
Graphite teams figure out what they want to do, we can discuss
making the result a standard, too.

> I agree with Kubota-san and Peter, Internationalization should
> be inherent in all programs, and even American programmers
> should be able to easily write internationalized programs.
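
On the guideline quoted above, "don't assume byte == character",
here is a minimal Python sketch of what goes wrong. The helper
name truncate_utf8 is hypothetical, and the sketch assumes the
input is already valid UTF-8. It only shows that counting or
cutting by bytes is not the same as counting or cutting by
characters.

```python
name = "Keld Jørn Simonsen"
data = name.encode("utf-8")

# The character count and the byte count differ, because "ø"
# takes two bytes in UTF-8.
char_count = len(name)
byte_count = len(data)


def naive_truncate(data, max_bytes):
    # Wrong assumption: any byte offset is a character boundary.
    return data[:max_bytes].decode("utf-8")


def truncate_utf8(data, max_bytes):
    # Hypothetical helper: cut at max_bytes, then drop a trailing
    # partial character instead of failing. Assumes valid UTF-8 input,
    # since errors="ignore" would also hide genuinely bad bytes.
    return data[:max_bytes].decode("utf-8", errors="ignore")


try:
    naive_truncate(data, 7)  # cuts "ø" in half
    naive_failed = False
except UnicodeDecodeError:
    naive_failed = True

safe = truncate_utf8(data, 7)
```

A toolkit or library can hide this kind of detail from the
application programmer, which is the point of (a).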
>
> One idea I have had was that strings in programming languages
> should automatically be put for translation, unless it is a
> constant.
>
> Is that a scheme that would work?

Not in that simplistic form. Programmers frequently compose
messages from pieces that fit together in the language and
context they are most familiar with, but not in others.
Variations in the way languages deal with gender, number,
declensions, conjugations, sentence order, polite speech, and
other factors interfere.

The classic Chomskian example of sentences with the same word
order but different structure is the two sentences

    Time flies like an arrow.
    Fruit flies like a banana.

In English one can say, "Is it here?" or "Is he here?" However,
in Japanese those ideas are expressed with distinct verbs,
"aru" for the inanimate and "iru" for the animate.

My favorite example is the Japanese utterance, "Boku wa, ebi
da." After the first few months of studying Japanese, an
American would attempt to translate this as, "I am a shrimp."
Actually, when spoken to a waiter, it means, "Mine is the
shrimp dish."

> Could we just do some automated tools to mark every string for
> translation via gettext - or would it need further spec, like
> getting it thru some standards process?

No, we actually need to teach them how to do it right. The
details are application-specific.

> Best regards
> keld

-- 
Edward Cherlin
Generalist & activist--Linux, languages, literacy and more
"A knot! Oh, do let me help to undo it!"
--Alice in Wonderland

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
