[Dev] Re: Chandler Internationalization .6 Specification is ready for review

Brian Kirsch Wed, 20 Jul 2005 14:40:48 -0700

Thanks for the feedback, Ken. Please see my comments inline.


Ken Krugler wrote:

Hi Brian,
Sorry for not doing a quick, full review. Some issues I thought of while quickly reading over the Wiki page:
1. Lucene has support (tokenizers, stemmers, etc.) for various languages, but you'd need to be able to include these (as needed), and also "know" which language is being processed to decide which language-specific plugins to apply.

Any specific under-the-hood Lucene questions would need to be answered by Andi Vajda, who is the owner of PyLucene.

2. Related is the issue of using ICU to do searches inside of text, versus indexed queries. I thought that was something you were going to support in Chandler, right? Like I've got an email open, and I search on some word.
If you're doing this, then you want language-specific, folded (e.g. case insensitive) searching. ICU supports this, but it would require additional work I think, similar to Lucene.

I believe that as long as the attributes have indexText=True, PyLucene will handle this case with no problem. I have sent a mail to Andi to confirm my assumption.
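To illustrate what "folded" matching means in point 2, here is a minimal Python 3 sketch that uses only the standard library. It is not ICU or PyLucene code, and it is not language-specific: the helper names `fold` and `folded_contains` are hypothetical, and the folding is a generic case fold plus removal of combining marks. Locale-dependent rules (such as the Turkish dotless i) would still need a locale-aware library.

```python
import unicodedata


def fold(text):
    """Case-fold text and strip combining marks (a generic, locale-blind fold)."""
    # casefold() is a more aggressive lower(): it also maps "ß" to "ss".
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    # After NFKD, accents are separate combining characters that can be dropped.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def folded_contains(haystack, needle):
    """Return True if needle occurs in haystack after folding both."""
    return fold(needle) in fold(haystack)


if __name__ == "__main__":
    print(folded_contains("Mötley CRÜE live", "crue"))
    print(folded_contains("Straße", "STRASSE"))
```

The function returns a boolean rather than an offset because folding can change string lengths, so positions in the folded text do not map directly back to the original.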

3. So along these lines, how do you "pick" the language, if it's not specified? Sometimes you know the language from meta info (like on web pages), but otherwise it seems like you'll probably just want to use the user's OS language setting. There are other approaches that try to detect the language, similar to charset detection, but that typically isn't warranted for a general-purpose app like Chandler. Anyway I think this should be called out as a design decision.

Yes, the locale set will come from the operating system. Although this was already mentioned briefly in the spec, I have added an explicit section detailing how the locale set is determined. Thanks for the suggestion.
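As a rough sketch of how a locale set might be derived from the OS setting, the Python 3 snippet below reads the current locale and expands it into an ordered list with language-only fallbacks. The function names `expand_locale_set` and `os_locale_set` and the `en_US` default are assumptions for illustration; they are not taken from the spec or from the Chandler I18nManager.

```python
import locale


def expand_locale_set(locales):
    """Return the locales in priority order, adding language-only fallbacks."""
    expanded = []
    for name in locales:
        # Drop any encoding or modifier suffix such as ".UTF-8" or "@euro".
        base = name.split(".")[0].split("@")[0]
        candidates = [base]
        if "_" in base:
            # "fr_CA" also falls back to plain "fr".
            candidates.append(base.split("_")[0])
        for candidate in candidates:
            if candidate and candidate not in expanded:
                expanded.append(candidate)
    return expanded


def os_locale_set(default="en_US"):
    """Best-effort locale set from the OS; 'default' is a hypothetical fallback."""
    # getlocale() may return (None, None), for example under the "C" locale.
    name = locale.getlocale()[0] or default
    return expand_locale_set([name])


if __name__ == "__main__":
    print(expand_locale_set(["fr_CA.UTF-8", "en_US"]))
    print(os_locale_set())
```

What `locale.getlocale()` reports varies by platform, so a real implementation would likely need platform-specific handling.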

4. To ensure smooth interoperability with ICU, I assume that Chandler's Python will always be built using UTF-16, not UTF-32, right? Otherwise it seems like you won't be able to leverage direct copying of data between Python and ICU strings.

In the SWIG code for PyICU, Andi checks the Python unicode object's type (UCS-2 or UCS-4) when converting to and from ICU UnicodeStrings.
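For reference, the narrow/wide distinction can be inspected from Python itself. The sketch below is written for Python 3 and is not PyICU code; `utf16_code_units` is a hypothetical helper. On the narrow (UCS-2) builds of that era `sys.maxunicode` is 0xFFFF, while wide (UCS-4) builds, and every Python from 3.3 onward, report 0x10FFFF. The helper shows how many 16-bit code units a string needs in UTF-16, the form ICU's UnicodeString stores.

```python
import sys


def utf16_code_units(text):
    """Number of 16-bit code units needed to hold text as UTF-16."""
    # Each UTF-16 code unit is 2 bytes; "utf-16-le" emits no byte order mark.
    return len(text.encode("utf-16-le")) // 2


# True on wide (UCS-4) builds and on all Python versions from 3.3 onward.
IS_WIDE_BUILD = sys.maxunicode > 0xFFFF

if __name__ == "__main__":
    print(IS_WIDE_BUILD)
    print(utf16_code_units("\u00fc"))      # ü, inside the Basic Multilingual Plane
    print(utf16_code_units("\U0001D11E"))  # musical G clef, outside the BMP
```

A character outside the Basic Multilingual Plane takes two code units (a surrogate pair) in UTF-16 but one unit in UCS-4, which is why the conversion code has to know which build it is running on.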

5. We'd talked about how big ICU code/data can be, and the need to support installations of different language sets. Was that covered?

Yes, it is big. I added a note to the spec that ICU's size can be reduced significantly by removing locale data files, such as those for Hebrew and Arabic, which will not be supported in the Chandler 1.0 release.

6. I think somebody commented about the problems that can be caused by translators messing up strings. You'd responded w/info about the ICU message format. We'd talked about being able to do a consistency check, comparing English to language X and validating that the abstract structure of the message (number/type of parameters) hadn't changed. Might be worth mentioning.

I added the consistency checker to the spec.
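A consistency check of this kind could look roughly like the Python 3 sketch below. It is an assumption-laden illustration, not the checker described in the spec: `message_signature` and `check_translation` are hypothetical names, and the regular expression only handles flat arguments such as `{0}` or `{1,number,integer}`. It ignores apostrophe quoting and nested sub-messages (for example choice or plural formats), which a real checker would need to parse properly.

```python
import re

# Matches flat arguments like "{0}" or "{1,number,integer}" (no nesting, no quoting).
_ARG = re.compile(r"\{\s*(\w+)\s*(?:,\s*(\w+))?[^{}]*\}")


def message_signature(pattern):
    """Return the set of (argument, type) pairs used by a message pattern."""
    return {(m.group(1), m.group(2) or "untyped") for m in _ARG.finditer(pattern)}


def check_translation(source, translation):
    """Return (missing, unexpected) argument sets; both empty means consistent."""
    src = message_signature(source)
    dst = message_signature(translation)
    return src - dst, dst - src


if __name__ == "__main__":
    english = "{0} sent {1,number,integer} messages"
    good = "{1,number,integer} messages de {0}"
    bad = "{0} a {1} messages"
    print(check_translation(english, good))
    print(check_translation(english, bad))
```

Comparing sets rather than ordered lists lets a translator reorder arguments freely, which is the point of indexed placeholders, while still catching a dropped argument or a changed type.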

7. For doing a programmatic localization, you mentioned "Potential tests are double the size of the LocalizableString text or insert in each LocalizableString translation a non-8bit surrogate character pair". I'm not sure what you mean by a non-8bit surrogate character pair.
Some tests you can do are:
a. Replace vowels with vowel + umlaut (Motley Crüe localization). Other substitutions are possible as well (C -> Ç, etc.)
b. Replace ASCII with full-width ASCII equivalents, so "help" becomes "ｈｅｌｐ", which also tests expanding the width of text.

Updated the .6 spec to be clearer. When I wrote "non-8-bit surrogate character pair", what I really meant was a Unicode surrogate character pair, where a single displayable glyph is represented by two or more Unicode code points, such as your example above of Motley Crüe, where the ü is a u plus an umlaut.
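The two substitution tests suggested in point 7 are simple to sketch in Python 3. The function names `umlautize` and `to_fullwidth` are hypothetical and not part of Chandler. The full-width mapping relies on the fact that the printable ASCII range U+0021..U+007E lines up with the full-width forms U+FF01..U+FF5E at a fixed offset of 0xFEE0, with the space mapping to the ideographic space U+3000.

```python
# Map plain vowels to their umlauted forms (the "Motley Crüe" test).
_UMLAUTS = str.maketrans("aeiouAEIOU", "äëïöüÄËÏÖÜ")


def umlautize(text):
    """Replace each ASCII vowel with the same vowel plus an umlaut."""
    return text.translate(_UMLAUTS)


def to_fullwidth(text):
    """Replace printable ASCII with its full-width equivalent."""
    out = []
    for ch in text:
        code = ord(ch)
        if 0x21 <= code <= 0x7E:
            # Printable ASCII maps to U+FF01..U+FF5E by a fixed offset.
            out.append(chr(code + 0xFEE0))
        elif ch == " ":
            # The full-width counterpart of a space is U+3000.
            out.append("\u3000")
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(umlautize("Motley Crue"))
    print(to_fullwidth("help"))
```

Applied blindly, both transforms would also rewrite format placeholders such as `{0}`, so a real pseudo-localizer would need to skip those spans.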
8. Do you mention the issue of making gettext use native OS fallback settings?

The OS locale set will be determined by the Chandler I18nManager. The gettext API has built-in fallback support. Passing it a locale set array is all gettext needs to perform the correct fallback behavior.
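With Python's standard `gettext` module, passing a locale set looks roughly like the sketch below (written for Python 3). The `chandler` domain, the `locale` directory, and the `load_translations` wrapper are assumptions for illustration, not names from the spec. `gettext.translation` tries the listed languages in order, and with `fallback=True` it returns a `NullTranslations` object, which hands back the original string, when no catalog is found.

```python
import gettext


def load_translations(locale_set, domain="chandler", localedir="locale"):
    """Load the first available catalog for the given ordered locale set.

    'domain' and 'localedir' are hypothetical defaults for this sketch.
    """
    return gettext.translation(
        domain,
        localedir=localedir,
        languages=locale_set,
        fallback=True,  # return NullTranslations instead of raising an error
    )


if __name__ == "__main__":
    translations = load_translations(["fr_CA", "fr", "en"])
    _ = translations.gettext
    # Prints the French text if a catalog is installed, otherwise "Hello".
    print(_("Hello"))
```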

Related to this might be noting that using .po files might preclude some Mac OS X localization customization by end users, since the file/structure won't match what's standard for Mac apps.



Added a footnote to the spec addressing this point.


Anyway, it's 9pm so I'm off to put my daughter to bed. Hope this helps...

-- Ken

--

Ken Krugler
TransPac Software, Inc.
<http://www.transpac.com>
+1 530-470-9200




--
Brian Kirsch - Email Framework Engineer
Open Source Applications Foundation
543 Howard St. 5th Floor
San Francisco, CA 94105
(415) 946-3056
http://www.osafoundation.org

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Open Source Applications Foundation "Dev" mailing list
http://lists.osafoundation.org/mailman/listinfo/dev
