On Mon, 2009-01-05 at 11:08 +0100, Stephan Bergmann wrote:
> On 01/02/09 09:51, F Wolff wrote:
> > Hallo all
> >
> > We recently had a discussion on a list for African localisation about
> > the utility of having Unicode normalisation automatically done in
> > Hunspell, so that creators of spell checkers wouldn't need to worry
> > about that.
> >
> > Is this a feature that would be useful to more people? Is there
> > something generic in OOo that handles normalisation issues for other
> > purposes? (searching, thesaurus, indexes, etc.) I can think of many
> > places where it could be relevant.
> >
> > I'm curious to hear what other people think.
>
> I brought this up years ago as point 4 of
> <http://www.openoffice.org/servlets/ReadMsg?list=dev&msgNo=7099>, but
> nothing became of it back then...
>
> -Stephan

Thank you for your reply, Stephan.

In your mail you ask if it is severe enough. I would think that it is a
relevant problem. Unfortunately, it is probably mostly a problem for
languages that are not usually well represented in the developer
communities.

Many African languages have not yet standardised their keyboard layouts,
and for some there are several competing designs. This means that
documents could be created with different "encodings" of the same text,
which will stop searching from working correctly (unless proper
normalisation is done), as Németh indicated. A reader might still be
able to spot by eye that certain text is present (instead of searching),
but it is unrealistic to expect spell checker authors to take into
account every possible way of writing each letter, in every possible
combination, for each word.

In the case of Yoruba, vowels can have zero, one or two diacritics. This
can be represented with one, two or three code points. As far as I know
there are several keyboard layouts for Yoruba, so this is not a
theoretical issue we are describing.

Németh, the ICONV solution sounds interesting, and I guess it would
work. I don't know if that would then also work in Firefox. (Do they
update their copy of Hunspell from time to time?)

Automatic conversion means that people would benefit from the
normalisation even if the spell checker authors didn't think about the
problem, which is probably ideal. I can't imagine there being a very
large overhead for this, although it probably won't come for free
either.

Friedel

--
Recently on my blog:
http://translate.org.za/blogs/friedel/en/content/re-bringing-all-translation-management-tools-together
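
P.S. To make the Yoruba point concrete, here is a small sketch using
Python's standard unicodedata module. It is only an illustration of what
automatic normalisation would buy us, not how Hunspell or the ICONV
approach works internally. The sample vowel (e with dot below and acute
accent) and the helper names normalise() and same_word() are my own
choices for the example.

```python
import unicodedata

# Four different code point sequences that all render as the same
# Yoruba vowel: "e" with a dot below and an acute accent.
VARIANTS = [
    "\u1eb9\u0301",   # e-with-dot-below + combining acute (2 code points)
    "\u00e9\u0323",   # e-with-acute + combining dot below (2 code points)
    "e\u0323\u0301",  # e + dot below + acute (3 code points)
    "e\u0301\u0323",  # e + acute + dot below (3 code points, marks swapped)
]


def normalise(text, form="NFC"):
    """Return text in the given Unicode normalisation form."""
    return unicodedata.normalize(form, text)


def same_word(a, b):
    """Compare two strings after normalising both (hypothetical helper)."""
    return normalise(a) == normalise(b)


if __name__ == "__main__":
    for variant in VARIANTS:
        before = " ".join(f"U+{ord(c):04X}" for c in variant)
        after = " ".join(f"U+{ord(c):04X}" for c in normalise(variant))
        print(f"{before:<24} -> {after}")
```

Without normalisation the four strings compare as different, so a word
list containing only one of them would flag the other three as
misspelt. After normalising both the dictionary and the input, they
compare as equal.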
