On Mon, 2009-01-05 at 11:08 +0100, Stephan Bergmann wrote:
> On 01/02/09 09:51, F Wolff wrote:
> > Hallo all
> > 
> > We recently had a discussion on a list for African localisation about
> > the utility of having Unicode normalisation automatically done in
> > Hunspell, so that creators of spell checkers wouldn't need to worry
> > about that.
> > 
> > Is this a feature that would be useful to more people? Is there
> > something generic in OOo that handles normalisation issues for other
> > purposes? (searching, thesaurus, indexes, etc.)  I can think of many
> > places where it could be relevant.
> > 
> > I'm curious to hear what other people think.
> 
> I brought this up years ago as point 4 of 
> <http://www.openoffice.org/servlets/ReadMsg?list=dev&msgNo=7099>, but 
> nothing became of it back then...
> 
> -Stephan

Thank you for your reply, Stephan.

In your mail you ask if it is severe enough. I would think that it is a
relevant problem. Unfortunately, it is probably mostly a problem for
languages that are not usually well represented in the developer
communities. Many African languages have not yet standardised their
keyboard layouts, and for some there are several competing designs. This
means that documents can be created with different "encodings" (code
point sequences) of the same text, which breaks searching unless proper
normalisation is done, as Németh indicated.

While somebody might be able to see by eye that certain text is present
(instead of searching), it is unrealistic for spell checker authors to
take into account all possible ways of writing letters, in all possible
combinations, for each word. In the case of Yoruba, vowels can have zero,
one or two diacritics. This can be represented with one, two or three
code points. As far as I know there are several keyboard layouts for
Yoruba, so this is not a theoretical issue we are describing.
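
To make the Yoruba case concrete, here is a minimal sketch in Python,
using only the standard unicodedata module. The letter chosen (e with a
dot below and an acute accent) and the variable names are just my
illustration, not taken from any particular keyboard layout. It shows
four different code point sequences for the same letter, and that NFC
normalisation maps all of them to a single form:

```python
import unicodedata

# Four canonically equivalent ways to encode "e with dot below and acute".
# Which of these a given keyboard layout produces is an assumption here.
variants = [
    "e\u0323\u0301",  # e + combining dot below + combining acute (3 code points)
    "e\u0301\u0323",  # same marks entered in the other order (3 code points)
    "\u1eb9\u0301",   # precomposed e-dot-below + combining acute (2 code points)
    "\u00e9\u0323",   # precomposed e-acute + combining dot below (2 code points)
]

def nfc(text):
    """Return the NFC (canonical composed) form of text."""
    return unicodedata.normalize("NFC", text)

raw_forms = set(variants)                  # distinct strings before normalisation
normalised_forms = {nfc(v) for v in variants}  # distinct strings after NFC

print(len(raw_forms), len(normalised_forms))
```

Without normalisation, a plain string comparison treats these as four
different "letters", so a search or dictionary lookup for one of them
misses the other three.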

Németh, the ICONV solution sounds interesting, and I guess would work. I
don't know if that would then also work in Firefox. (Do they update
their copy of Hunspell from time to time?) Automatic conversion means
that people would benefit from the normalisation even if the spell
checker authors didn't think about the problem, which is probably ideal.
I can't imagine there being a very large overhead for this, although it
probably won't come for free either.
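
As a rough sketch of what I mean by automatic conversion: the class
below is purely hypothetical (it is not Hunspell's API, and the sample
word is only an illustration). It normalises the word list once when it
is loaded and normalises every word before lookup, so the dictionary
author only has to supply one spelling per word:

```python
import unicodedata

class NormalisingWordList:
    """Hypothetical word list that normalises entries and queries alike."""

    def __init__(self, words, form="NFC"):
        self.form = form
        # Normalise the dictionary once, at load time.
        self.words = {unicodedata.normalize(form, w) for w in words}

    def check(self, word):
        # Normalise each query so any equivalent encoding matches.
        return unicodedata.normalize(self.form, word) in self.words

# The dictionary author wrote the word with precomposed letters ...
checker = NormalisingWordList(["\u1ecdj\u1ecd\u0301"])

# ... while the user's keyboard produced base letters plus combining marks.
typed = "o\u0323jo\u0301\u0323"

print(typed in checker.words)   # plain lookup without normalising the query
print(checker.check(typed))     # lookup with normalisation
```

The extra cost is one normalisation pass per checked word, which is
where I would expect the (hopefully small) overhead to come from.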

Friedel



--
Recently on my blog:
http://translate.org.za/blogs/friedel/en/content/re-bringing-all-translation-management-tools-together

