Hi,

Indeed, this is not only a spell checking problem. OpenOffice.org has problems with both the visual and the functional equivalence of Unicode characters. For example, here is the result of a "Find All" search for ä in the text ÄÄää, i.e. in the character sequence "A U+0308 (COMBINING DIAERESIS), Ä, a U+0308, ä": http://www.flickr.com/photos/85171...@n00/3170574450/
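To make the equivalence problem concrete, here is a minimal sketch using Python's standard unicodedata module. It is not how OOo's search works; the helper name find_all_normalized is only an illustration. It compares a raw code point search with a search done after normalizing both strings to NFC. Note that the returned offsets refer to the normalized text, not to the original one.

```python
import unicodedata

def find_all_normalized(needle, haystack):
    """Return start offsets of needle in haystack after NFC-normalizing both."""
    n = unicodedata.normalize("NFC", needle)
    h = unicodedata.normalize("NFC", haystack)
    hits = []
    start = h.find(n)
    while start != -1:
        hits.append(start)
        start = h.find(n, start + 1)
    return hits

# A + U+0308, precomposed Ä, a + U+0308, precomposed ä
text = "A\u0308\u00c4a\u0308\u00e4"

# Raw search only sees the precomposed ä, not the decomposed one.
raw_hits = text.count("\u00e4")

# After normalization both spellings of ä are found.
normalized_hits = find_all_normalized("\u00e4", text)
```

Here raw_hits counts only the precomposed character, while normalized_hits also covers the decomposed spelling.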
It would be good to solve this problem in future OpenOffice.org versions through automatic Unicode normalization, and also through OpenType support.

Hunspell 1.2.x (I hope it will be in OOo 3.1) has a temporary solution for Unicode normalization (canonical and compatibility): optional input/output conversion.

    ICONV 4
    ICONV Ä Ä
    ICONV ä ä
    ICONV 가 ᄀ ᅡ
    ICONV ﬁ fi

The first three conversions are canonical normalizations: two compositions (A + U+0308 to Ä, a + U+0308 to ä) and a Hangul decomposition. The conversion of the ﬁ ligature is a compatibility normalization (but spell checking of words with f-ligatures also needs fixed word breaking in OOo).

Conversion of the spell checking suggestions back to the original composed form:

    OCONV 2
    OCONV ᄀ ᅡ 가
    OCONV fi ﬁ

(Special spell checking requirements need special solutions. For example, German typography uses f-ligatures only within words, but not across compound word boundaries, so the OCONV fi ﬁ conversion above is not right for German. A redundant dictionary with non-suggested decomposed forms, plus dictionary words with ligatures, helps to check the correct typography of a German text:

    --- affix file ---
    NOSUGGEST *
    REP 2
    REP ﬁ fi
    REP fi ﬁ
    --- dictionary file ---
    finden/*
    ﬁnden
)

Hyphenation of both composed and decomposed characters is possible in OOo through redundant hyphenation patterns. Compatibility-equivalent ligatures can be handled by non-standard hyphenation (alternations):

    ﬁ1/f=i,1,1

For thesauri, a temporary solution is to use redundant items or references:

    ﬁnden->finden

The upcoming stemming support in the OOo thesaurus via Hunspell can also handle the normalization problem temporarily. ICONV input conversion or explicit stems

    --- dic file ---
    ﬁnden st:finden

can give the normalized stems to the thesaurus component.

Maybe a new Hunspell tool could help spelling dictionary developers by automatically generating the ICONV normalization table.
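The table generation suggested above could be sketched roughly as follows. This is only an illustration built on Python's standard unicodedata module, not an existing Hunspell tool; the function name iconv_table and the idea of feeding it the dictionary's character set are assumptions. Unlike the Hangul line in the example above, this sketch always maps toward the composed (NFC) form, and it maps compatibility characters such as ligatures to their NFKC form.

```python
import unicodedata

def iconv_table(chars):
    """Build Hunspell ICONV lines for the given characters (hypothetical helper).

    For each character, emit:
      - decomposed (NFD) -> composed (NFC), if the two spellings differ
      - composed (NFC)   -> compatibility form (NFKC), if they differ
    """
    pairs = []
    # dict.fromkeys removes duplicates while keeping the input order
    for ch in dict.fromkeys(chars):
        composed = unicodedata.normalize("NFC", ch)
        decomposed = unicodedata.normalize("NFD", ch)
        if decomposed != composed:
            pairs.append((decomposed, composed))
        compat = unicodedata.normalize("NFKC", ch)
        if compat != composed:
            pairs.append((composed, compat))
    # The first line of an ICONV table is the number of entries
    lines = ["ICONV %d" % len(pairs)]
    lines += ["ICONV %s %s" % pair for pair in pairs]
    return lines

# Example input: precomposed Ä, precomposed ä, and the fi ligature (U+FB01)
table = iconv_table("\u00c4\u00e4\ufb01")
print("\n".join(table))
```

A real tool would probably take the character set from the dictionary's TRY or WORDCHARS lines or from the word list itself, and would need language-specific exceptions such as the German ligature rule mentioned above.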
Regards,
László

2009/1/5 Stephan Bergmann <[email protected]>:
> On 01/02/09 09:51, F Wolff wrote:
>>
>> Hallo all
>>
>> We recently had a discussion on a list for African localisation about
>> the utility of having Unicode normalisation automatically done in
>> Hunspell, so that creators of spell checkers wouldn't need to worry
>> about that.
>>
>> Is this a feature that would be useful to more people? Is there
>> something generic in OOo that handles normalisation issues for other
>> purposes? (searching, thesaurus, indexes, etc.) I can think of many
>> places where it could be relevant.
>>
>> I'm curious to hear what other people think.
>
> I brought this up years ago as point 4 of
> <http://www.openoffice.org/servlets/ReadMsg?list=dev&msgNo=7099>, but
> nothing became of it back then...
>
> -Stephan
