Spell checker news

I have made a full comparison with MySpell 3.2, thanks to Dmitri Gabinski's bug report. Thanks to Dmitri and the other contributors. I will finish the Hunspell integration for QA today. There are also OOo 1.1.4 and OOo 2.0 beta 2 UNO modules here for testing.
Laci

Hunspell 1.1.0 on Sourceforge:
https://sourceforge.net/project/showfiles.php?group_id=143754&package_id=157912

NEWS

2005-09-19: Hunspell 1.1.0 release

  * complete comparison with MySpell 3.2 (from OpenOffice.org 2 beta)

  * improved ngram suggestion with swap character detection and
    case insensitivity

------ examples for ngram improvement (input word and suggestions) -----

1. pernament (instead of permanent)

   MySpell 3.2:    tournaments, tournament, ornaments, ornament's,
                   ornamenting, ornamented, ornament, ornamentals,
                   ornamental, ornamentally
   Hunspell 1.0.9: ornamental, ornament, tournament
   Hunspell 1.1.0: permanent

   Note: swap character detection

2. PERNAMENT (instead of PERMANENT)

   MySpell 3.2:    -
   Hunspell 1.0.9: -
   Hunspell 1.1.0: PERMANENT

3. Unesco (instead of UNESCO)

   MySpell 3.2:    Genesco, Ionesco, Genesco's, Ionesco's, Frescoing,
                   Fresco's, Frescoed, Fresco, Escorts, Escorting
   Hunspell 1.0.9: Genesco, Ionesco, Fresco
   Hunspell 1.1.0: UNESCO

4. siggraph's (instead of SIGGRAPH's)

   MySpell 3.2:    serigraph's, photograph's, serigraphs, physiography's,
                   physiography, digraphs, serigraph, stratigraphy's,
                   stratigraphy, epigraphs
   Hunspell 1.0.9: serigraph's, epigraph's, digraph's
   Hunspell 1.1.0: SIGGRAPH's

--------------- end of examples --------------------

  * improved testing environment with suggestion checking and memory
    debugging. Memory debugging of all tests with a simple command:

      VALGRIND=memcheck make check

  * lots of other improvements and bug fixes (see ChangeLog)

2005-09-19 Németh László <[EMAIL PROTECTED]>:

    * src/hunspell/suggestmgr.cxx: improved ngram suggestion:
      - detect non-neighboring swapped characters (pernament -> permanent)
        Rationale: the ngram method has a significant error rate with
        non-neighboring swapped characters, especially when the swap is in
        the middle of the word.
      - suggest uppercase forms (unesco -> UNESCO, siggraph's -> SIGGRAPH's)
      - if ngram swap-character or uppercase suggestions exist, suggest
        only those.
        Rationale: swap-character and casing equivalence give much better
        suggestions than any other (weighted) ngram suggestions.
      - add uppercase suggestion (PERMENANT -> PERMANENT)
    * src/hunspell/*: complete comparison with MySpell 3.2 (in OOo beta 2):
      - affixmgr.cxx: add missing numrep initialization
      - hashmgr.cxx: add_word(): don't allocate temporary records
      - hunspell.cxx: in suggest():
        - check capitalized words first (better suggestion order for
          proper names),
        - check pSMgr->suggest() return value
        - make the pSMgr->suggest() call non-optional in HUHCAP
      - csutil.cxx: fix bad KOI8-U -> koi8r_tbl reference in enc_entry encds
      - csutil.cxx: fix casing data in ISO 8859-2, Windows 1251 and KOI8-U
        encoding tables. Bug reported by Dmitri Gabinski.
    * src/hunspell/affixmgr.*: improved compound word and other features
      - generalize hu_HU specific compound word features with new affix
        file parameters, suggested by Bram Moolenaar:
        - CHECKCOMPOUNDDUP: forbid word duplication in compounds
          (e.g. foo|foo)
        - CHECKCOMPOUNDTRIPLE: forbid triple letters in compounds
          (e.g. foo|obar)
        - CHECKCOMPOUNDPATTERN: forbid patterns at word boundaries in
          compounds
        - CHECKCOMPOUNDREP: using the REP replacement table, forbid
          presumably bad compounds (useful for languages with an unlimited
          number of compounds)
      - ONLYINCOMPOUND flag also works with words (see tests/onlyincompound.*)
        Suggested by Daniel Naber, Björn Jacke, Trón Viktor & Bram Moolenaar.
      - PSEUDOROOT also works with prefixes and prefix + suffix combinations
        (see tests/pseudoroot5.*). Suggested by Trón Viktor.
      - man/hunspell.4: updated man page
    * src/hunspell/affixmgr.*: fix incomplete prefix handling with twofold
      suffixes (delete unnecessary contclasses[] conditions in
      prefix_check_twosfx() and prefix_check_twosfx_morph()).
      Bug reported by Trón Viktor.
    * src/hunspell/affixmgr.*: extend the *_morph() functions with the
      conditions of the new Hunspell features (circumfix, pseudoroot etc.).
    * src/hunspell/suggestmgr.cxx:
      - fix missing suggestions for words with crossed prefix and suffix
      - fix redundant non-compound word checking
      - fix lost suggestions problem. Bug reported by Dmitri Gabinski.
    * src/hunspell/dictmgr.*:
      - add new dictionary manager for the Hunspell UNO module.
        Problems with the eo_ANY Esperanto locale reported by Dmitri Gabinski.
    * src/hunspell/*: use precise constant sizes for 8-bit and 16-bit
      character arrays with the MAXWORDUTF8LEN and MAXSWUTF8L macros.
    * src/hunspell/affixmgr.cxx: fix bad MAXNGRAMSUGS parameter handling
    * src/hunspell/affixmgr.cxx, src/tools/{un}munch.*: fix GCC 4.0 warnings
      on fgets(), reported by Dvornik László
    * po/hu.po: improved translation by Dvornik László
    * tests/test.sh: improved test environment
      - add suggestion testing (see tests/*.sug)
      - add memory debugging environment, based on the excellent Valgrind
        debugger. Usage on Linux and other platforms with experimental
        Valgrind support:

          VALGRIND=memcheck make check

      - rename test_hunmorph to test.sh
    * tests/*: new tests:
      - base.*: base example based on MySpell's checkme.lst
      - map{,utf}.*, rep{,utf}: MAP and REP suggestion examples
      - tests for the new CHECKCOMPOUND, ONLYINCOMPOUND and PSEUDOROOT
        features
      - i54633.*: capitalized suggestion test for Issue 54633 from OOo's
        Issuezilla
      - i35725.*: improved ngram suggestion test for Issue 35725
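The non-neighboring swap detection in the ChangeLog above can be illustrated with a small C++ sketch. It is not the suggestmgr.cxx code: the helper names toLowerAscii(), isSwapOf() and swapSuggestions() and the flat word list are hypothetical, and the sketch assumes ASCII input. A candidate qualifies when it has the same length as the misspelled word and differs in exactly two positions whose letters are exchanged, compared case-insensitively, so "pernament" matches "permanent" even though the swapped n and m are not adjacent.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (ASCII only): return a lowercased copy of s.
std::string toLowerAscii(std::string s) {
    for (char& c : s)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

// True if a and b have equal length and differ in exactly two positions
// whose characters are exchanged. The two positions need not be adjacent:
// "pernament" and "permanent" differ at indices 3 and 5.
bool isSwapOf(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return false;
    std::size_t first = 0, second = 0, count = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] == b[i]) continue;
        if (++count > 2) return false;  // more than one swap: not a candidate
        if (count == 1) first = i; else second = i;
    }
    return count == 2 && a[first] == b[second] && a[second] == b[first];
}

// Hypothetical: return the words of `dict` that are a swap of `word`,
// ignoring case. An all-uppercase input yields uppercase suggestions,
// so "PERNAMENT" gives "PERMANENT".
std::vector<std::string> swapSuggestions(const std::string& word,
                                         const std::vector<std::string>& dict) {
    bool allUpper = !word.empty();
    for (unsigned char c : word)
        if (std::isalpha(c) && !std::isupper(c)) allUpper = false;
    const std::string key = toLowerAscii(word);
    std::vector<std::string> out;
    for (const std::string& w : dict) {
        if (!isSwapOf(key, toLowerAscii(w))) continue;
        std::string s = w;
        if (allUpper)
            for (char& c : s)
                c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        out.push_back(s);
    }
    return out;
}
```

For example, swapSuggestions("PERNAMENT", {"ornament", "permanent", "tournament"}) keeps only "PERMANENT", in line with example 2. A real checker would apply such a test to its ngram candidates rather than to a flat vector.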
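The uppercase suggestion (unesco -> UNESCO, siggraph's -> SIGGRAPH's) amounts to a casing-equivalence test. The C++ sketch below is a simplification under stated assumptions: it compares the input against a hypothetical list of fully inflected word forms (such as "SIGGRAPH's"), whereas Hunspell derives such forms from dictionary roots and affix rules. The names foldCase() and caseSuggestions() are illustrative only, and case folding is ASCII-only.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper (ASCII only): return a lowercased copy of s.
std::string foldCase(std::string s) {
    for (char& c : s)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

// Suggest every known word form that equals `word` when letter case is
// ignored but is not identical to it, e.g. "Unesco" -> "UNESCO".
std::vector<std::string> caseSuggestions(const std::string& word,
                                         const std::vector<std::string>& forms) {
    const std::string key = foldCase(word);
    std::vector<std::string> out;
    for (const std::string& f : forms)
        if (f != word && foldCase(f) == key) out.push_back(f);
    return out;
}
```

With forms {"Genesco", "Ionesco", "UNESCO", "Fresco"}, the input "Unesco" yields only "UNESCO". The lowercase possessive ending in "SIGGRAPH's" survives because the whole inflected form is compared, not just the root.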
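CHECKCOMPOUNDDUP and CHECKCOMPOUNDTRIPLE are affix file parameters. The C++ sketch below only illustrates the two rejection rules as summarized in the ChangeLog, applied to a compound that has already been split into parts. The part vector and the functions hasDuplicatePart() and hasTripleAtBoundary() are hypothetical and do not reflect the affixmgr.cxx interface. The sketch assumes that the duplicate rule concerns adjacent parts (foo|foo) and that the triple-letter rule looks only at letters spanning a part boundary (foo|obar gives "fooobar").

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// CHECKCOMPOUNDDUP-style rule (sketch): reject a compound whose adjacent
// parts are identical, e.g. {"foo", "foo"}.
bool hasDuplicatePart(const std::vector<std::string>& parts) {
    for (std::size_t i = 1; i < parts.size(); ++i)
        if (parts[i] == parts[i - 1]) return true;
    return false;
}

// CHECKCOMPOUNDTRIPLE-style rule (sketch): reject a compound in which three
// identical letters meet at a part boundary, e.g. {"foo", "obar"} or
// {"xo", "oof"}.
bool hasTripleAtBoundary(const std::vector<std::string>& parts) {
    for (std::size_t i = 1; i < parts.size(); ++i) {
        const std::string& left = parts[i - 1];
        const std::string& right = parts[i];
        // At most two letters from each side of the boundary; every
        // 3-letter window in `edge` therefore crosses the boundary.
        std::string edge = left.substr(left.size() >= 2 ? left.size() - 2 : 0) +
                           right.substr(0, 2);
        for (std::size_t s = 0; s + 3 <= edge.size(); ++s)
            if (edge[s] == edge[s + 1] && edge[s + 1] == edge[s + 2])
                return true;
    }
    return false;
}
```

Under these assumptions, {"foo", "obar"} and {"xo", "oof"} are rejected, while {"fo", "obar"} is accepted because only a double letter meets at the boundary.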
