Spell checker news

I have completed a full comparison with MySpell 3.2, prompted by
Dmitri Gabinski's bug report. Thanks to Dmitri and the other
contributors. I will finish the Hunspell integration for QA today.
There are also OOo 1.1.4 and OOo 2.0 beta 2 UNO modules here for
testing.

Laci

Hunspell 1.1.0 on Sourceforge:

https://sourceforge.net/project/showfiles.php?group_id=143754&package_id=157912

NEWS

2005-09-19: Hunspell 1.1.0 release

* complete comparison with MySpell 3.2 (from OpenOffice.org 2 beta)

* improved ngram suggestion with swap character detection and
  case insensitivity

------ examples for ngram improvement (input word and suggestions) -----

1. pernament (instead of permanent)

MySpell 3.2: tournaments, tournament, ornaments, ornament's, ornamenting,
             ornamented, ornament, ornamentals, ornamental, ornamentally

Hunspell 1.0.9: ornamental, ornament, tournament

Hunspell 1.1.0: permanent

Note: swap character detection


2. PERNAMENT (instead of PERMANENT)

MySpell 3.2: -

Hunspell 1.0.9: -

Hunspell 1.1.0: PERMANENT


3. Unesco (instead of UNESCO)

MySpell 3.2: Genesco, Ionesco, Genesco's, Ionesco's, Frescoing, Fresco's,
             Frescoed, Fresco, Escorts, Escorting

Hunspell 1.0.9: Genesco, Ionesco, Fresco

Hunspell 1.1.0: UNESCO


4. siggraph's (instead of SIGGRAPH's)

MySpell 3.2: serigraph's, photograph's, serigraphs, physiography's,
             physiography, digraphs, serigraph, stratigraphy's, stratigraphy,
             epigraphs

Hunspell 1.0.9: serigraph's, epigraph's, digraph's

Hunspell 1.1.0: SIGGRAPH's

--------------- end of examples --------------------
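
The swap character detection shown in example 1 can be illustrated with a
minimal sketch. This is not the code from suggestmgr.cxx; the function name
differs_by_one_swap is hypothetical, and the sketch assumes single-byte
characters. It only shows the kind of test that separates "permanent" from
the ngram candidates.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of Hunspell: returns true when `candidate`
// equals `word` after exchanging exactly two characters, which do not have
// to be neighbors.
bool differs_by_one_swap(const std::string& word, const std::string& candidate) {
    if (word.size() != candidate.size()) return false;
    std::size_t first = 0, second = 0;
    int mismatches = 0;
    for (std::size_t i = 0; i < word.size(); ++i) {
        if (word[i] != candidate[i]) {
            if (mismatches == 0) first = i;
            else if (mismatches == 1) second = i;
            else return false;  // a third mismatch rules out a single swap
            ++mismatches;
        }
    }
    // Exactly two positions differ and their characters are crossed.
    return mismatches == 2 &&
           word[first] == candidate[second] &&
           word[second] == candidate[first];
}
```

For "pernament" and "permanent" the words differ only at the 4th and 6th
letters (n/m and m/n), so the check succeeds even though the swapped letters
are not adjacent.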

* improved testing environment with suggestion checking and memory debugging

  memory debugging of all tests with a simple command:

  VALGRIND=memcheck make check

* lots of other improvements and bug fixes (see ChangeLog)


2005-09-19 Németh László <[EMAIL PROTECTED]>:
        * src/hunspell/suggestmgr.cxx: improved ngram suggestion:
        - detect non-neighboring swapped characters (pernament -> permanent)
          Rationale: the ngram method has a significant error with
          non-neighboring swapped characters, especially when the swap is in
          the middle of the word.
        - suggest uppercase forms (unesco -> UNESCO, siggraph's -> SIGGRAPH's)
        - suggest only the ngram swap character and uppercase forms, if they
          exist.
          Rationale: swap character and casing equivalence give much better
          suggestions than any other (weighted) ngram suggestions.
        - add uppercase suggestion (PERMENANT -> PERMANENT)
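
        The casing equivalence above can be sketched as follows. This is an
        illustration only, not the suggestmgr.cxx code: the names
        equal_ignoring_ascii_case and case_variants are hypothetical, the
        dictionary is a plain word list, and the comparison is limited to
        ASCII, whereas Hunspell uses its own per-encoding casing tables.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: compares two words while ignoring ASCII letter case.
bool equal_ignoring_ascii_case(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::toupper(static_cast<unsigned char>(a[i])) !=
            std::toupper(static_cast<unsigned char>(b[i]))) {
            return false;
        }
    }
    return true;
}

// Hypothetical helper: collects dictionary entries that match `word` in
// everything but letter case (the exact same spelling is skipped).
std::vector<std::string> case_variants(const std::string& word,
                                       const std::vector<std::string>& dictionary) {
    std::vector<std::string> result;
    for (const std::string& entry : dictionary) {
        if (entry != word && equal_ignoring_ascii_case(word, entry)) {
            result.push_back(entry);
        }
    }
    return result;
}
```

        With a word list containing "UNESCO" and "Genesco", the input
        "Unesco" yields only "UNESCO", which mirrors the idea of preferring
        the case-equivalent form over weighted ngram candidates.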

        * src/hunspell/*: complete comparison with MySpell 3.2 (in OOo beta 2):
        - affixmgr.cxx: add missing numrep initialization
        - hashmgr.cxx: add_word(): don't allocate temporary records
        - hunspell.cxx: in suggest():
          - check capitalized words first (better sug. order for proper names),
          - check pSMgr->suggest() return value
          - make the pSMgr->suggest() call non-optional in HUHCAP
        - csutil.cxx: fix bad KOI8-U -> koi8r_tbl reference in enc_entry encds
        - csutil.cxx: fix casing data in ISO 8859-2, Windows 1251 and KOI8-U
          encoding tables. Bug reported by Dmitri Gabinski.

        * src/hunspell/affixmgr.*: improved compound word and other features
        - generalize hu_HU specific compound word features with new affix file
          parameters, suggested by Bram Moolenaar:
        - CHECKCOMPOUNDDUP: forbid word duplication in compounds (eg. foo|foo)
        - CHECKCOMPOUNDTRIPLE: forbid triple letters in compounds (eg. foo|obar)
        - CHECKCOMPOUNDPATTERN: forbid patterns at word bounds in compounds
        - CHECKCOMPOUNDREP: using REP replacement table, forbid presumably bad
          compounds (useful for languages with unlimited number of compounds)
        - ONLYINCOMPOUND flag works also with words (see tests/onlyincompound.*)
          Suggested by Daniel Naber, Björn Jacke, Trón Viktor & Bram
          Moolenaar.
        - PSEUDOROOT works also with prefixes and prefix + suffix combinations
          (see tests/pseudoroot5.*). Suggested by Trón Viktor.
        - man/hunspell.4: updated man page
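
        What CHECKCOMPOUNDDUP and CHECKCOMPOUNDTRIPLE forbid can be shown
        with a small sketch built on the foo|foo and foo|obar examples above.
        This is not the affixmgr code: both function names are hypothetical,
        characters are assumed to be single bytes, and the triple check
        assumes that only runs spanning the word boundary matter.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: the CHECKCOMPOUNDDUP case, e.g. foo|foo.
bool is_duplicated_compound(const std::string& left, const std::string& right) {
    return !left.empty() && left == right;
}

// Hypothetical helper: the CHECKCOMPOUNDTRIPLE case, e.g. foo|obar.
// Returns true when three identical letters span the boundary between
// the two compound parts.
bool has_triple_at_boundary(const std::string& left, const std::string& right) {
    const std::string joined = left + right;
    const std::size_t boundary = left.size();  // index of the first letter of `right`
    // Only three-letter windows that start in `left` and end in `right`
    // can span the boundary.
    std::size_t start = boundary >= 2 ? boundary - 2 : 0;
    for (; start < boundary && start + 2 < joined.size(); ++start) {
        if (joined[start] == joined[start + 1] &&
            joined[start + 1] == joined[start + 2]) {
            return true;
        }
    }
    return false;
}
```

        Under these assumptions foo|obar is rejected because the joined form
        "fooobar" contains "ooo" across the boundary, while foo|bar passes.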

        * src/hunspell/affixmgr.*: fix incomplete prefix handling with twofold
          suffixes (delete unnecessary contclasses[] conditions in
          prefix_check_twosfx() and prefix_check_twosfx_morph()).
          Bug reported by Trón Viktor.

        * src/hunspell/affixmgr.*: complete also *_morph() functions with
          conditions of new Hunspell features (circumfix, pseudoroot etc.).

        * src/hunspell/suggestmgr.cxx:
        - fix missing suggestions for words with crossed prefix and suffix
        - fix redundant non compound word checking
        - fix losing suggestions problem. Bug reported by Dmitri Gabinski.

        * src/hunspell/dictmgr.*:
        - add new dictionary manager for the Hunspell UNO module.
          Problems with eo_ANY Esperanto locale reported by Dmitri Gabinski.

        * src/hunspell/*: use precise constant sizes for 8-bit and 16-bit
          character arrays with MAXWORDUTF8LEN and MAXSWUTF8L macros.

        * src/hunspell/affixmgr.cxx: fix bad MAXNGRAMSUGS parameter handling

        * src/hunspell/affixmgr.cxx, src/tools/{un}munch.*: fix GCC 4.0 warnings
          on fgets(), reported by Dvornik László

        * po/hu.po: improved translation by Dvornik László

        * tests/test.sh: improved test environment
        - add suggestion testing (see tests/*.sug)
        - add memory debugging environment, based on the excellent Valgrind
          debugger.
          Usage on Linux and experimental platforms of Valgrind:
          VALGRIND=memcheck make check
        - rename test_hunmorph to test.sh

        * tests/*: new tests:
        - base.*: base example based on MySpell's checkme.lst.
        - map{,utf}.*, rep{,utf}: MAP and REP suggestion examples
        - tests on new CHECKCOMPOUND, ONLYINCOMPOUND and PSEUDOROOT features
        - i54633.*: capitalized suggestion test for Issue 54633 from OOo's
          Issuezilla
        - i35725.*: improved ngram suggestion test for Issue 35725





