To comment on the following update, log in, then open the issue:
http://www.openoffice.org/issues/show_bug.cgi?id=61459
Issue #:|61459
Summary:|Hunspell 1.1.3 with some new fixes
Component:|lingucomponent
Version:|OOo 2.0.1
Platform:|All
URL:|
OS/Version:|All
Status:|NEW
Status whiteboard:|
Keywords:|
Resolution:|
Issue type:|PATCH
Priority:|P3
Subcomponent:|spell checking
Assigned to:|nemeth
Reported by:|nemeth
------- Additional comments from [EMAIL PROTECTED] Tue Jan 31 16:47:27 -0800 2006 -------
In the CWS "hunspell01".
Improvements: tokenisation and COMPOUNDRULE fixes, improved suggestions
and German ß handling. Optional alias compression in data files (useful
for the Arabic dictionary and other affix-rich languages).
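To illustrate the alias compression mentioned above: in the AF scheme
documented in hunspell(4), the affix file lists each distinct flag set
once ("AF flagset"), and dictionary entries refer to it by its 1-based
position ("word/1"). The sketch below shows that idea on in-memory
strings. It is not the makealias utility; compress_flag_sets and
AliasResult are hypothetical names, and entries are assumed to be plain
"word" or "word/FLAGS" strings (no "\/" escapes, no morphological
fields).

```cpp
#include <map>
#include <string>
#include <vector>

// Result of the compression: AF lines for the .aff file and the
// rewritten entries for the .dic file.
struct AliasResult {
    std::vector<std::string> aff_lines;
    std::vector<std::string> dic_lines;
};

// Replace each "word/FLAGS" entry by "word/N", where N is the 1-based
// index of FLAGS in the generated AF table. Entries without flags are
// copied unchanged.
AliasResult compress_flag_sets(const std::vector<std::string>& entries) {
    std::map<std::string, std::size_t> index;  // flag set -> alias number
    std::vector<std::string> table;            // alias number - 1 -> flag set
    AliasResult out;
    for (const std::string& entry : entries) {
        std::string::size_type slash = entry.find('/');
        if (slash == std::string::npos) {
            out.dic_lines.push_back(entry);
            continue;
        }
        std::string flags = entry.substr(slash + 1);
        std::map<std::string, std::size_t>::iterator it = index.find(flags);
        if (it == index.end()) {
            table.push_back(flags);
            it = index.insert(std::make_pair(flags, table.size())).first;
        }
        out.dic_lines.push_back(entry.substr(0, slash + 1) +
                                std::to_string(it->second));
    }
    // The first AF line carries the number of aliases that follow.
    out.aff_lines.push_back("AF " + std::to_string(table.size()));
    for (const std::string& flags : table)
        out.aff_lines.push_back("AF " + flags);
    return out;
}
```

Each flag set is stored once however many words share it, which is
where the savings in file size and loading work would come from.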
Changelog:
2006-01-30 Németh László <[EMAIL PROTECTED]>:
* src/parsers/textparser.cxx: fix Unicode tokenization in is_wordchar()
(extra word characters (WORDCHARS) didn't work on big-endian
platforms).
* src/hunspell/{csutil,affixmgr}.cxx: inline isSubset(), isRevSubset():
small speed optimisation for languages with rich morphology.
* src/tools/hunspell.cxx: fix broken --with-ui and --with-readline
compilation when (N)curses is missing. Reported by Daniel Naber.
2006-01-19 Tor Lillqvist <[EMAIL PROTECTED]>
* src/hunspell/csutil.cxx: mystrsep(): fix locale-dependent
isspace() tokenisation
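The trouble with isspace() is that its result depends on the current C
locale, so under some locales a byte above 0x7F may be classified as
whitespace and split a field in the middle. A minimal sketch of
locale-independent splitting follows. It assumes the only delimiters
are ASCII space and tab; split_fields is a hypothetical helper, not the
actual mystrsep() code.

```cpp
#include <string>
#include <vector>

// Split a line into fields at runs of ASCII space or tab only.
// No <cctype> function is called, so the result does not depend on
// the locale.
std::vector<std::string> split_fields(const std::string& line) {
    std::vector<std::string> fields;
    std::string current;
    for (char c : line) {
        if (c == ' ' || c == '\t') {
            if (!current.empty()) {
                fields.push_back(current);
                current.clear();
            }
        } else {
            current += c;
        }
    }
    if (!current.empty()) fields.push_back(current);
    return fields;
}
```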
2006-01-06 András Tímár <[EMAIL PROTECTED]>
* src/hunspell/{hashmgr.hxx,hunspell.cxx}: fix Visual C++ compilation
errors
2006-01-05 Németh László <[EMAIL PROTECTED]>:
* COPYING: set GPL/LGPL/MPL tri-license for Mozilla integration.
Rationale: Mozilla source code contains an old MySpell version
with GPL/LGPL/MPL tri-license. (The MPL is a copyleft license
similar to the LGPL, but it applies at file level.)
* COPYING.LGPL: GNU Lesser General Public License 2.1 (LGPL)
* COPYING.MPL: Mozilla Public License 1.1 (MPL)
* license.hunspell, src/hunspell/license.hunspell: GPL/LGPL/MPL
tri-license
* src/hunspell/{affixmgr,hashmgr}.*: AF, AM alias definitions in the
affix file: compression of flag sets and morphological descriptions
(see manual and tests/alias* test files).
Rationale: Alias compression also improves loading time and memory
efficiency, not only resource size.
* src/tools/makealias: alias compression utility
(usage: ./makealias file.dic file.aff)
* tests/alias{,2,3}: AF, AM tests
* man/hunspell.4: add AF, AM documentation
* src/hunspell/affentry.cxx, atypes.hxx: add new opts bits (aeALIASM,
aeALIASF)
* tools/hunspell, src/parser/*, src/hunspell/*: Hunspell program
tokenizes Unicode texts (only with UTF-8 encoded dictionaries).
Missing Unicode tokenization reported by Björn Jacke, Egmont Koblinger,
Jess Body and others.
Note: the Curses interactive interface does not work perfectly yet.
* tests/*.tests: remove -1 parameters of Hunspell
* tests/*.{good,wrong}: remove tabulators
* src/hunspell/{hunspell,affixmgr}.cxx: BREAK option: break words at
specified break points and check the word parts separately (see
manual).
Note: COMPOUNDRULE is better (or will be better) for handling dashes
and other compound joining characters or character strings. Use BREAK
if you want to check words with dashes or other joining characters and
there is no time or possibility to describe precise compound rules
with COMPOUNDRULE.
* tests/break.*: BREAK example.
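As a rough illustration of the BREAK idea described above (cut the
word at the break points and check the parts separately), here is a
simplified sketch. It is not Hunspell's implementation: the real
option is defined in the manual and may treat positions and repeated
break strings differently. check_with_break, the break list and the
std::set dictionary are all assumptions made for the example.

```cpp
#include <set>
#include <string>
#include <vector>

// Accept `word` if it is in the dictionary as a whole, or if every
// non-empty part obtained by cutting at the break strings is.
bool check_with_break(const std::string& word,
                      const std::vector<std::string>& breaks,
                      const std::set<std::string>& dict) {
    if (dict.count(word)) return true;
    std::vector<std::string> parts;
    std::string::size_type start = 0;  // start of the current part
    std::string::size_type pos = 0;    // scanning position
    while (pos < word.size()) {
        bool cut = false;
        for (const std::string& b : breaks) {
            if (!b.empty() && word.compare(pos, b.size(), b) == 0) {
                if (pos > start)
                    parts.push_back(word.substr(start, pos - start));
                pos += b.size();
                start = pos;
                cut = true;
                break;
            }
        }
        if (!cut) ++pos;
    }
    if (start < word.size()) parts.push_back(word.substr(start));
    if (parts.empty()) return false;  // nothing but break strings
    for (const std::string& p : parts)
        if (!dict.count(p)) return false;
    return true;
}
```

With a dictionary holding "foo" and "bar" and "-" as the only break
string, "foo-bar" is accepted while "foo-baz" and "foobar" are not.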
* src/hunspell/{affixmgr,hunspell}.cxx: add CHECKSHARPS declaration
instead of LANG de_DE definitions to handle German sharp s in both
spelling and suggestion.
* src/hunspell/hunspell.cxx: With CHECKSHARPS, uppercase words are valid
with both lower sharp s (it is optional for names in German legal
texts) and SS (MÜßIG, MÜSSIG). Missing lower sharp s form reported by
Björn Jacke.
* src/hunspell/hunspell.cxx: KEEPCASE flag on a sharp s word has a
special meaning with CHECKSHARPS declaration: KEEPCASE permits
capitalisation and SS upper casing of a sharp s word (Müßig and
MÜSSIG), but forbids the upper cased form with lower sharp s
character(s): *MÜßIG.
* tests/germancompounding*: add CHECKSHARPS, remove LANG
* tests/checksharps*: add CHECKSHARPS and KEEPCASE, remove LANG
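The CHECKSHARPS/KEEPCASE rule for all-uppercase forms described above
can be written as a small predicate. This is only a sketch of the rule
as stated in this changelog, not the code in hunspell.cxx;
upper_form_allowed is a hypothetical name, and the input is assumed to
be a UTF-8 string that has already been recognised as an uppercase
form of a sharp s word.

```cpp
#include <string>

// UTF-8 encoding of the lower-case sharp s (U+00DF).
const std::string kSharpS = "\xC3\x9F";

// Decide whether an all-uppercase form of a sharp s word is acceptable.
// The sharp s appears in `upper_form` either as "ß" or as "SS".
bool upper_form_allowed(const std::string& upper_form, bool keepcase) {
    bool has_lower_sharp_s =
        upper_form.find(kSharpS) != std::string::npos;
    // Without KEEPCASE both MÜßIG and MÜSSIG are valid;
    // with KEEPCASE only the SS form (MÜSSIG) is.
    return !(keepcase && has_lower_sharp_s);
}
```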
* src/hunspell/hunspell.cxx: improved suggestions:
- suggestions for pressed Caps Lock problems: macARONI -> macaroni
- suggestions for long shift problems: MAcaroni -> Macaroni, macaroni
- suggestions for KEEPCASE words: KG -> kg
* src/hunspell/csutil.cxx: fix mystrrep() function:
- suggestions for lower sharp s in uppercased words: MÜßIG -> MÜSSIG
* tests/checksharps{,utf}.sug: add tests for mystrrep() fix
* src/hunspell/hashmgr.cxx: Now dictionary words can contain slashes
with the "\/" syntax. Problem reported by Frederik Fouvry.
* src/hunspell/hunspell.cxx: fix bad duplicate filter in suggest().
(Suggesting some capitalised compound words caused program crash
with Hungarian dictionary, OOo Issue 59055).
* src/hunspell/affixmgr.cxx: fix bad defcpd_check() call in
compound_check().
(Overlapping new COMPOUNDRULE and old compounding methods caused
program crash at suggestion.)
* src/hunspell/affixmgr.{cxx,hxx}: check affix flag duplication at
affix classes. Suggested by Daniel Naber.
* src/hunspell/affentry.cxx: remove unused variable declarations
(OOo i58338).
Compiler warnings reported by András Tímár and Martin Hollmichel.
* src/hunspell/hunspell.cxx: morph(): do not analyse bad mixed
uppercased forms (fixes Arabic morphological analysis with
Buckwalter's Arabic transliteration)
* src/hunspell/affentry.{cxx,hxx}, atypes.hxx: small memory
optimization in affentry:
- use unsigned char fields instead of short (stripl, appndl,
numconds)
- rename xpflg field to opts
- remove utf8 field, use aeUTF8 bit of opts field
* configure.ac: set tests/maputf.test to XFAILED on ARM platform.
Failure reported by Rene Engelhard.
* configure.ac: link Ncursesw library, if exists.
* BUGS: add BUGS file
* tests/complexprefixes2.*: test for morphological analysis with
COMPLEXPREFIXES
* src/hunspell/affixmgr.cxx: use "COMPOUNDRULE" instead of
"COMPOUND". The new name suggested by Bram Moolenaar.
* tests/compoundrule*: modified and renamed compound.* test files
* man/hunspell.4: AF, AM, BREAK, CHECKSHARPS, COMPOUNDRULE, KEEPCASE.
- also new addition to the documentation:
The header of the dictionary file defines the approximate dictionary size:
``A dictionary file (*.dic) contains a list of words, one per line.
The first line of the dictionaries (except personal dictionaries)
contains the _approximate_ word count (for optimal hash memory
size).''
Asked by Frederik Fouvry.
One-character replacements in REP definitions: ``It's very useful to
define replacements for the most typical one-character mistakes, too:
with REP you can add higher priority to a subset of the TRY suggestions
(suggestion list begins with the REP suggestions).''
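To make the REP remark concrete, the sketch below generates candidates
by applying each replacement pair at every position where its left
side occurs and keeps those found in a dictionary. A caller would place
these ahead of any TRY-based candidates. It is an illustration only,
not Hunspell's suggestion code; rep_suggestions, the pair list and the
std::set dictionary are assumptions made for the example.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// Apply each (from, to) replacement once, at every position where
// `from` occurs in `word`, and collect the candidates that are in the
// dictionary. Order follows the order of the replacement table.
std::vector<std::string> rep_suggestions(
        const std::string& word,
        const std::vector<std::pair<std::string, std::string> >& reps,
        const std::set<std::string>& dict) {
    std::vector<std::string> out;
    for (const auto& rep : reps) {
        const std::string& from = rep.first;
        if (from.empty()) continue;
        std::string::size_type pos = word.find(from);
        while (pos != std::string::npos) {
            std::string candidate = word;
            candidate.replace(pos, from.size(), rep.second);
            bool seen = false;
            for (const std::string& s : out)
                if (s == candidate) seen = true;
            if (!seen && dict.count(candidate)) out.push_back(candidate);
            pos = word.find(from, pos + 1);
        }
    }
    return out;
}
```

With the pairs ("f", "ph") and ("y", "i") and a dictionary containing
"phone" and "time", the input "fone" gives "phone" and the
one-character mistake "tyme" gives "time".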
---------------------------------------------------------------------
Please do not reply to this automatically generated notification from
Issue Tracker. Please log onto the website and enter your comments.
http://qa.openoffice.org/issue_handling/project_issues.html#notification
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]