Re: Added Words Not In Dictionary, Redux

Stephan Witt Tue, 10 May 2016 08:23:34 -0700

On 08.05.2016 at 21:32, Jean-Marc Lasgouttes <lasgout...@lyx.org> wrote:
> 
> On 08/05/16 at 19:17, Stephan Witt wrote:
>> I’ve pasted your email into a new document and switched to Aspell as my
>> current spell checker backend in LyX. As you described, "communities’"
>> is marked as misspelled. Obviously the word „communities“ is a known one, and
>> the problem is the handling of the possessive apostrophe for plural words
>> with a terminal ‚s‘. This is a problem in the Aspell dictionary or the Aspell
>> code. I don’t know if Aspell is able to handle this correctly. (BTW, hunspell
>> isn’t better on my system.)
> It might be interesting to adapt what Emacs does here.


Isn’t this a workaround for a broken spell checker backend and/or dictionary?

If I try it with the hunspell command line on CentOS I get: 
======================
$ hunspell -d en_US
Hunspell 1.2.8
For example communities and communities'
*
*
+ community
*
+ community
======================

I interpret this as working correctly: Hunspell accepted both „communities“ and
„communities'“, and the "+ community" lines show that it reduced each of them
to the root „community“ through affix removal.
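
To make that reading explicit, here is a minimal Python sketch of how I
interpret the result lines. It assumes Hunspell follows the usual ispell pipe
conventions ("*" for a dictionary hit, "+ ROOT" for a hit via affix removal,
"&" or "#" for a miss). The name classify is made up for illustration and is
not part of any Hunspell API.

```python
def classify(line):
    """Classify one result line of ispell-style pipe output.

    Returns a (status, root) tuple. Assumption: the ispell pipe
    conventions apply; this is an illustration, not a Hunspell API.
    """
    if line == "*":
        return ("ok", None)             # found in the dictionary as typed
    if line.startswith("+ "):
        return ("ok-affix", line[2:])   # found via affix removal, root follows
    if line.startswith("&") or line.startswith("#"):
        return ("miss", None)           # not found (with or without suggestions)
    return ("other", None)              # version banner, blank line, etc.


# Result lines copied from the session above, paired with the input by position.
results = ["*", "*", "+ community", "*", "+ community"]
words = "For example communities and communities'".split()
for word, line in zip(words, results):
    print(word, classify(line))
```

Note that pairing by position does not show whether the trailing apostrophe
was part of the token Hunspell actually looked up, or whether it was dropped
before the lookup.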

I’ll have to check what LyX is doing here.

Stephan

> 
> JMarc
> 
> 
> ispell-dictionary-alist is a variable defined in `ispell.el'.
> Its value is shown below.
> 
> Documentation:
> An alist of dictionaries and their associated parameters.
> 
> Each element of this list is also a list:
> 
> (DICTIONARY-NAME CASECHARS NOT-CASECHARS OTHERCHARS MANY-OTHERCHARS-P
>        ISPELL-ARGS EXTENDED-CHARACTER-MODE CHARACTER-SET)
> 
> DICTIONARY-NAME is a possible string value of variable `ispell-dictionary',
> nil means the default dictionary.
> 
> CASECHARS is a regular expression of valid characters that comprise a word.
> 
> NOT-CASECHARS is the opposite regexp of CASECHARS.
> 
> OTHERCHARS is a regexp of characters in the NOT-CASECHARS set but which can be
> used to construct words in some special way.  If OTHERCHARS characters follow
> and precede characters from CASECHARS, they are parsed as part of a word,
> otherwise they become word-breaks.  As an example in English, assume the
> regular expression "[']" for OTHERCHARS.  Then "they're" and
> "Steven's" are parsed as single words including the "'" character, but
> "Stevens'" does not include the quote character as part of the word.
> If you want OTHERCHARS to be empty, use the empty string.
> Hint: regexp syntax requires the hyphen to be declared first here.
> 
> CASECHARS, NOT-CASECHARS, and OTHERCHARS must be unibyte strings
> containing bytes of CHARACTER-SET.  In addition, if they contain
> non-ASCII bytes, the regular expression must be a single
> `character set' construct that doesn't specify a character range
> for non-ASCII bytes.
> 
> MANY-OTHERCHARS-P is non-nil when multiple OTHERCHARS are allowed in a word.
> Otherwise only a single OTHERCHARS character is allowed to be part of any
> single word.
> 
> ISPELL-ARGS is a list of additional arguments passed to the ispell
> subprocess.
> 
> EXTENDED-CHARACTER-MODE should be used when dictionaries are used which
> have been configured in an Ispell affix file.  (For example, umlauts
> can be encoded as \"a, a\", "a, ...)  Defaults are ~tex and ~nroff
> in English.  This has the same effect as the command-line `-T' option.
> The buffer Major Mode controls Ispell's parsing in tex or nroff mode,
> but the dictionary can control the extended character mode.
> Both defaults can be overruled in a buffer-local fashion.  See
> `ispell-parsing-keyword' for details on this.
> 
> CHARACTER-SET used to encode text sent to the ispell subprocess
> when the language uses non-ASCII characters.
> 
> Note that with "ispell" as the speller, the CASECHARS and
> OTHERCHARS slots of the alist should contain the same character
> set as casechars and otherchars in the LANGUAGE.aff file (e.g.,
> english.aff).  aspell and hunspell don't have this limitation.
> 
> Value:
> (("fr" "[[:alpha:]]" "[^[:alpha:]]" "[-'.@]" t nil nil iso-8859-1)
> ("en" "[[:alpha:]]" "[^[:alpha:]]" "[']" nil nil nil iso-8859-1)
> ("en_AU" "[[:alpha:]]" "[^[:alpha:]]" "[']" nil nil nil iso-8859-1)
> ("en_GB" "[[:alpha:]]" "[^[:alpha:]]" "[']" nil nil nil iso-8859-1)
> ("en_CA" "[[:alpha:]]" "[^[:alpha:]]" "[']" nil nil nil iso-8859-1)
> ("de" "[[:alpha:]]" "[^[:alpha:]]" "[']" t nil nil iso-8859-1)
> ("es" "[[:alpha:]]" "[^[:alpha:]]" "[-]" nil nil nil iso-8859-1)
> ("it" "[[:alpha:]]" "[^[:alpha:]]" "[-.]" nil nil nil iso-8859-1)
> ("nl" "[[:alpha:]]" "[^[:alpha:]]" "[']" t nil nil iso-8859-1)
> ("sv" "[[:alpha:]]" "[^[:alpha:]]" "[']" nil nil nil iso-8859-1)
> ("da" "[[:alpha:]]" "[^[:alpha:]]" "[']" nil nil nil iso-8859-1)
> ("pt" "[[:alpha:]]" "[^[:alpha:]]" "[']" t nil nil iso-8859-1)
> ("pt_BR" "[[:alpha:]]" "[^[:alpha:]]" "[']" t nil nil iso-8859-1)
> ("ru" "[[:alpha:]]" "[^[:alpha:]]" "" nil nil nil iso-8859-1)
> ("pl_PL" "[[:alpha:]]" "[^[:alpha:]]" "[']" t nil nil iso-8859-1))
> 
>
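
For reference, the OTHERCHARS rule from the docstring quoted above can be
sketched in Python as follows. This is only my reading of the documentation,
not a port of ispell.el: the function name split_words and its parameters are
invented for illustration, and str.isalpha() stands in for the "[[:alpha:]]"
CASECHARS regexp.

```python
def split_words(text, otherchars="'", many_otherchars=False):
    """Split text into words following the OTHERCHARS description.

    An OTHERCHARS character is kept only when it both follows and
    precedes a word character; otherwise it acts as a word break.
    Unless many_otherchars is true, a word may contain at most one
    such character. Illustrative sketch, not the Emacs implementation.
    """
    words = []
    current = ""
    used_other = False
    for i, ch in enumerate(text):
        next_is_alpha = i + 1 < len(text) and text[i + 1].isalpha()
        if ch.isalpha():
            current += ch
        elif (ch in otherchars and current and next_is_alpha
              and (many_otherchars or not used_other)):
            # Apostrophe (or similar) enclosed by word characters.
            current += ch
            used_other = True
        else:
            # Anything else ends the current word.
            if current:
                words.append(current)
            current = ""
            used_other = False
    if current:
        words.append(current)
    return words


print(split_words("they're Steven's Stevens'"))
print(split_words("communities'"))
```

Under this rule the trailing apostrophe of "communities'" is treated as a word
break, so only "communities" would be handed to the spell checker backend.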
