A few things to take into consideration.

1) To minimize the space used, the word list should be compressed with
"prezip -s".  (The "-s" re-sorts the word list using the "C" locale,
which is needed for maximum compression with prezip.)  It can then be
compressed further with bzip2.  You can decompress it by piping it
through "bzcat | precat".  To give you an idea of the sizes produced by
the various methods, here are the file sizes for en-common.wl (.cwl is
the word list compressed with prezip):

1224 en-common.wl
 424 en-common.cwl
 136 en-common.cwl.bz2
 164 en-common.cwl.gz
 432 en-common.wl.bz2
 332 en-common.wl.gz

Yes, bzip2 is WORSE than gzip on the plain sorted word list (432 vs. 332),
but it beats gzip once the list has been through prezip (136 vs. 164).
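
As a rough sketch of the round trip described in point 1: the function
names below are made up for illustration, and the compress step assumes
that "prezip -s" works as a stdin-to-stdout filter when given no file
argument, the same way precat is used above.

```shell
# Hypothetical helpers; only prezip, precat, bzip2 and bzcat are real tools.

# compress_wl FILE.wl  ->  writes FILE.cwl.bz2 next to the input.
# Assumption: "prezip -s" reads stdin and writes stdout when no file is given.
compress_wl() {
    prezip -s < "$1" | bzip2 -9 > "${1%.wl}.cwl.bz2"
}

# expand_wl FILE.cwl.bz2  ->  prints the word list on stdout.
expand_wl() {
    bzcat "$1" | precat
}
```

For example, "compress_wl en-common.wl" followed by
"expand_wl en-common.cwl.bz2 | head".  Because of the "-s", the expanded
list comes back in "C" locale order, which may differ from the order of
the original file.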

Also, prezip and friends consist of an ANSI C program and some shell
scripts, which can easily be split out into a separate package so that
you can also use them with Ispell if so desired.

2) To avoid spitting out a bunch of warnings during compilation, you
should clean the word list by piping it through "aspell clean strict".
This will remove all problem words and affix flags that Aspell would
complain about when compiling.  The compiled dictionary should be the
same with either the dirty or the clean word list.  You can also use
"aspell clean", but that handles some errors in a different way and the
resulting compiled dictionary may be different.
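
A minimal sketch of the cleaning step in point 2.  The wrapper name and
the file names are placeholders, and it assumes the language is selected
with "--lang":

```shell
# clean_wl LANG < dirty.wl > clean.wl
# Hypothetical wrapper: filters a word list through "aspell clean strict".
# Assumption: the language is chosen with --lang (for example "en").
clean_wl() {
    aspell --lang="$1" clean strict
}
```

Combined with the decompression pipeline, this might look like
"bzcat en-common.cwl.bz2 | precat | clean_wl en > en-common.clean.wl".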

3) Aspell by default performs a number of checks when creating a
word list, some of which can be expensive.  You can disable the expensive
one with "--dont-validate-affixes".  If you clean the word list first this
should be 100% safe.  It should also be safe to use on a dirty word list
as the invalid affix flags don't cause a problem in the compiled word
list.  You may also consider using "--dont-validate-words", but those
checks are not very expensive.
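
A sketch of how point 3 might be put together when compiling.  The
wrapper name and output file are hypothetical; it reads the (ideally
already cleaned) word list on stdin and passes "--dont-validate-affixes"
to "aspell create master":

```shell
# build_dict LANG OUT.rws < clean.wl
# Hypothetical wrapper around "aspell create master" that skips the
# expensive affix validation; intended for an already-cleaned word list.
build_dict() {
    aspell --lang="$1" --dont-validate-affixes create master "$2"
}
```

For example, "build_dict en ./en-common.rws < en-common.clean.wl".  The
output is written as "./en-common.rws" on the assumption that a bare
file name might be resolved against Aspell's dictionary directory rather
than the current one.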

--
Kevin Atkinson
Aspell Author
http://kevin.atkinson.dhs.org


