Re: problem seems to be solved: unreadable .hash files (dictionaries)

Dom Lachowicz Fri, 16 Mar 2001 14:18:04 -0800
Hi Vlad and Paul,

This is some good news on the ispell front. I had all but given up on ispell 
working for us long-term. Now it seems that it might at least be feasible 
again.

I have an RFP that's really simple to implement for whoever wants it:

We need to abandon "american.hash" - we need something more robust. What I 
think we want is en_US.hash, de_DE.hash, etc... If we do this, we can 
dynamically load dictionaries based on our current locale or even with the 
"lang" attribute like my hack last night.

So I guess my suggested plan of action is this:
1) Rename the dictionaries (and start housing (*not necessarily shipping*) 
some known working ones on the website)
2) And either:
a) Change ispell's SpellCheckInit() function to take a string of the form 
'en_US' and have *it* create the proper .hash name, so we can share 100% of 
the code with Pspell
b) Keep passing the full path to the dictionary, and have that one ifdef in 
our code for ispell/pspell
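
For option (a), the name-building step could be sketched roughly like this. 
The helper name hash_path_for_locale, its signature, and the idea of passing 
in a dictionary directory are all made up for illustration; none of it is 
existing ispell or Pspell API.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of ispell): builds "<dir>/<locale>.hash"
 * into out, e.g. "en_US" -> "<dir>/en_US.hash".
 * Returns 0 on success, -1 on bad arguments or if the result does not fit.
 */
static int hash_path_for_locale(const char *dir, const char *locale,
                                char *out, size_t outsize)
{
    int n;

    if (dir == NULL || locale == NULL || out == NULL || outsize == 0)
        return -1;

    /* Reject empty tags and anything that could escape the directory. */
    if (locale[0] == '\0' || strchr(locale, '/') != NULL)
        return -1;

    n = snprintf(out, outsize, "%s/%s.hash", dir, locale);
    if (n < 0 || (size_t)n >= outsize)
        return -1;   /* output error or truncated result */

    return 0;
}
```

A caller would fill a buffer from the locale (or the "lang" attribute) and 
hand the resulting path to whatever loads the dictionary today.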

Whaddya think?
Dom

>From: Paul Rohr <[EMAIL PROTECTED]>
>To: Vlad Harchev <[EMAIL PROTECTED]>, [EMAIL PROTECTED]
>Subject: Re: problem seems to be solved: unreadable .hash files (dictionaries)
>Date: Fri, 16 Mar 2001 14:13:08 -0800
>
>Vlad,
>
>Thanks for the detective work!
>
>At 02:02 PM 3/16/01 +0400, Vlad Harchev wrote:
> > I remember a lot of people complained that AW can't use some hash files
> >(i.e. dictionaries for ispell) - that ispell module spits out some message
> >about incorrect header..
> > While helping other people to select a russian dictionary, I discovered
> >that 'file' utility knows ispell format (at least on my RH6.0) and that we
> >can judge whether the hash file will be loadable by ispell module or not
> >basing on the output of 'file' command. For example, here is an output for
> >the russian.hash file that can be used by AW's ispell:
> >
> >[hvv@h dictionary]$ file russian.hash
> >russian.hash: little endian ispell 3.1 hash file, 8-bit, capitalization, 26
> >flags and 100 string characters
> >[hvv@h dictionary]$
> >
> > It seems that hash files for which '7-bit' is mentioned in the output of
> >'file' command can't be used by AW.
>
>Bingo.  That's it.  If you grep the sources for NO8BIT, you'll see that one
>of the few things it affects is SET_SIZE, which in turn controls the size of
>various ispell structs, including the main hashtable.
>
>   http://www.abisource.com/lxr/source/abi/src/other/spell/ispell.h#495
>
>The error message we usually get is a sanity check to make sure that
>ispell's not reading a hashtable of the wrong length.  For example, see:
>
>   http://bugzilla.abisource.com/show_bug.cgi?id=902
>   http://bugzilla.abisource.com/show_bug.cgi?id=824
>
>Note that the hashtable loader currently just reads the entire struct from
>disk to memory here:
>
>   http://www.abisource.com/lxr/source/abi/src/other/spell/lookup.c#159
>
>Gag.  Methinks it would be prudent to just rewrite the loader to do the math
>to detect this situation and do the extra work needed to try and load 7-bit
>content into the 8-bit structs we currently use.
>
> >Also it turns out that (at least for
> >russian dictionary) it's possible to specify whether to use 7-bit or 8-bit
> >format of hash files by altering Makefile for dictionary (there are makefile
> >variables that control that). So, it seems we have a hope of knowing the
> >way of building ispell dictionaries that will be understood by our ispell.
> >At least we may try to build .hash files for languages for which only
> >unreadable by our iconv compiled dictionaries are available..
>
>Exactly.  Until someone's willing to write the code mentioned above to also
>load 7-bit dictionaries, we now have a few simple workarounds:
>
>   - update the FAQ to tell folks not to use 7-bit dictionaries
>   - ideally, point them to 8-bit alternatives
>
>Any volunteers?  ;-)
>
>Paul
>
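
Vlad's check on the 'file' output, quoted above, could be automated along 
these lines. This is only a sketch: the enum and function names are 
invented here, and it assumes the 7-bit case is reported with the same 
wording as the 8-bit sample, just with "7-bit" in place of "8-bit".

```c
#include <string.h>

/* Hypothetical classification of one line of 'file' output. */
enum hash_kind {
    HASH_NOT_ISPELL,     /* not described as an ispell hash file */
    HASH_7BIT,           /* ispell hash file, 7-bit: reported as unusable by AW */
    HASH_8BIT,           /* ispell hash file, 8-bit: like the working sample */
    HASH_UNKNOWN_WIDTH   /* ispell hash file, but no 7-bit/8-bit marker found */
};

static enum hash_kind classify_file_output(const char *line)
{
    const char *p;

    if (line == NULL)
        return HASH_NOT_ISPELL;

    /* Skip the "name: " prefix so the file name itself cannot match. */
    p = strstr(line, ": ");
    if (p != NULL)
        line = p + 2;

    if (strstr(line, "ispell") == NULL || strstr(line, "hash file") == NULL)
        return HASH_NOT_ISPELL;
    if (strstr(line, "7-bit") != NULL)
        return HASH_7BIT;
    if (strstr(line, "8-bit") != NULL)
        return HASH_8BIT;
    return HASH_UNKNOWN_WIDTH;
}
```

Anything classified as HASH_7BIT would be a candidate for rebuilding as 
8-bit, or for the FAQ warning Paul suggests.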
