Gilles Detillieux <[EMAIL PROTECTED]> wrote:

> According to Alexander I. Lebedev:
> > Thank you for the patch.  I tested the updated version of 3.1.5 and
> > it works great! (The problems with endings in Russian disappeared.)
> > I'll test the program thoroughly tomorrow.
> > 
> > The only thing that surprised me was that the words "highness", "likeness",
> > and "witness" were already in the English word list, so there was no need
> > for the high/P, like/P, and wit/P forms at all.  Has anybody checked the
> > list for redundancy?
>
> No, not that I know of.  We pretty much just used it as-is from the
> ispell dictionary distribution, I think.  In fact, the english.0 and
> english.aff files haven't changed since 3.0.8b2 back in 1997, so any
> checking done on them (beyond what the ispell folks have done) would
> likely have been by Andrew, if at all.  I think that as far as ispell
> is concerned, redundancy doesn't matter.  It's only because htfuzzy uses
> these files for a more specialized purpose that we need more accuracy.
>
> I don't doubt that if we looked hard enough, or wrote a program to look
> for such redundancies, we'd find several more.  I guess the question is
> whether it's worth the effort.  In 4 years we've had one complaint about
> inaccurate word roots, which my patch corrects.  If any others come up,
> we can correct them as we go, now that we know better how to deal with
> them.

Gilles,

I checked the original and extended English word lists in HTDig and found
2756 and 3787 words, respectively, that have more than one root (the
actual numbers may be higher, since I didn't convert uppercase letters to
lowercase before comparing).
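
For illustration, a check of this kind could be sketched roughly as follows
in Python.  This is not the program I actually used: the function names are
made up for the example, and expand_p() is only a simplified stand-in for the
P flag (it assumes the rule is "add -ness, turning a consonant + y into
-iness"); a real check would have to apply every rule in english.aff.

```python
from collections import defaultdict

def expand_p(root):
    """Simplified, assumed version of the P flag: append -ness."""
    r = root.lower()
    # consonant + y becomes -iness (cloudy -> cloudiness)
    if r.endswith("y") and len(r) > 1 and r[-2] not in "aeiou":
        return r[:-1] + "iness"
    return r + "ness"

def forms_with_multiple_roots(lines):
    """Map each word form to the list entries that can produce it,
    keeping only the forms reachable from more than one entry."""
    roots_by_form = defaultdict(set)
    for line in lines:
        entry = line.strip()
        if not entry:
            continue
        word, _, flags = entry.partition("/")
        # the entry itself is a form; lowercase so that case variants collide
        roots_by_form[word.lower()].add(word)
        if "P" in flags:
            roots_by_form[expand_p(word)].add(word)
    return {form: sorted(roots)
            for form, roots in roots_by_form.items() if len(roots) > 1}

sample = ["high/P", "highness", "like/P", "likeness",
          "wit/P", "witness", "kind/P"]
dupes = forms_with_multiple_roots(sample)
for form, roots in sorted(dupes.items()):
    print(form, "<-", ", ".join(roots))
```

On this small sample it reports "highness", "likeness", and "witness", each
reachable both from its own entry and from the /P form, while "kindness" is
left alone because only kind/P produces it.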

Moreover, I discovered what is IMHO odd behaviour of the I and U flags,
which produce forms like:
        wanted -- unwanted,
        expensive -- inexpensive.
I guess many users would like to exclude such forms, since matching a word
against its negation gives doubtful results.  I think this could be done
easily with a simple shell script that drops every line in which /I and /U
are the only flags and removes these flags from lines that have other
flags as well (sed should be enough for this).  If you think the idea is a
good one, I'll send you the script.  If anyone wants them, I can also send
the lists of duplicate word forms for analysis.
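
To make the proposal concrete, here is a rough sketch of the filtering step,
written in Python rather than shell/sed.  It is not the script mentioned
above; the function names are invented for the example, and it assumes every
entry has the form word/FLAGS with one letter per flag.

```python
def strip_prefix_flags(line, drop="IU"):
    """Return the entry without the I and U flags, or None if the
    entry carried no other flags (such lines are dropped entirely)."""
    entry = line.rstrip("\n")
    word, sep, flags = entry.partition("/")
    if not sep:
        return entry                      # no flags at all: keep as-is
    kept = "".join(c for c in flags if c not in drop and c != "/")
    if not kept:
        return None                       # /I and /U were the only flags
    return word + "/" + kept

def filter_word_list(lines):
    """Apply strip_prefix_flags to every entry, skipping dropped lines."""
    cleaned = (strip_prefix_flags(line) for line in lines)
    return [entry for entry in cleaned if entry is not None]

sample = ["wanted/U", "expensive/IY", "high/P", "witness", "able/IU"]
result = filter_word_list(sample)
print(result)
```

Note that dropping a line also removes the bare root word from the list; if
that turns out to be undesirable, returning the word without flags instead
of None would be a one-line change.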

- Alexander


_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
