Hi,

I found a flaw in the way HTDig applies the synonyms algorithm.
The current algorithm looks up only the words listed in the synonyms database,
and cannot find their related word forms.

Simple example: the words "center" and "centre" are the US and GB spellings
of the same word, so they are in the synonyms database.  The word forms
for these words are created according to the following flags: center/DGJMRSZ,
centre/DGMS.  So, the forms created with the /G flag should be "centering" and
"centring".  An attempt to find these words in my document database
results in 13 documents for "centering" and 0 documents for "centring", while
it gives the same number of documents for "center" and "centre" (63 documents).

The flaw is that the word forms are looked up in all databases
simultaneously (i.e. in the endings and synonyms databases), so the synonym
list becomes known only after all word endings have been found, and the
endings of the synonyms themselves are never generated.  The correct solution
would be the following:
  1. Look into the word2root database to find the root(s) of the word(s)
     (centring->centre);
  2. Look into the synonyms database to find possible synonyms
     (centre = center);
  3. Find all word forms for the root(s) and _all_synonyms_ using the
     root2word database (..., centring, centering, ...).
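
The three steps above can be sketched as follows.  This is only an
illustration in Python, not HTDig code: the dictionaries are hypothetical
in-memory stand-ins for the word2root, root2word and synonyms databases, their
contents are made up for this example, and expand() is a name I chose here.

```python
# Hypothetical in-memory stand-ins for the three fuzzy databases.
# The entries are illustrative only, not real database contents.
WORD2ROOT = {
    "center": ["center"],
    "centering": ["center"],
    "centre": ["centre"],
    "centring": ["centre"],
}
ROOT2WORD = {
    "center": ["center", "centered", "centering", "centers"],
    "centre": ["centre", "centred", "centring", "centres"],
}
SYNONYMS = {
    "center": ["centre"],
    "centre": ["center"],
}


def expand(word):
    """Expand a search word: roots first, then synonyms, then word forms."""
    # Step 1: reduce the word to its root(s); an unknown word is its own root.
    roots = WORD2ROOT.get(word, [word])

    # Step 2: collect the synonyms of every root.
    bases = set(roots)
    for root in roots:
        bases.update(SYNONYMS.get(root, []))

    # Step 3: generate the word forms of every root and every synonym.
    forms = {word}
    for base in bases:
        forms.update(ROOT2WORD.get(base, [base]))
    return sorted(forms)


print(expand("centring"))
```

With this ordering a search for "centring" also yields "centering" (and the
other way round), because the endings are generated after the synonyms are
known rather than at the same time.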

Could these corrections be taken into account in the HTDig code?

Thank you for your help,
- Alexander


_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
