According to Brian White:
> The pattern here seems to be: if a term is used more than once, then
> the synonyms that are used for that term are the ones that appear in
> the last group it appears in. The fact I got the same results both
> times is an extra indicator that the order within a group is irrelevant.
> 
> I have all sorts of questions about this: Is it a bug? Is it a feature?

Or is it just a simple-minded algorithm?

Your interpretation of your test results is pretty much bang-on.
Looking at the code confirms your suspicions.  If you have a look at
Synonym::createDB() in htfuzzy/Synonym.cc, you can see what it's doing.
For each line in the synonyms file, it makes a list of all the words.
Then, for each word in that list, it makes a database entry keyed to
that word, with all the other words in the list as the data record for
that key.  The problem is that if a word appears again on a later line,
the previous record is replaced with the new one, so the earlier
equivalences are lost.
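
To make that concrete, here is a rough Python sketch of the logic as
described above.  It is not the actual C++ code from htfuzzy/Synonym.cc:
the function name build_synonym_db and the use of a plain dict in place of
the database are assumptions made purely for illustration.

```python
def build_synonym_db(lines):
    """Mimic the described behaviour: one record per word, last group wins."""
    db = {}  # stand-in for the real database (an assumption of this sketch)
    for line in lines:
        words = line.split()
        for word in words:
            # Storing under an existing key replaces the earlier record,
            # which is where the previous equivalences get lost.
            db[word] = [w for w in words if w != word]
    return db


db = build_synonym_db(["balloon dirigible", "chatterbox gasbag", "balloon gasbag"])
print(db["balloon"])    # ['gasbag']   (dirigible has been dropped)
print(db["dirigible"])  # ['balloon']  (this older record is left untouched)
```

In this sketch a search for "dirigible" would still expand to "balloon",
but a search for "balloon" would no longer expand to "dirigible".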

> However, the main question I have with regard to the application I am
> writing is: should it automatically "resolve" groups if it detects an
> overlap?
> 
> For example, say my synonym file contained
> 
>    balloon dirigible
>    chatterbox gasbag
> 
> And, then someone wanted to add "gasbag" as a synonym of "balloon", should
> it automatically resolve it to:
> 
>    balloon dirigible gasbag chatterbox
> 
> 
> Any thoughts on this would be appreciated.

Yes, given the limitations of the Synonym::createDB() algorithm, you'd
really have to resolve overlaps ahead of time before feeding the synonyms
file to htfuzzy.  Either that or you'd need to change the createDB code
to look for and add to existing records when a word appears more than
once.
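
A minimal sketch of the first option, resolving overlaps before the file
goes to htfuzzy, might look like the following.  It is written in Python
only as an illustration; merge_synonym_groups is a made-up name, and the
sketch assumes one whitespace-separated group per line, as in your
example.

```python
def merge_synonym_groups(lines):
    """Merge groups that share any word, so each word ends up on one line."""
    groups = []  # disjoint groups, each a list of words in first-seen order
    for line in lines:
        words = line.split()
        if not words:
            continue  # skip blank lines
        merged = []
        untouched = []
        for group in groups:
            if set(group) & set(words):
                # This existing group overlaps the new line: absorb it.
                for w in group:
                    if w not in merged:
                        merged.append(w)
            else:
                untouched.append(group)
        # Add whatever words on the new line were not already known.
        for w in words:
            if w not in merged:
                merged.append(w)
        groups = untouched + [merged]
    return groups


groups = merge_synonym_groups(
    ["balloon dirigible", "chatterbox gasbag", "balloon gasbag"]
)
for group in groups:
    print(" ".join(group))  # balloon dirigible chatterbox gasbag
```

The word order differs from the line in your example, but the group holds
the same four words, and your tests suggest the order within a group does
not matter.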

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
