For the intranet I am working on, the data managers want to be able
to adjust the synonyms file, so I am writing them a little CGI app
to manage the synonyms list. However, that kind of requires that I
understand the finer points of the way htdig does synonyms in some
detail.

Note: using v3.1.5

Ok. The configuration file docs say (under synonyms_dictionary):

    "... a text file containing the synonym dictionary used for the
        synonyms search algorithm. Each line of this file has at least
        two words. The first word is the word to replace, the rest of
        the words are synonyms for that word. "


So, taking the example of

    car auto automobile

What that says to me is if I search for "car", it should also
search for "auto" or "automobile", but if I search for "auto",
it will only search for "auto". That is, from the description,
I would EXPECT to get:

     Search Term       ==>   Logical Words
      car                     car or auto or automobile
      auto                    auto
      automobile              automobile

However, what I actually get is:

     Search Term       ==>   Logical Words
      car                     car or auto or automobile
      auto                    auto or car or automobile
      automobile              automobile or car or auto

So my first question is:

   Does the order of the words in the synonyms file in a single
   entry make any difference at all?

This is relevant because if there is a "primary" term, I can
sensibly sort by and classify by that, but if all the terms have
equal weight, it makes little sense to try.


Ok. The second question concerns this example. The following
entries appear in my synonyms file exactly as shown:

   ommision omission
   ommission omission
   ommission omicron

What I get when I use each of these terms:

     Search Term       ==>   Logical Words
      ommision                ommision or omission
      ommission               ommission or omicron
      omission                omission or ommission
      omicron                 omicron or ommission

I also got exactly the same results for:

   ommision omission
   ommission omission
   omicron ommission

The pattern here seems to be: if a term is used more than once, then
the synonyms that are used for that term are the ones that appear in
the last group it appears in. The fact I got the same results both
times is an extra indicator that the order within a group is irrelevant.
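
The pattern described above can be written down as a small model. This is only a sketch of the behaviour observed in the two tables, assuming each line is treated as an unordered group and that a later group overwrites an earlier one for any word they share. It is not taken from the htdig source, and the names build_lookup and expand are made up for illustration.

```python
def build_lookup(lines):
    """Model of the OBSERVED behaviour only (an assumption, not htdig's code)."""
    lookup = {}
    for line in lines:
        words = line.split()
        if len(words) < 2:
            continue  # the docs say each line has at least two words
        for word in words:
            # Every word in the group maps to all the others, so order
            # within a line does not matter. A later group replaces
            # whatever an earlier group stored for the same word.
            lookup[word] = [w for w in words if w != word]
    return lookup


def expand(lookup, term):
    """Return the search term followed by its synonyms, if any."""
    return [term] + lookup.get(term, [])


if __name__ == "__main__":
    entries = ["ommision omission", "ommission omission", "ommission omicron"]
    lookup = build_lookup(entries)
    for term in ("ommision", "ommission", "omission", "omicron"):
        print(term, "==>", " or ".join(expand(lookup, term)))
```

Run against either version of the three-line file, this model gives the same four rows as the table above, which is at least consistent with the "last group wins" reading.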

I have all sorts of questions about this: Is it a bug? Is it a feature?

However, my main question with regard to the application I am
writing is: should it automatically "resolve" groups if it detects an
overlap?

For example, say my synonym file contained

   balloon dirigible
   chatterbox gasbag

and someone then wanted to add "gasbag" as a synonym of "balloon", should
it automatically resolve the file to:

   balloon dirigible gasbag chatterbox
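
If automatic resolution turns out to be the right choice, the merge itself is simple. The following is a minimal sketch, assuming the management app holds the file as a list of word lists (one per line). The function name add_synonym is hypothetical, and the word order of the merged group differs from the line above, which should not matter if order within a group really is irrelevant.

```python
def add_synonym(groups, term, synonym):
    """Return new groups in which term and synonym share a single group.

    groups is a list of word lists, one per line of the synonyms file
    (an assumed in-memory representation, not anything htdig defines).
    Every existing group that contains either word is folded into one
    merged group; all other groups are kept unchanged.
    """
    merged = []
    untouched = []
    for group in groups:
        if term in group or synonym in group:
            for word in group:
                if word not in merged:
                    merged.append(word)
        else:
            untouched.append(list(group))
    # Cover the case where one or both words are new to the file.
    for word in (term, synonym):
        if word not in merged:
            merged.append(word)
    return untouched + [merged]


if __name__ == "__main__":
    groups = [["balloon", "dirigible"], ["chatterbox", "gasbag"]]
    for group in add_synonym(groups, "balloon", "gasbag"):
        print(" ".join(group))
```

Writing the result back as one line per group would leave no word in more than one group, which sidesteps the "last group wins" effect seen earlier. It only merges groups touched by the new pair, so a file that already contains overlaps would need a separate clean-up pass first.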


Any thoughts on this would be appreciated.

Regs

Brian

-------------------------
Brian White
Step Two Designs Pty Ltd - SGML, XML & HTML Consultancy
Phone: +612-93197901
Web:   http://www.steptwo.com.au/
Email: [EMAIL PROTECTED]

