For the intranet I am working on, the data managers want to be able
to adjust the synonyms file, so I am writing them a little CGI app
to manage the synonyms list. To do that properly, I need to
understand in some detail how htdig handles synonyms.
Note: I am using htdig v3.1.5.
The configuration file documentation says (under synonyms_dictionary):
"... a text file containing the synonym dictionary used for the
synonyms search algorithm. Each line of this file has at least
two words. The first word is the word to replace, the rest of
the words are synonyms for that word. "
So, taking the example of
car auto automobile
What that says to me is if I search for "car", it should also
search for "auto" or "automobile", but if I search for "auto",
it will only search for "auto". That is, from the description,
I would EXPECT to get:
Search Term   ==>  Logical Words
car                car or auto or automobile
auto               auto
automobile         automobile
However, what I actually get is:
Search Term   ==>  Logical Words
car                car or auto or automobile
auto               auto or car or automobile
automobile         automobile or car or auto
So my first question is:
Does the order of the words in the synonyms file in a single
entry make any difference at all?
This is relevant because if there is a "primary" term, I can
sensibly sort by and classify by that, but if all the terms have
equal weight, it makes little sense to try.
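To make the difference between the two tables above concrete, here is a minimal Python sketch of the two readings. It only models the behaviour described in this message; it is not htdig's code, and the function names are made up for illustration.

```python
def expand_directional(line):
    """Reading suggested by the docs: only the first word expands to the rest."""
    first, *rest = line.split()
    table = {first: [first, *rest]}
    for word in rest:
        table[word] = [word]
    return table


def expand_symmetric(line):
    """Behaviour actually observed: every word expands to all the others."""
    words = line.split()
    return {w: [w] + [o for o in words if o != w] for w in words}


if __name__ == "__main__":
    entry = "car auto automobile"
    for label, fn in (("expected", expand_directional),
                      ("observed", expand_symmetric)):
        print(label)
        for term, logical in fn(entry).items():
            print(f"  {term:<12} {' or '.join(logical)}")
```

If the symmetric reading is the right one, the first word in an entry carries no special meaning and word order within an entry can be ignored.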
My second question concerns the following example. These values
appear in my synonyms file exactly as shown:
ommision omission
ommission omission
ommission omicron
What I get when I use each of these terms:
Search Term   ==>  Logical Words
ommision           ommision or omission
ommission          ommission or omicron
omission           omission or ommission
omicron            omicron or ommission
I also got exactly the same results for:
ommision omission
ommission omission
omicron ommission
The pattern here seems to be: if a term is used more than once, then
the synonyms that are used for that term are the ones that appear in
the last group it appears in. The fact I got the same results both
times is an extra indicator that the order within a group is irrelevant.
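The pattern can be written down as a small Python model: each entry assigns every one of its words the other words of that entry, and a later entry overwrites whatever an earlier one assigned. This is a guess that happens to reproduce the results above, not a description of how htdig builds its synonym database; build_table and logical_words are hypothetical names.

```python
def build_table(lines):
    """Map each word to its synonyms; a later entry overwrites an earlier one."""
    table = {}
    for line in lines:
        words = line.split()
        if len(words) < 2:
            continue  # the docs say each line has at least two words
        for word in words:
            # Assumption: plain overwrite, no merging with earlier entries.
            table[word] = [other for other in words if other != word]
    return table


def logical_words(table, term):
    """Render the expansion in the 'a or b or c' form used above."""
    return " or ".join([term] + table.get(term, []))


if __name__ == "__main__":
    first_file = ["ommision omission", "ommission omission", "ommission omicron"]
    second_file = ["ommision omission", "ommission omission", "omicron ommission"]
    for lines in (first_file, second_file):
        table = build_table(lines)
        for term in ("ommision", "ommission", "omission", "omicron"):
            print(f"{term:<12} {logical_words(table, term)}")
        print()
```

Both files produce the same table under this model, which is consistent with order inside an entry being irrelevant.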
I have all sorts of questions about this: Is it a bug? Is it a feature?
However, my main question for the application I am writing is: should
it automatically "resolve" groups if it detects an overlap?
For example, say my synonym file contained
balloon dirigible
chatterbox gasbag
And, then someone wanted to add "gasbag" as a synonym of "balloon", should
it automatically resolve it to:
balloon dirigible gasbag chatterbox
Any thoughts on this would be appreciated.
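For reference, the kind of automatic resolution I have in mind would look roughly like the Python sketch below. It is only one possible approach; add_synonym is a hypothetical helper for the management app, not anything that exists in htdig.

```python
def add_synonym(groups, term, new_word):
    """Add new_word to term's group, absorbing any group new_word is already in.

    groups is a list of word lists, one per line of the synonyms file.
    """
    target = next((g for g in groups if term in g), None)
    if target is None:
        target = [term]
        groups.append(target)

    # A different group that already contains the new word, if there is one.
    other = next((g for g in groups if new_word in g and g is not target), None)

    incoming = [new_word]
    if other is not None:
        incoming += [w for w in other if w != new_word]
    for word in incoming:
        if word not in target:
            target.append(word)

    if other is not None:
        # Drop the absorbed group so no word appears in two entries.
        groups[:] = [g for g in groups if g is not other]
    return groups


if __name__ == "__main__":
    groups = [["balloon", "dirigible"], ["chatterbox", "gasbag"]]
    add_synonym(groups, "balloon", "gasbag")
    for group in groups:
        print(" ".join(group))
```

The catch is that merging also makes "chatterbox" a synonym of "balloon", which the person adding "gasbag" may not have intended, so the app might need to ask for confirmation before merging.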
Regards,
Brian
-------------------------
Brian White
Step Two Designs Pty Ltd - SGML, XML & HTML Consultancy
Phone: +612-93197901
Web: http://www.steptwo.com.au/
Email: [EMAIL PROTECTED]