On Thu, 12 Jul 2001, Gilles Detillieux wrote:
> According to Elizabeth Taylor:
> > I am trying to find out if Ht://Dig uses reverse indexing and boolean
> > logic to obtain search results or if it also uses vector space
> > retrieval?
Hrm. I guess the reply I wrote didn't manage to get to the mailing
list? I love e-mail. <rolls eyes>
Hardly any search engine I know of does pure vector space indexing,
because it's just too inefficient, either in the space required for the
index, in the time needed for indexing, or both. For those of you
curious, vector-space indexing basically means you'd keep a vector of
the words (or word ids) in each document, usually with a count or weight
per word. The query is turned into a vector the same way, so you can
take the "distance" between the query and each document and rank the
documents by how close they are.
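To make the "distance" idea concrete, here is a minimal sketch in
Python. It assumes plain word counts as the vector and cosine similarity
as the measure, which is one common choice and not the only one. The
names term_vector, cosine_similarity and rank are made up for this
example; none of this is ht://Dig code.

```python
import math
from collections import Counter

def term_vector(text):
    """Map each lowercase word in the text to how often it occurs."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    # A Counter returns 0 for missing words, so unshared words add nothing.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank(query, docs):
    """Return document names ordered from most to least similar."""
    q = term_vector(query)
    scored = [(cosine_similarity(q, term_vector(text)), name)
              for name, text in docs.items()]
    return [name for score, name in sorted(scored, reverse=True)]
```

Note that rank() has to compare the query against every single
document, which is a big part of why this gets expensive on a large
collection.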
> index that tells htsearch essentially which documents contain a given word.
> I believe this is known as a reverse index, although the term doesn't
It's also called an "inverted index," for the reason that you've turned
the text on its head--from pages of words to words pointing to
pages/documents.
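A toy version of that inversion, as a sketch only: the function name
build_inverted_index and the whitespace tokenizing are my assumptions
for illustration, and the real ht://Dig word database stores a good deal
more than this.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Turn {doc_id: text} into {word: set of doc_ids containing it}."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

# Hypothetical sample documents, just to show the shape of the result.
docs = {
    "a.html": "search engine index",
    "b.html": "search the library",
}
index = build_inverted_index(docs)
# index["search"] now holds both document ids, index["library"] only one.
```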
In any case, almost every search engine that I know of uses an inverted
index for the word database and then constructs some form of boolean
query as Gilles mentioned. However, it's not strictly a boolean query in
the traditional information retrieval sense, because a search engine
ranks the results once the query has narrowed down the set of matching
documents, while a search at your local library probably just gives you
all the matches in, say, alphabetical or chronological order.
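Roughly, the two steps look like the sketch below: a boolean AND over
the inverted index to limit the results, then a ranking pass over what
is left. The scoring here (total occurrences of the query words) is an
assumption picked to keep the example short, not the formula htsearch
uses, and build_postings and search are hypothetical names.

```python
from collections import defaultdict

def build_postings(docs):
    """Inverted index with counts: {word: {doc_id: occurrences}}."""
    postings = defaultdict(dict)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            postings[word][doc_id] = postings[word].get(doc_id, 0) + 1
    return postings

def search(query, postings):
    """Boolean AND over the query words, then rank the survivors."""
    terms = query.lower().split()
    if not terms:
        return []
    # Boolean step: keep only documents that contain every query word.
    matches = set(postings.get(terms[0], {}))
    for term in terms[1:]:
        matches &= set(postings.get(term, {}))

    # Ranking step: more occurrences of the query words scores higher.
    def score(doc_id):
        return sum(postings[term][doc_id] for term in terms)

    # Sort by descending score, breaking ties by document id.
    return sorted(matches, key=lambda doc_id: (-score(doc_id), doc_id))
```

The library-catalog style of search would stop after the boolean step
and just sort the matches by title or date.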
--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/