Right now, the existing database structure seems to meet all my needs.
Then again, with under 500 documents to index, ours isn't a demanding site.
Phrase searching would be nice, I suppose.
I'll also second Andrew's last two points (better relevance ranking, and
faster results generation for searches returning many hits), not so much for
myself as for some other users. Those two points seem to be related as well.
I recall that someone who had wanted the new backlink_factor support was
unhappy because a non-zero backlink_factor REALLY slowed down searches that
returned a lot of hits. I think any database redesign should take this into
consideration. Big fields like DocHead should not need to be fetched when
all you want are small fields to help with scoring and sorting, such as the
backlink count, time, and perhaps the title. These should be in separate
records, or even separate files.
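
Here is a minimal sketch of that split, in Python for brevity. Everything in
it is hypothetical: the names (ScoreRecord, DocStore, search), the in-memory
dicts standing in for separate records or files, and the scoring formula
(word score plus backlink_factor times the backlink count) are assumptions
for illustration, not how ht://Dig actually stores or scores documents.

```python
from dataclasses import dataclass


@dataclass
class ScoreRecord:
    """Small per-document fields needed for scoring and sorting."""
    backlinks: int
    mtime: int   # modification time, seconds since the epoch
    title: str


class DocStore:
    """Two stores keyed by document ID: one small, one large."""

    def __init__(self):
        self.scores = {}        # doc_id -> ScoreRecord (cheap to read)
        self.heads = {}         # doc_id -> DocHead text (expensive to read)
        self.head_fetches = 0   # counts how often the large store is touched

    def add(self, doc_id, backlinks, mtime, title, head):
        self.scores[doc_id] = ScoreRecord(backlinks, mtime, title)
        self.heads[doc_id] = head

    def fetch_head(self, doc_id):
        self.head_fetches += 1
        return self.heads[doc_id]


def search(store, hits, backlink_factor, page_size):
    """Rank every hit using only the small records, then fetch the
    large DocHead for just the documents on the page being displayed.

    hits maps doc_id -> word score (assumed to come from the word index).
    """
    ranked = sorted(
        hits,
        key=lambda d: hits[d] + backlink_factor * store.scores[d].backlinks,
        reverse=True,
    )
    top = ranked[:page_size]
    return [(d, store.scores[d].title, store.fetch_head(d)) for d in top]
```

With this layout, the cost of a non-zero backlink_factor grows with the
number of small records read, while the number of DocHead fetches stays
bounded by the page size no matter how many hits there are.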

According to Geoff Hutchison:
> I expected this would be a subject that would bring out input from a lot of
> people. Right now this list numbers 65 people, and only a few messages have
> flashed back and forth.
>
> I'll sum up the requirements we have so far. Are there any others? Are
> there some on the list that we *don't* really need? Is it safe for me to
> assume these are the requirements we want in our specification?
>
> Andrew's list:
> * phrase searching
> * fuzzy searching (basically as it is now)
> * use of "+" or "-" as prefix to search words (a la AltaVista)
> * use of "near" as a method to determine relations between search words
> * cross platform (unix, nt)
> * ability to search only in specific areas of documents (title, headers, etc)
> * better relevance ranking
> * faster results generation for searches returning many hits
>
> Mine:
> * Collections of databases
> * Parallel indexing and searching (no need for alternate files or htmerge)
> * Multithreading support (some sort of locking for writes)
> * Removing duplicate documents
> * Referer links (e.g. AltaVista-style link:)
> * Search for "more like" or "similar to" (a la Excite)
> * On-the-fly editing of search factors (without needing to rebuild the db)
>
> (I also forgot to include)
> * Flexible backend (use Berkeley DB, *SQL, Oracle, etc.)
> * Internationalization (e.g. Chinese support, probably through Unicode)
--
Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930