On Wednesday 10 June 2009 06:54:03 Daniel Cheng wrote:
> On Wed, Jun 10, 2009 at 12:02 PM, Evan Daniel<[email protected]> wrote:
> > On my (incomplete) spider index, the index file for the word "the" (it
> > indexes no other words) is 17MB. This seems rather large. It might
> > make sense to have the spider not even bother creating an index on a
> > handful of very common words (the, be, to, of, and, a, in, I, etc).
> > Of course, this presents the occasional difficulty:
> > http://bash.org/?514353 I think I'm in favor of not indexing common
> > words even so.
>
> Yes, it should ignore common words.
> This is called a "stopword" in search engine terminology.

How do you propose to implement a search for "doctor who" if "who" is a stopword?
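
To make the problem concrete, here is a minimal sketch in Python (not the spider's actual code). The stopword set, the two sample pages, and the helper names build_index and search are all hypothetical; "who" is added to the list only for illustration, since it was not among the words proposed above.

```python
# Hypothetical stopword list: the words proposed in the thread, plus "who"
# (an assumption, added only to illustrate the problem).
STOPWORDS = frozenset({"the", "be", "to", "of", "and", "a", "in", "i", "who"})

# Hypothetical sample pages standing in for spidered content.
DOCS = {
    "page1": "doctor who episode guide",
    "page2": "find a doctor in your area",
}


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def build_index(docs, stopwords=frozenset()):
    """Build a word -> set of page ids index, skipping stopwords."""
    index = {}
    for doc_id, text in docs.items():
        for word in tokenize(text):
            if word in stopwords:
                continue
            index.setdefault(word, set()).add(doc_id)
    return index


def search(index, query, stopwords=frozenset()):
    """Return the pages containing every non-stopword query term."""
    terms = [w for w in tokenize(query) if w not in stopwords]
    if not terms:
        return set()
    result = None
    for term in terms:
        postings = index.get(term, set())
        result = postings if result is None else result & postings
    return result


full_index = build_index(DOCS)
filtered_index = build_index(DOCS, STOPWORDS)

# With every word indexed, "who" narrows the result to one page.
print(sorted(search(full_index, "doctor who")))
# With "who" dropped, the query degrades to a search for "doctor" alone.
print(sorted(search(filtered_index, "doctor who", STOPWORDS)))
```

With the full index the query matches only page1; with stopwords removed it matches both pages, because the only word that distinguished them was never indexed.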
