William Morgan wrote:
> Excerpts from David Balmain's message of Fri Apr 06 00:45:42 -0700 2007:
>> So what do people think? Should stop-words be filtered by default?
>
> I also vote to turn them off by default. Their usefulness to retrieval
> performance is limited to specific and uncommon situations, whereas
> their ability to confuse people is not.
>

Do you have any proof for this assumption? Every full-text search I use
has a stopword list by default. MySQL FULLTEXT, for example, even has its
default list compiled into the server, so changing that built-in list
means recompiling.

I also want to argue that the use of stopwords is very common. For
example, if I have an index of 1,000 English documents and search for
'and', chances are high that I get a result set of 1,000 hits - which is
unusable. I am unable to see the corner case in this scenario.

We are not talking about performance here - we are talking about sane
results. Stopwords are more of a result-quality optimization than a
performance optimization. If you want to query phrases, it would be wise
to use Ferret's phrase query instead of killing the stopwords.

I cannot find it at the moment, but someone made the point that
'premature' optimization is bad. That may be wise advice for your own
application, but the libraries in use should be a) mature and b)
optimized.

Greetings
Florian

_______________________________________________
Ferret-talk mailing list
http://rubyforge.org/mailman/listinfo/ferret-talk
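
The 'and' example from the message can be sketched in a few lines of plain Ruby. This is only a toy inverted index under stated assumptions, not Ferret's API or implementation: the `STOP_WORDS` list, the `DOCS` sample texts, and the `build_index` and `search` helpers are all hypothetical names made up for illustration.

```ruby
require 'set'

# Hypothetical stopword list for illustration; not Ferret's actual list.
STOP_WORDS = Set.new(%w[a an and the of to in]).freeze

# Hypothetical sample documents; ids are their array positions.
DOCS = [
  'Ferret is a search library and it is written in Ruby and C',
  'Stop words and phrase queries',
  'Cats and dogs'
].freeze

# Build a toy inverted index: term => set of document ids.
# Terms found in stop_words are skipped at indexing time.
def build_index(docs, stop_words = Set.new)
  index = Hash.new { |hash, term| hash[term] = Set.new }
  docs.each_with_index do |text, id|
    text.downcase.scan(/[a-z]+/).each do |term|
      next if stop_words.include?(term)
      index[term] << id
    end
  end
  index
end

# Look up a single term; returns a sorted array of matching document ids.
def search(index, term)
  index.fetch(term.downcase, Set.new).to_a.sort
end

unfiltered = build_index(DOCS)
filtered   = build_index(DOCS, STOP_WORDS)

p search(unfiltered, 'and')  # => [0, 1, 2]  every document matches
p search(filtered, 'and')    # => []         the stopword was never indexed
p search(filtered, 'ferret') # => [0]        content words are unaffected
```

Without the filter, a query for 'and' returns every document, which mirrors the 1,000-hits case described in the message. With the filter, the term is simply absent from the index, while ordinary content words still match as before.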

