William Morgan wrote:
> Excerpts from David Balmain's message of Fri Apr 06 00:45:42 -0700 2007:
>> So what do people think? Should stop-words be filtered by default?
> 
> I also vote to turn them off by default. Their usefulness to retrieval
> performance is limited to specific and uncommon situations, whereas
> their ability to confuse people is not.
> 

Do you have any evidence for this claim? Every full-text search I use
ships with a stopword list by default. MySQL FULLTEXT, for example, even
needs to be recompiled if you want to change its list. I would also argue
that the use of stopwords is very common. For example, if I have an index
of 1,000 English documents and search for 'and', chances are high that I
get a result set of 1,000 hits, which is unusable. I fail to see the
corner case in this scenario. We are not talking about performance
here; we are talking about sane results. Stopwords are more a
result-quality optimization than a performance optimization.
If you want to query phrases, it would be wise to use Ferret's
phrase query instead of killing the stopwords.
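
To make the 'and' example concrete, here is a minimal sketch in plain
Ruby. It deliberately does not use Ferret's API: the tokenizer, the toy
inverted index, the STOP_WORDS list, and the sample documents are all
made up for illustration, and a real analyzer does considerably more.

```ruby
require 'set'

# Hypothetical stop-word list for this sketch; real analyzers ship longer ones.
STOP_WORDS = Set.new(%w[a an and of the to]).freeze

# Split text into lowercase word tokens, optionally dropping stop words.
def tokenize(text, stop_words = Set.new)
  text.downcase.scan(/[a-z0-9]+/).reject { |t| stop_words.include?(t) }
end

# Build a toy inverted index: term => Set of document ids.
def build_index(docs, stop_words = Set.new)
  index = Hash.new { |h, k| h[k] = Set.new }
  docs.each_with_index do |text, id|
    tokenize(text, stop_words).each { |term| index[term] << id }
  end
  index
end

# Look up a single term; returns the sorted ids of matching documents.
def search(index, term)
  index.fetch(term.downcase, Set.new).to_a.sort
end

# Made-up sample documents; each one contains the word 'and'.
DOCS = [
  'Ruby and Ferret',
  'Indexing and searching text',
  'Stop words and phrase queries'
].freeze

unfiltered = build_index(DOCS)
filtered   = build_index(DOCS, STOP_WORDS)

p search(unfiltered, 'and')   # every document matches: [0, 1, 2]
p search(filtered, 'and')     # the term was never indexed: []
p search(filtered, 'ferret')  # content terms are unaffected: [0]
```

Without the filter, 'and' returns the whole collection; with it, the
query returns nothing while queries on content words behave as before.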

I cannot find it at the moment, but someone made the point that
'premature' optimization is bad. That may be wise for your own
application, but the libraries in use should be a) mature and b) optimized.

Greetings
Florian
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk