[Answering my own question]

I think a reasonable solution is a generic query-time analyzer that wraps my application's chosen analyzer and automatically filters out whatever it identifies as stop words. It would initialize itself from an IndexReader and create a StopFilter for the terms whose document frequency exceeds a given threshold.

This approach seems reasonable because:
a) The stop word filter is automatically adaptive and doesn't need manual tuning.
b) I can live with the disk space overhead of the few "killer" terms that make it into the index.
c) "Silent" failure (i.e. removal of terms from the query) is generally preferable to the throw-an-exception approach taken by BooleanQuery when clause limits are exceeded.
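
To make the idea concrete, here is a rough sketch of the selection and filtering logic in plain Java. It deliberately does not use the Lucene API: the term-to-document-frequency map stands in for whatever enumerating the IndexReader's terms would yield, and the class name, method names and the threshold of 5000 are all made up for illustration. In a real analyzer the resulting set would be handed to a StopFilter wrapped around the delegate analyzer's token stream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: not Lucene code, just the logic described above.
public class FrequencyStopWords {

    // Collect every term whose document frequency exceeds maxDocFreq.
    // The map (term -> document frequency) is an assumption standing in
    // for the counts that would be read from the index.
    public static Set<String> buildStopSet(Map<String, Integer> docFreqs, int maxDocFreq) {
        Set<String> stopWords = new HashSet<>();
        for (Map.Entry<String, Integer> entry : docFreqs.entrySet()) {
            if (entry.getValue() > maxDocFreq) {
                stopWords.add(entry.getKey());
            }
        }
        return stopWords;
    }

    // Silently drop stop words from tokens that the wrapped analyzer
    // has already produced for a query.
    public static List<String> filterQueryTokens(List<String> tokens, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up document frequencies for three terms.
        Map<String, Integer> docFreqs = new HashMap<>();
        docFreqs.put("the", 9500);
        docFreqs.put("lucene", 120);
        docFreqs.put("analyzer", 40);

        Set<String> stopWords = buildStopSet(docFreqs, 5000);
        List<String> query = Arrays.asList("the", "lucene", "analyzer");
        System.out.println(filterQueryTokens(query, stopWords)); // prints [lucene, analyzer]
    }
}
```

Because the set is built once at initialization, it would only adapt to index changes when the analyzer is re-created from a fresh IndexReader.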