Sounds like the sort of filter that could be usefully cached. You can do all this in Java code, or the XML query parser (in contrib) might be a quick and simple way to externalize the profanity settings in a stylesheet that is applied at query time, e.g.:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/Document">
<FilteredQuery>
    <Query>
        <UserQuery><xsl:value-of select="content"/></UserQuery>
    </Query>
    <Filter>
        <CachedFilter>
            <TermsFilter fieldName="content">
                naughty1 naughty2 xxx
            </TermsFilter>
        </CachedFilter>
    </Filter>
</FilteredQuery>
</xsl:template>
</xsl:stylesheet>

The above example also automatically adds caching to the results of the profanity filter.

Your app code to use this would then look like this:

init()
    //parse and cache the stylesheet
    QueryTemplateManager qtm=new QueryTemplateManager(getClass().getResourceAsStream("query.xsl"));
    ....

runQuery()
    //get the user input
    Properties userInput=new Properties();
    userInput.setProperty("content",httpRequest.getParameter("queryCriteria"));
    //Transform the user input into a Lucene XML query
    org.w3c.dom.Document doc=qtm.getQueryAsDOM(userInput);
    //Parse the XML query using the XML parser
    Query q=xmlQueryBuilder.getQuery(doc.getDocumentElement());
    //run query as normal

Cheers
Mark

----- Original Message ----
From: Greg Gershman <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Wednesday, 7 March, 2007 3:07:45 PM
Subject: Negative Filtering (such as for profanity)

I'm attempting to create a profanity filter. I thought to use a QueryFilter created with a Query of (-$#!+ AND [EMAIL PROTECTED] AND etc). The problem I have run into is that a pure negative query is not supported (a query for (-term) DOES NOT return the inverse of a query for (term)). I believe the bit set returned by a purely negative QueryFilter is therefore empty, so no matter how many results the initial query returns, the result after filtering is always zero documents. I was wondering if anyone had suggestions as to how else to do this. I've considered simply amending the query string submitted by the user to include a pre-generated String that would exclude the profane terms, but I consider this an inelegant solution.

I had also thought about creating a new sub-class of QueryFilter, NegativeQueryFilter. Basically, it would work just like a QueryFilter, taking a positive query (so I would pass it an OR'ed list of profane words), and then the resulting bits would simply be flipped. I think this would work, unless I'm missing something. I'm going to experiment with it; I'd appreciate anyone's thoughts on this.

Thanks,
Greg
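
The bit-flipping idea behind the proposed NegativeQueryFilter can be sketched without Lucene at all. This is only an illustration under stated assumptions: a plain java.util.BitSet stands in for the bit set a filter returns, and the class name NegativeFilterSketch, the methods negate and applyFilter, and the document ids are all hypothetical. It shows flipping the "profane" bits over the range [0, maxDoc) and then intersecting with the hits of the user's query.

```java
import java.util.BitSet;

// Hypothetical sketch: no Lucene classes are used here.
class NegativeFilterSketch {

    // Returns a copy of profaneDocs with every bit in [0, maxDoc) flipped,
    // i.e. the documents that do NOT match the positive "profane" query.
    public static BitSet negate(BitSet profaneDocs, int maxDoc) {
        BitSet allowed = (BitSet) profaneDocs.clone();
        allowed.flip(0, maxDoc);
        return allowed;
    }

    // Keeps only the query hits that are also set in the allowed bit set.
    public static BitSet applyFilter(BitSet queryHits, BitSet allowed) {
        BitSet result = (BitSet) queryHits.clone();
        result.and(allowed);
        return result;
    }

    public static void main(String[] args) {
        int maxDoc = 8; // assumed index size for the example

        // Documents 2 and 5 match the OR'ed list of profane words.
        BitSet profane = new BitSet(maxDoc);
        profane.set(2);
        profane.set(5);

        // Documents 1, 2, 5 and 6 match the user's query.
        BitSet hits = new BitSet(maxDoc);
        hits.set(1);
        hits.set(2);
        hits.set(5);
        hits.set(6);

        System.out.println(applyFilter(hits, negate(profane, maxDoc))); // prints {1, 6}
    }
}
```

Both helpers copy their input before modifying it, so a cached "profane" bit set could be reused across queries in this sketch.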