Sounds like the sort of filter that could be usefully cached.
You can do all this in Java code, or the XML query parser (in contrib) might be
a quick and simple way to externalize the profanity settings in a stylesheet
that is applied at query time, e.g.:
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/Document">
  <FilteredQuery>
    <Query>
      <UserQuery><xsl:value-of select="content"/></UserQuery>
    </Query>
    <Filter>
      <CachedFilter>
        <TermsFilter fieldName="content">
          naughty1 naughty2 xxx
        </TermsFilter>
      </CachedFilter>
    </Filter>
  </FilteredQuery>
</xsl:template>
</xsl:stylesheet>
The above example also automatically adds caching to the results of the
profanity filter.
Your app code to use this would then look like this:
init()
    //parse and cache the stylesheet
    QueryTemplateManager qtm=new QueryTemplateManager(
        getClass().getResourceAsStream("query.xsl"));
    ....
runQuery()
    //get the user input
    Properties userInput=new Properties();
    userInput.setProperty("content",httpRequest.getParameter("queryCriteria"));
    //Transform the user input into a Lucene XML query
    org.w3c.dom.Document doc=qtm.getQueryAsDOM(userInput);
    //Parse the XML query using the XML parser
    Query q=xmlQueryBuilder.getQuery(doc.getDocumentElement());
    //run query as normal
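
If it helps to see what the template step produces without pulling in the
contrib jar, here is a rough sketch using only the JDK's XSLT support. It
assumes the template manager wraps each property in an element under a
<Document> root before applying the stylesheet (which is what the
match="/Document" and select="content" in the stylesheet imply). The class
name TemplateSketch and the method toQueryDom are made up for illustration,
and the stylesheet is inlined as a string rather than loaded from query.xsl.

```java
import java.io.StringReader;
import java.util.Properties;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical stand-in for the template step; not the contrib class itself.
class TemplateSketch {
    // Same stylesheet as in the mail, inlined so the sketch is self-contained.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:template match=\"/Document\">"
      + "<FilteredQuery>"
      + "<Query><UserQuery><xsl:value-of select=\"content\"/></UserQuery></Query>"
      + "<Filter><CachedFilter><TermsFilter fieldName=\"content\">"
      + "naughty1 naughty2 xxx"
      + "</TermsFilter></CachedFilter></Filter>"
      + "</FilteredQuery>"
      + "</xsl:template></xsl:stylesheet>";

    // Turns the user input into an XML query DOM by applying the stylesheet.
    static Document toQueryDom(Properties userInput) {
        try {
            // Assumed input shape: <Document><name>value</name>...</Document>
            Document in = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = in.createElement("Document");
            in.appendChild(root);
            for (String name : userInput.stringPropertyNames()) {
                Element el = in.createElement(name);
                el.appendChild(in.createTextNode(userInput.getProperty(name)));
                root.appendChild(el);
            }
            // A real app would compile the stylesheet once and reuse it.
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            DOMResult out = new DOMResult();
            t.transform(new DOMSource(in), out);
            return (Document) out.getNode();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Properties userInput = new Properties();
        userInput.setProperty("content", "some user query");
        Element query = toQueryDom(userInput).getDocumentElement();
        System.out.println(query.getTagName());
        System.out.println(
            query.getElementsByTagName("UserQuery").item(0).getTextContent());
    }
}
```

The resulting root element is what you would hand to the XML query builder's
getQuery() call in runQuery() above.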
Cheers
Mark
----- Original Message ----
From: Greg Gershman <[EMAIL PROTECTED]>
To: [email protected]
Sent: Wednesday, 7 March, 2007 3:07:45 PM
Subject: Negative Filtering (such as for profanity)
I'm attempting to create a profanity filter. I thought to use a QueryFilter
created with a Query of (-$#!+ AND [EMAIL PROTECTED] AND etc). The problem I
have run into is that, as a pure negative query is not supported (a query for
(-term) DOES NOT return the inverse of a query for (term)), I believe the bit
set returned by a purely negative QueryFilter is empty, so no matter how many
results are returned by the initial query, the result after filtering is always
zero documents.
I was wondering if anyone had suggestions as to how else to do this. I've
considered simply amending the query string submitted by the user to include a
pre-generated String that would exclude the query terms, but I consider this a
non-elegant solution. I had also thought about creating a new sub-class of
QueryFilter, NegativeQueryFilter. Basically, it would work just like a
QueryFilter, taking a positive query (so, I would pass it an OR'ed list of
profane words), and then the resulting bits would simply be flipped. I think
this would work, unless I'm missing something. I'm going to experiment with it,
and I'd appreciate anyone's thoughts on this.
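
A minimal sketch of that bit-flipping idea, using plain java.util.BitSet
rather than Lucene's Filter API. The class NegativeFilterSketch and its
methods are hypothetical names, and maxDoc stands in for the number of
document ids in the index. Flipping will also switch on the bits of any
deleted documents, which should be harmless as long as the flipped set is
only ever intersected with the hits of a real query.

```java
import java.util.BitSet;

// Hypothetical illustration of a "negative" filter built from a positive one.
class NegativeFilterSketch {
    // Returns the ids in [0, maxDoc) that are NOT set in profaneDocs.
    static BitSet negate(BitSet profaneDocs, int maxDoc) {
        BitSet allowed = (BitSet) profaneDocs.clone(); // leave the input untouched
        allowed.flip(0, maxDoc);
        return allowed;
    }

    // Keeps only the query hits that the filter allows.
    static BitSet applyFilter(BitSet queryHits, BitSet allowed) {
        BitSet result = (BitSet) queryHits.clone();
        result.and(allowed);
        return result;
    }

    public static void main(String[] args) {
        int maxDoc = 8;
        BitSet profane = new BitSet(maxDoc); // docs matching the OR'ed profane terms
        profane.set(2);
        profane.set(5);
        BitSet hits = new BitSet(maxDoc);    // docs matching the user's query
        hits.set(1);
        hits.set(2);
        hits.set(6);
        System.out.println(applyFilter(hits, negate(profane, maxDoc))); // prints {1, 6}
    }
}
```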
Thanks,
Greg