Ah, sorry. My last post was a ProfanitySelector rather than a ProfanityFilter: it
kept only the documents containing the listed terms. This fixes it:
<CachedFilter>
<BooleanFilter>
<Clause occurs="mustNot">
<TermsFilter fieldName="content">
naughty1 naughty2 xxx
</TermsFilter>
</Clause>
</BooleanFilter>
</CachedFilter>
----- Original Message ----
From: mark harwood <[EMAIL PROTECTED]>
To: [email protected]
Sent: Wednesday, 7 March, 2007 4:05:56 PM
Subject: Re: Negative Filtering (such as for profanity)
Sounds like the sort of filter that could be usefully cached.
You can do all this in Java code, or the XML query parser (in contrib) might be
a quick and simple way to externalize the profanity settings in a stylesheet
that is applied at query time, e.g.
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/Document">
<FilteredQuery>
<Query>
<UserQuery><xsl:value-of select="content"/></UserQuery>
</Query>
<Filter>
<CachedFilter>
<TermsFilter fieldName="content">
naughty1 naughty2 xxx
</TermsFilter>
</CachedFilter>
</Filter>
</FilteredQuery>
</xsl:template>
</xsl:stylesheet>
The above example also automatically adds caching to the results of the
profanity filter.
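To see what the stylesheet step produces, here is a minimal sketch that runs a template of the same shape through the JDK's own XSLT engine. It is not the contrib QueryTemplateManager: the class name QueryTemplateSketch and the method toQueryXml are made up for illustration, and the assumption is that the user input arrives as a <Document> element with one child per property, which is what the match="/Document" template implies. The filter part uses the mustNot BooleanFilter form from the correction at the top of this thread.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class QueryTemplateSketch {

    // Hypothetical template: only the user's query text is filled in at query time;
    // the profanity terms live in the stylesheet.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
      + "<xsl:template match=\"/Document\">"
      + "<FilteredQuery>"
      + "<Query><UserQuery><xsl:value-of select=\"content\"/></UserQuery></Query>"
      + "<Filter><CachedFilter><BooleanFilter><Clause occurs=\"mustNot\">"
      + "<TermsFilter fieldName=\"content\">naughty1 naughty2 xxx</TermsFilter>"
      + "</Clause></BooleanFilter></CachedFilter></Filter>"
      + "</FilteredQuery>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Turns the user's query text into query XML by applying the stylesheet. */
    public static String toQueryXml(String userQuery) {
        try {
            // Build <Document><content>userQuery</content></Document> as a DOM,
            // so special characters in the user input are escaped on output.
            Document input = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = input.createElement("Document");
            input.appendChild(root);
            Element content = input.createElement("content");
            content.setTextContent(userQuery);
            root.appendChild(content);

            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(input), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("could not build query XML", e);
        }
    }

    public static void main(String[] args) {
        // The result is the XML that would be handed to the XML query parser.
        System.out.println(toQueryXml("lucene filters"));
    }
}
```

A real application would compile the stylesheet once and reuse it, as the init() step in the app code does, rather than recompiling it for every query as this sketch does.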
Your app code to use this would then look like this:
init()
//parse and cache the stylesheet
QueryTemplateManager qtm=new
QueryTemplateManager(getClass().getResourceAsStream("query.xsl"));
....
runQuery()
//get the user input
Properties userInput=new Properties();
userInput.setProperty("content",httpRequest.getParameter("queryCriteria"));
//Transform the user input into a Lucene XML query
org.w3c.dom.Document doc=qtm.getQueryAsDOM(userInput);
//Parse the XML query using the XML parser
Query q=xmlQueryBuilder.getQuery(doc.getDocumentElement());
//run query as normal
Cheers
Mark
----- Original Message ----
From: Greg Gershman <[EMAIL PROTECTED]>
To: [email protected]
Sent: Wednesday, 7 March, 2007 3:07:45 PM
Subject: Negative Filtering (such as for profanity)
I'm attempting to create a profanity filter. I thought to use a QueryFilter
created with a Query of (-$#!+ AND [EMAIL PROTECTED] AND etc). The problem I
have run into is that, as a pure negative query is not supported (a query for
(-term) DOES NOT return the inverse of a query for (term)), I believe the bit
set returned by a purely negative QueryFilter is empty, so no matter how many
results returned by the initial query, the result after filtering is always
zero documents.
I was wondering if anyone had suggestions as to how else to do this. I've
considered simply amending the query string submitted by the user to include a
pre-generated String that would exclude the query terms, but I consider this a
non-elegant solution. I had also thought about creating a new sub-class of
QueryFilter, NegativeQueryFilter. Basically, it would work just like a
QueryFilter, taking a positive query (so, I would pass it an OR'ed list of
profane words), then the resulting bits are simply flipped. I think this would
work, unless I'm missing something. I'm going to experiment with it, and I'd
appreciate anyone's thoughts on this.
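The bit-flipping idea can be sketched with plain java.util.BitSet, independent of any Lucene class. The names NegativeFilterSketch, negate and applyFilter are hypothetical, and the document ids are invented for the example. It shows both the problem (an empty filter bit set removes every hit) and the proposed fix (flip the positive set, then intersect with the query hits).

```java
import java.util.BitSet;

public class NegativeFilterSketch {

    /** Returns the allowed documents: every doc id below maxDoc NOT in profaneDocs. */
    public static BitSet negate(BitSet profaneDocs, int maxDoc) {
        BitSet allowed = (BitSet) profaneDocs.clone();
        allowed.flip(0, maxDoc); // flip only the valid doc-id range
        return allowed;
    }

    /** Keeps only the query hits whose bit is also set in the filter. */
    public static BitSet applyFilter(BitSet queryHits, BitSet filterBits) {
        BitSet result = (BitSet) queryHits.clone();
        result.and(filterBits);
        return result;
    }

    public static void main(String[] args) {
        int maxDoc = 6;

        // Docs 1 and 4 contain a profane term (result of the positive OR'ed query).
        BitSet profaneDocs = new BitSet(maxDoc);
        profaneDocs.set(1);
        profaneDocs.set(4);

        // Docs matched by the user's query.
        BitSet queryHits = new BitSet(maxDoc);
        queryHits.set(0);
        queryHits.set(1);
        queryHits.set(2);
        queryHits.set(4);

        // An empty filter bit set, as from a purely negative query, drops every hit.
        System.out.println(applyFilter(queryHits, new BitSet(maxDoc))); // prints {}

        // The flipped set keeps only the hits outside the profane set.
        System.out.println(applyFilter(queryHits, negate(profaneDocs, maxDoc))); // prints {0, 2}
    }
}
```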
Thanks,
Greg
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]