Sounds like the sort of filter that could be usefully cached.
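A minimal sketch of what caching buys here, in plain Java. The bit set for a fixed filter, such as a fixed list of profane words, is computed once per index snapshot and then reused by every query. Lucene's CachingWrapperFilter follows a similar idea and keys its cache on the IndexReader. The `CachedBitsFilter` and `IndexSnapshot` classes below are hypothetical stand-ins, not Lucene API.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

public class CachedBitsFilter {
    // Hypothetical stand-in for an index reader: it only knows its document count.
    public static final class IndexSnapshot {
        public final int maxDoc;
        public IndexSnapshot(int maxDoc) { this.maxDoc = maxDoc; }
    }

    // The expensive computation being cached, e.g. looking up every profane term.
    private final Function<IndexSnapshot, BitSet> compute;
    // Weak keys: an entry can be garbage-collected once its snapshot is no longer in use.
    private final Map<IndexSnapshot, BitSet> cache = new WeakHashMap<>();
    private int computeCount = 0;

    public CachedBitsFilter(Function<IndexSnapshot, BitSet> compute) {
        this.compute = compute;
    }

    // Returns the bits for this snapshot, computing them only on the first call.
    // Callers must treat the returned BitSet as read-only because it is shared.
    public synchronized BitSet bits(IndexSnapshot snapshot) {
        BitSet bits = cache.get(snapshot);
        if (bits == null) {
            computeCount++;
            bits = compute.apply(snapshot);
            cache.put(snapshot, bits);
        }
        return bits;
    }

    public synchronized int computeCount() { return computeCount; }

    public static void main(String[] args) {
        IndexSnapshot reader = new IndexSnapshot(5);
        CachedBitsFilter filter = new CachedBitsFilter(r -> {
            BitSet b = new BitSet(r.maxDoc);
            b.set(1);
            b.set(3); // pretend docs 1 and 3 matched the filter
            return b;
        });
        filter.bits(reader);
        filter.bits(reader);
        System.out.println(filter.computeCount()); // prints 1
    }
}
```

The second `bits` call for the same snapshot is served from the cache. A new snapshot, for example after the index is reopened, triggers one fresh computation.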
You can do all of this in Java code, or the XML query parser (in contrib) might be
a quick and simple way to externalize the profanity settings in a stylesheet
that is applied at query time, e.g.:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/Document">
<FilteredQuery>
    <Query>
        <UserQuery><xsl:value-of select="content"/></UserQuery>
    </Query>
    <Filter>
        <CachedFilter>
            <TermsFilter fieldName="content">
                naughty1 naughty2 xxx
            </TermsFilter>
        </CachedFilter>
    </Filter>
</FilteredQuery>
</xsl:template>
</xsl:stylesheet>

The above example also automatically adds caching to the results of the 
profanity filter.
Your app code to use this would then look like this:
init()
        // parse and cache the stylesheet once
        QueryTemplateManager qtm = new QueryTemplateManager(
                getClass().getResourceAsStream("query.xsl"));
....


runQuery()
        // get the user input
        Properties userInput = new Properties();
        userInput.setProperty("content", httpRequest.getParameter("queryCriteria"));

        // transform the user input into a Lucene XML query
        org.w3c.dom.Document doc = qtm.getQueryAsDOM(userInput);

        // parse the XML query using the XML query parser
        // (xmlQueryBuilder is assumed to be the parser instance, e.g. a CorePlusExtensionsParser)
        Query q = xmlQueryBuilder.getQuery(doc.getDocumentElement());

        // run the query as normal
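The transform step can be sketched with nothing but the JDK's JAXP classes. This assumes QueryTemplateManager works roughly this way: each property is wrapped in an element named after it under a `Document` root, which is why the stylesheet matches `/Document` and selects `content`, and the compiled stylesheet is then run over that input. `QueryTemplateSketch` and its `getQueryAsDOM` are hypothetical names that mirror the call above. The inline stylesheet is a compacted copy of the one earlier in this message.

```java
import java.io.StringReader;
import java.util.Properties;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class QueryTemplateSketch {
    // Compacted copy of query.xsl (assumed content, for illustration only).
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:template match=\"/Document\">"
      + "<FilteredQuery><Query><UserQuery><xsl:value-of select=\"content\"/></UserQuery></Query>"
      + "<Filter><CachedFilter><TermsFilter fieldName=\"content\">naughty1 naughty2 xxx</TermsFilter>"
      + "</CachedFilter></Filter></FilteredQuery>"
      + "</xsl:template></xsl:stylesheet>";

    // Compiled once and reused, like the "parse and cache the stylesheet" step in init().
    static final Templates TEMPLATES;
    static {
        try {
            TEMPLATES = TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(XSL)));
        } catch (Exception e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Builds <Document><name>value</name>...</Document> from the properties
    // (property names must be valid XML element names), then applies the stylesheet.
    public static Document getQueryAsDOM(Properties props) throws Exception {
        Document input = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = input.createElement("Document");
        input.appendChild(root);
        for (String name : props.stringPropertyNames()) {
            Element child = input.createElement(name);
            child.setTextContent(props.getProperty(name));
            root.appendChild(child);
        }
        DOMResult result = new DOMResult(); // no target node: the transformer creates a new Document
        TEMPLATES.newTransformer().transform(new DOMSource(input), result);
        return (Document) result.getNode();
    }

    public static void main(String[] args) throws Exception {
        Properties userInput = new Properties();
        userInput.setProperty("content", "lucene filters");
        Document doc = getQueryAsDOM(userInput);
        System.out.println(doc.getDocumentElement().getTagName()); // FilteredQuery
        System.out.println(doc.getElementsByTagName("UserQuery").item(0).getTextContent()); // lucene filters
    }
}
```

Compiling the stylesheet into a `Templates` object once is what makes the per-request cost small. Each query only creates a cheap `Transformer` from it.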

Cheers
Mark


----- Original Message ----
From: Greg Gershman <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Wednesday, 7 March, 2007 3:07:45 PM
Subject: Negative Filtering (such as for profanity)

I'm attempting to create a profanity filter. I thought to use a QueryFilter
created with a Query of (-$#!+ AND [EMAIL PROTECTED] AND etc). The problem I
have run into is that, because a purely negative query is not supported (a query
for (-term) DOES NOT return the inverse of a query for (term)), I believe the bit
set returned by a purely negative QueryFilter is empty. So no matter how many
results the initial query returns, the result after filtering is always
zero documents.
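Greg's diagnosis can be illustrated with plain bit sets. A prohibited clause only removes documents from whatever the positive clauses matched, so with no positive clause there is nothing to remove from. Starting from the set of all documents gives the intended inverse. In Lucene, adding a MatchAllDocsQuery clause is one way to supply such a base. The document numbers and the "profane" set below are made up for illustration.

```java
import java.util.BitSet;

public class PureNegativeDemo {
    // Returns the documents in 'base' that are not in 'prohibited' (base is left unchanged).
    public static BitSet exclude(BitSet base, BitSet prohibited) {
        BitSet result = (BitSet) base.clone();
        result.andNot(prohibited);
        return result;
    }

    public static void main(String[] args) {
        int maxDoc = 6;
        BitSet profane = new BitSet(maxDoc);
        profane.set(2);
        profane.set(4); // pretend docs 2 and 4 contain profanity

        // A query with only prohibited clauses has no positive matches to start from.
        BitSet noPositiveClauses = new BitSet(maxDoc);
        System.out.println(exclude(noPositiveClauses, profane)); // {}

        // Starting from "all documents" instead yields the intended inverse.
        BitSet allDocs = new BitSet(maxDoc);
        allDocs.set(0, maxDoc);
        System.out.println(exclude(allDocs, profane)); // {0, 1, 3, 5}
    }
}
```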

I was wondering if anyone had suggestions as to how else to do this. I've
considered simply amending the query string submitted by the user to include a
pre-generated string that would exclude the profane terms, but I consider this an
inelegant solution. I had also thought about creating a new subclass of
QueryFilter, NegativeQueryFilter. Basically, it would work just like a
QueryFilter, taking a positive query (so I would pass it an OR'ed list of
profane words), and then simply flip the resulting bits. I think this would
work, unless I'm missing something. I'm going to experiment with it; I'd
appreciate anyone's thoughts on this.
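The NegativeQueryFilter idea can be sketched without Lucene. First compute the bits for the positive OR'ed query of profane words, then flip them across all document numbers. The toy inverted index (term to document numbers) and the class name `NegativeTermsFilter` are hypothetical. In Lucene 2.x the equivalent would take the BitSet from a wrapped filter's `bits(IndexReader)` and call `flip(0, reader.maxDoc())` on a copy of it. Because the flipped set covers nearly every document, it is also a good candidate for caching.

```java
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class NegativeTermsFilter {
    private final Map<String, Set<Integer>> postings; // toy inverted index: term -> doc ids
    private final int maxDoc;
    private final List<String> excludedTerms;

    public NegativeTermsFilter(Map<String, Set<Integer>> postings, int maxDoc, List<String> excludedTerms) {
        this.postings = postings;
        this.maxDoc = maxDoc;
        this.excludedTerms = excludedTerms;
    }

    // Bits for the positive query (term1 OR term2 OR ...): docs containing any excluded term.
    BitSet positiveBits() {
        BitSet bits = new BitSet(maxDoc);
        for (String term : excludedTerms) {
            for (int doc : postings.getOrDefault(term, Set.of())) {
                bits.set(doc);
            }
        }
        return bits;
    }

    // The negative filter: flip every bit so only docs containing none of the terms remain.
    public BitSet bits() {
        BitSet bits = positiveBits();
        bits.flip(0, maxDoc);
        return bits;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> postings = Map.of(
                "naughty1", Set.of(1, 4),
                "xxx", Set.of(4, 5),
                "lucene", Set.of(0, 2, 3));
        NegativeTermsFilter filter =
                new NegativeTermsFilter(postings, 6, List.of("naughty1", "naughty2", "xxx"));
        System.out.println(filter.bits()); // {0, 2, 3}

        // Filtering = ANDing the user query's hits (made up here) with the filter's bits.
        BitSet userMatches = new BitSet(6);
        userMatches.set(0);
        userMatches.set(3);
        userMatches.set(4);
        userMatches.and(filter.bits());
        System.out.println(userMatches); // {0, 3}
    }
}
```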

Thanks,

Greg



