Steve

Thanks for the contact. I believe UAX29URLEmailTokenizer tokenizes email addresses as follows: john....@mycompany.com.au john.doe mycompany.com.au john doe mycompany com au com.au.

We have an overridden query parser that swaps out anyaddress: with to, from, cc, bcc, etc. Inside the overridden query parser, we call getFieldQuery() to build the clauses...

    Query q = super.getFieldQuery(field, emailAddress, true);
    if (slop != -1) {
        applySlop(q, slop);
    }
    clauses.add(new BooleanClause(q, BooleanClause.Occur.SHOULD));

The query is output below. Sometimes, when Lucene executes it, the filter is ignored.

I am still trying to isolate the issue, since the code runs inside a larger system with other complexities of its own.
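
To make the expanded form concrete, here is a minimal plain-Java sketch that produces the same kind of component list as the one given above. It is only an illustration: the class EmailExpansion, the method expand() and the sample address are hypothetical names made up for this example. It is not the UAX29URLEmailTokenizer code or our EmailFilter, and it says nothing about how either of them behaves.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class EmailExpansion {

    /**
     * Hypothetical helper (not a Lucene API): expands an email address into
     * the full address, its local part, its domain, the dot-separated pieces
     * of both, and any multi-label domain suffixes such as "com.au".
     */
    static List<String> expand(String address) {
        // LinkedHashSet keeps insertion order and drops duplicate terms.
        Set<String> terms = new LinkedHashSet<>();
        terms.add(address);

        int at = address.indexOf('@');
        if (at <= 0 || at == address.length() - 1) {
            // No usable local part or domain: return the input unchanged.
            return new ArrayList<>(terms);
        }

        String local = address.substring(0, at);
        String domain = address.substring(at + 1);
        terms.add(local);
        terms.add(domain);

        // Dot-separated pieces of the local part, e.g. "jane" and "roe".
        for (String part : local.split("\\.")) {
            if (!part.isEmpty()) {
                terms.add(part);
            }
        }

        // Individual domain labels, e.g. "example", "com", "au".
        String[] labels = domain.split("\\.");
        for (String label : labels) {
            if (!label.isEmpty()) {
                terms.add(label);
            }
        }

        // Multi-label suffixes shorter than the whole domain, e.g. "com.au".
        for (int i = 1; i < labels.length - 1; i++) {
            terms.add(String.join(".", Arrays.copyOfRange(labels, i, labels.length)));
        }

        return new ArrayList<>(terms);
    }

    public static void main(String[] args) {
        // Made-up sample address, used only for illustration.
        System.out.println(expand("jane.roe@example.com.au"));
    }
}
```

Each term returned by expand() would correspond to one component of the expanded form for a single address.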

Jamie

On 2014/03/28, 4:08 PM, Steve Rowe wrote:
Hi Jamie,

What does EmailFilter do?

Why is the expanded form "required for the UAX29URLEmailTokenizer"?  Seems like 
an exact match would work on the email address alone, without the expanded components?

Do you have an example of a query that reproducibly matches more documents than 
it should, and a document that matched but shouldn’t have?

Steve   


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
