Jamie,

UAX29URLEmailTokenizer does not emit email components as tokens; 
“john....@mycompany.com.au” will be tokenized as “john....@mycompany.com.au”, 
nothing more.  That’s why I asked what EmailFilter does.
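
One way to check this directly is a minimal sketch like the one below. It 
assumes Lucene 4.7 (the Version-taking constructor of the 4.x line) is on the 
classpath. The class name TokenizeEmail and the sample address 
john.doe@mycompany.com.au are hypothetical, chosen only for illustration. It 
collects every token the tokenizer emits for the input; for a plain email 
address the list should contain just the address itself.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.standard.UAX29URLEmailTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

// Hypothetical helper class, not part of Lucene or of Jamie's code.
public class TokenizeEmail {

    // Returns the terms UAX29URLEmailTokenizer emits for the given text.
    // Assumes the Lucene 4.7 API (constructor takes a Version and a Reader).
    static List<String> tokenize(String text) throws IOException {
        List<String> tokens = new ArrayList<String>();
        UAX29URLEmailTokenizer tokenizer =
            new UAX29URLEmailTokenizer(Version.LUCENE_47, new StringReader(text));
        CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
        try {
            tokenizer.reset();                  // required before the first incrementToken()
            while (tokenizer.incrementToken()) {
                tokens.add(term.toString());    // copy the current term text
            }
            tokenizer.end();
        } finally {
            tokenizer.close();                  // also closes the underlying reader
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        // Sample address is an assumption made for this sketch.
        System.out.println(tokenize("john.doe@mycompany.com.au"));
    }
}
```

If the printed list holds more than the one address token, the extra tokens 
come from something downstream of the tokenizer, such as a token filter.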

If the filter really is ignored by Lucene, that would be a bug in Lucene.  I 
think something else is likely going on, though, which is why I asked you for 
an example query matching too many docs and a doc it improperly matches. 

Steve

On Mar 28, 2014, at 10:54 AM, Jamie <ja...@mailarchiva.com> wrote:

> Steve
> 
> Thanks for the contact. I believe UAX29URLEmailTokenizer tokenizes email 
> addresses as follows: john....@mycompany.com.au john.doe mycompany.com.au 
> john doe mycompany com au com.au. We have an overridden query parser that 
> replaces anyaddress: with to, from, cc, bcc, etc. Inside the overridden 
> query parser, we call getFieldQuery() to build the clauses...
> 
> Query q = super.getFieldQuery(field, emailAddress, true);
> if (slop != -1) {
>     applySlop(q, slop);
> }
> clauses.add(new BooleanClause(q, BooleanClause.Occur.SHOULD));
> 
> The query is output below. Sometimes, when Lucene executes it, the 
> filter is ignored.
> 
> I am busy trying to isolate the issue, since the code is running in a wider 
> system among other complexities.
> 
> Jamie
> 
> On 2014/03/28, 4:08 PM, Steve Rowe wrote:
>> Hi Jamie,
>> 
>> What does EmailFilter do?
>> 
>> Why is the expanded form "required for the UAX29URLEmailTokenizer"?  Seems 
>> like an exact match would work on the email address alone, without the 
>> expanded components?
>> 
>> Do you have an example of a query that reproducibly matches more documents 
>> than it should, and a document that matched but shouldn’t have?
>> 
>> Steve        
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
> 

