Jamie, UAX29URLEmailTokenizer does not emit email components as tokens; “john....@mycompany.com.au” will be tokenized as “john....@mycompany.com.au”, nothing more. That’s why I asked what EmailFilter does.
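To make the distinction concrete, here is a minimal plain-Java sketch. It does not call Lucene at all: the class `EmailTokenSketch`, its methods, and the addresses `jane.roe@example.com` and `alice@other.com` are all made up for illustration. It models a field that holds only the whole address as one token versus a field (and query) that also holds the dotted components, with SHOULD-style "any term matches" semantics. It shows one way an expanded form could over-match; it is not a diagnosis of what your system is doing.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Plain-Java illustration only; nothing here is Lucene API.
final class EmailTokenSketch {

    // Models a tokenizer that emits the whole address as a single token.
    static Set<String> wholeAddressToken(String address) {
        Set<String> tokens = new LinkedHashSet<>();
        tokens.add(address.toLowerCase());
        return tokens;
    }

    // Hypothetical expansion: the whole address, each side of '@',
    // and every dot-separated piece of each side.
    static Set<String> expandedTokens(String address) {
        String lower = address.toLowerCase();
        Set<String> tokens = new LinkedHashSet<>();
        tokens.add(lower);
        for (String side : lower.split("@")) {
            tokens.add(side);
            tokens.addAll(Arrays.asList(side.split("\\.")));
        }
        return tokens;
    }

    // SHOULD-style matching: true if the document shares at least one term with the query.
    static boolean matchesAny(Set<String> docTokens, Set<String> queryTerms) {
        for (String term : queryTerms) {
            if (docTokens.contains(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String queried = "jane.roe@example.com"; // hypothetical address
        String other = "alice@other.com";        // hypothetical unrelated address

        // Whole-address tokens on both sides: only an exact address can match.
        System.out.println("exact query vs other doc: "
                + matchesAny(wholeAddressToken(other), wholeAddressToken(queried)));

        // Expanded tokens on both sides: a shared component is enough to match.
        System.out.println("expanded query vs other doc: "
                + matchesAny(expandedTokens(other), expandedTokens(queried)));
    }
}
```

In this model the two unrelated addresses share the component "com", so the expanded query matches the other document while the whole-address query does not.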
If the filter really is ignored by Lucene, that would be a bug in Lucene. I think something else is likely going on, though, which is why I asked you for an example query matching too many docs and a doc it improperly matches.

Steve

On Mar 28, 2014, at 10:54 AM, Jamie <ja...@mailarchiva.com> wrote:

> Steve
>
> Thanks for the contact. I believe UAX29URLEmailTokenizer tokenizes email
> addresses as follows:
>
> john....@mycompany.com.au john.doe mycompany.com.au
> john doe mycompany com au com.au
>
> We have an overridden query parser that swaps out anyaddress: with to,
> from, cc, bcc, etc. Inside the overridden query parser, we call
> getFieldQuery() to build the clauses...
>
> Query q = super.getFieldQuery(field, emailAddress, true);
> if (slop != -1) {
>     applySlop(q, slop);
> }
> clauses.add(new BooleanClause(q, BooleanClause.Occur.SHOULD));
>
> The query is output below. Sometimes when it is executed by Lucene, the
> filter is ignored.
>
> I am busy trying to isolate the issue, since the code is running in a wider
> system among other complexities.
>
> Jamie
>
> On 2014/03/28, 4:08 PM, Steve Rowe wrote:
>> Hi Jamie,
>>
>> What does EmailFilter do?
>>
>> Why is the expanded form "required for the UAX29URLEmailTokenizer"? Seems
>> like an exact match would work on the email address alone, without the
>> expanded components?
>>
>> Do you have an example of a query that reproducibly matches more documents
>> than it should, and a document that matched but shouldn’t have?
>>
>> Steve
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org