Hi all,

Sorry if I'm asking an age-old question, but we have migrated to Lucene 3.6.0 
and I see that StandardAnalyzer has changed its behaviour, particularly when 
tokenizing email addresses. From reading the forums, I understand that the old 
StandardAnalyzer was renamed to ClassicAnalyzer. Is this the case?


If I pass the string 'u...@domain.com' through these analyzers, I get the 
following tokens:

Using StandardAnalyzer(Version.LUCENE_23):  -->  u...@domain.com    (one token)
Using StandardAnalyzer(Version.LUCENE_36):  -->  user domain.com    (two tokens)
Using ClassicAnalyzer(Version.LUCENE_36):   -->  u...@domain.com    (one token)
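
A minimal sketch of how a comparison like this can be reproduced against the 
Lucene 3.6 API. The class name TokenDump, the field name "f" and the sample 
address someone@example.com are placeholders chosen for illustration, not the 
code or input used to produce the results above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.ClassicAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

// Hypothetical helper class: collects the terms an analyzer emits for a string.
public class TokenDump {

    static List<String> tokens(Analyzer analyzer, String text) throws IOException {
        List<String> out = new ArrayList<String>();
        // "f" is an arbitrary field name; these analyzers do not depend on it.
        TokenStream ts = analyzer.tokenStream("f", new StringReader(text));
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        try {
            ts.reset();
            while (ts.incrementToken()) {
                out.add(term.toString());
            }
            ts.end();
        } finally {
            ts.close();
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder input, not the address from the original test.
        String text = "someone@example.com";

        System.out.println("StandardAnalyzer(LUCENE_23): "
                + tokens(new StandardAnalyzer(Version.LUCENE_23), text));
        System.out.println("StandardAnalyzer(LUCENE_36): "
                + tokens(new StandardAnalyzer(Version.LUCENE_36), text));
        System.out.println("ClassicAnalyzer(LUCENE_36):  "
                + tokens(new ClassicAnalyzer(Version.LUCENE_36), text));
    }
}
```

This needs the Lucene 3.6.0 core jar on the classpath. Version.LUCENE_23 is 
deprecated in 3.6 but still compiles.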

StandardAnalyzer is normally a good compromise as a default analyzer, but its 
failure to keep an email address intact makes it less fit for purpose than it 
used to be. Is this a bug or is it by design? If by design, what is the 
reason for the change, and is ClassicAnalyzer now the de facto analyzer to use?

Thanks,
Clive
