Hi All,

I want to index email fields as both analyzed and not analyzed, using a custom
analyzer.

For example:
sm...@yahoo.com
will.sm...@yahoo.com

That is, I want sm...@yahoo.com indexed as a single token and also as
analyzed tokens, in the same email field.
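To make the goal concrete, here is a plain-Java sketch (no Lucene involved) of the tokens one email value would produce: the whole address as a single token, followed by its pieces. The class `EmailTokens`, the method `expand` and the address `will.jdoe@example.com` are hypothetical, and splitting on non-alphanumeric characters is only an assumption about what "analyzed" should mean here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class EmailTokens {

    // Hypothetical helper: returns the full address as one token (the
    // "not analyzed" form), then the pieces obtained by splitting on any
    // non-alphanumeric character (the "analyzed" forms). Lower-cases
    // everything, as the LowerCaseFilter in the analyzer chain would.
    public static List<String> expand(String email) {
        String lower = email.toLowerCase(Locale.ROOT);
        List<String> tokens = new ArrayList<>();
        tokens.add(lower);
        for (String part : lower.split("[^\\p{Alnum}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(expand("will.jdoe@example.com"));
    }
}
```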


My existing custom analyzer:

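import java.io.IOException;
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.core.StopFilter;
import org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilter;
import org.apache.lucene.analysis.standard.ClassicAnalyzer;
import org.apache.lucene.analysis.standard.ClassicFilter;
import org.apache.lucene.analysis.standard.ClassicTokenizer;
import org.apache.lucene.analysis.util.StopwordAnalyzerBase;
import org.apache.lucene.util.Version;

public class CustomSearchAnalyzer extends StopwordAnalyzerBase
{
    // Loads the stopword list from the given Reader.
    public CustomSearchAnalyzer(Version matchVersion, Reader stopwords)
        throws IOException
    {
        super(matchVersion, loadStopwordSet(stopwords, matchVersion));
    }

    @Override
    protected Analyzer.TokenStreamComponents createComponents(
        final String fieldName, final Reader reader)
    {
        final ClassicTokenizer src = new ClassicTokenizer(getVersion(), reader);
        src.setMaxTokenLength(ClassicAnalyzer.DEFAULT_MAX_TOKEN_LENGTH);
        TokenStream tok = new ClassicFilter(src);
        tok = new LowerCaseFilter(getVersion(), tok);
        tok = new StopFilter(getVersion(), tok, stopwords);
        tok = new ASCIIFoldingFilter(tok); // enables accent-insensitive search

        return new Analyzer.TokenStreamComponents(src, tok)
        {
            @Override
            protected void setReader(final Reader reader) throws IOException
            {
                // Re-apply the max token length whenever the tokenizer is reused.
                src.setMaxTokenLength(ClassicAnalyzer.DEFAULT_MAX_TOKEN_LENGTH);
                super.setReader(reader);
            }
        };
    }
}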


What I want to achieve:

1. If I search with the query "sm...@yahoo.com", records containing
will.sm...@yahoo.com should not match.
2. I should also be able to search for "smith" in that field.
3. If possible, email values in all other fields should be detected and
tokenized the same way.
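The three requirements can be checked against a toy in-memory model. This is a rough sketch, not Lucene code: `EmailFieldDemo`, its methods and the sample documents are hypothetical, and the email regex is a deliberately loose assumption, much simpler than the UAX#29-based rules UAX29URLEmailTokenizer applies. Each document's token set holds the whole address plus its pieces, so an exact-address query only hits documents containing that exact address, while a name query hits all of them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailFieldDemo {

    // Assumed, simplified email pattern (much looser than real tokenizer rules).
    private static final Pattern EMAIL =
        Pattern.compile("[\\p{Alnum}._%+-]+@[\\p{Alnum}.-]+\\.\\p{Alpha}{2,}");

    // Tokenizes arbitrary text (requirement 3): each detected email yields the
    // whole address (requirement 1) plus its pieces (requirement 2); other
    // text is split on non-alphanumerics. Everything is lower-cased.
    public static Set<String> tokenize(String text) {
        String lower = text.toLowerCase(Locale.ROOT);
        Set<String> tokens = new LinkedHashSet<>();
        Matcher m = EMAIL.matcher(lower);
        int last = 0;
        while (m.find()) {
            addWords(lower.substring(last, m.start()), tokens);
            tokens.add(m.group());       // whole address as a single token
            addWords(m.group(), tokens); // local-part and domain pieces
            last = m.end();
        }
        addWords(lower.substring(last), tokens);
        return tokens;
    }

    private static void addWords(String s, Set<String> out) {
        for (String w : s.split("[^\\p{Alnum}]+")) {
            if (!w.isEmpty()) {
                out.add(w);
            }
        }
    }

    // Returns the indexes of documents whose token set contains the term.
    public static List<Integer> search(List<String> docs, String term) {
        String t = term.toLowerCase(Locale.ROOT);
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (tokenize(docs.get(i)).contains(t)) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
            "jdoe@example.com",
            "will.jdoe@example.com",
            "Please contact jdoe@example.com for details");
        System.out.println(search(docs, "jdoe@example.com"));
        System.out.println(search(docs, "jdoe"));
    }
}
```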

How can I achieve points 1 and 2 using UAX29URLEmailTokenizer? And how can I
add UAX29URLEmailTokenizer to my existing custom analyzer, without using a
separate email analyzer (PerFieldAnalyzerWrapper) for the email field, so that
the same tokenization applies to email terms in all fields?
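One possible design, sketched here as a plain-Java model rather than real Lucene code: UAX29URLEmailTokenizer emits each email address as a single token of type `<EMAIL>`, so a filter placed after it could keep that token and stack its pieces at the same position (position increment 0), passing every other token through unchanged. In Lucene this would be a custom TokenFilter working on the term, type and position-increment attributes; the `Token` class, `EmailExpandingFilterModel` and `filter` below are hypothetical stand-ins, and case folding is assumed to happen in a later lower-casing step.

```java
import java.util.ArrayList;
import java.util.List;

public class EmailExpandingFilterModel {

    // Simplified stand-in for a Lucene token: term text, token type and
    // position increment (0 = stacked on the previous token's position).
    public static final class Token {
        public final String term;
        public final String type;
        public final int posInc;

        public Token(String term, String type, int posInc) {
            this.term = term;
            this.type = type;
            this.posInc = posInc;
        }

        @Override
        public String toString() {
            return term + "/" + type + "/+" + posInc;
        }
    }

    // Type string UAX29URLEmailTokenizer assigns to email addresses.
    public static final String EMAIL_TYPE = "<EMAIL>";

    // Passes non-email tokens through; for an email token, emits the whole
    // address and then each alphanumeric piece at position increment 0.
    public static List<Token> filter(List<Token> input) {
        List<Token> out = new ArrayList<>();
        for (Token t : input) {
            out.add(t);
            if (!EMAIL_TYPE.equals(t.type)) {
                continue;
            }
            for (String part : t.term.split("[^\\p{Alnum}]+")) {
                if (!part.isEmpty()) {
                    out.add(new Token(part, "<ALPHANUM>", 0));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Token> in = new ArrayList<>();
        in.add(new Token("contact", "<ALPHANUM>", 1));
        in.add(new Token("will.jdoe@example.com", EMAIL_TYPE, 1));
        System.out.println(filter(in));
    }
}
```

Stacking the pieces at increment 0 keeps the positions of the following tokens unchanged, much as synonym filters inject synonyms; giving each piece its own position is an alternative if phrase queries over the parts matter.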



-
Kumaran R
