Thanks, Ian.

What I am currently doing is duplicating the data into two different fields and using my own PerFieldAnalyzerWrapper, just as you pointed out.
Is there a good way to do this in a single pass, the way bi-grams or common-grams do?

--
Ravi

On Tue, Feb 17, 2015 at 3:08 PM, Ian Lea <[email protected]> wrote:

> Sounds like a job for
> org.apache.lucene.analysis.miscellaneous.PerFieldAnalyzerWrapper.
>
> --
> Ian.
>
> On Tue, Feb 17, 2015 at 8:51 AM, Ravikumar Govindarajan
> <[email protected]> wrote:
> > We have a requirement that e-mail addresses be added in tokenized
> > form to one field, while the untokenized form is added to another
> > field.
> >
> > Ex:
> >
> > "I have mailed [email protected]". It should tokenize as below:
> >
> > body = {"I", "have", "mailed", "abc", "xyz", "com"};
> >
> > I also have a body-addr field. The tokenizer needs to extract e-mail
> > addresses from the body field and add them as below:
> >
> > body-addr = {"[email protected]"}
> >
> > How to achieve this via a tokenizer chain?
> >
> > --
> > Ravi
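
For illustration only, here is a minimal plain-Java sketch of the single-pass idea asked about above. It is not Lucene API and not a tokenizer chain: it scans the text once with java.util.regex and fills two token lists, one for body (addresses broken into parts) and one for body-addr (addresses kept whole). The class name SinglePassEmailSplit, the sample address user@example.com, and the simplified address pattern are all assumptions made for this example; the pattern is nowhere near a full RFC 5322 matcher.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: one scan of the input feeds two token lists.
class SinglePassEmailSplit {

    // Assumed, simplified pattern: group 1 is a whole e-mail address,
    // group 2 is a plain run of letters/digits. The address alternative is tried first.
    private static final Pattern TOKEN = Pattern.compile(
            "([A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\\.[A-Za-z0-9-]+)+)|([A-Za-z0-9]+)");

    // Used to break an address into its alphanumeric parts for the body field.
    private static final Pattern NON_ALNUM = Pattern.compile("[^A-Za-z0-9]+");

    final List<String> body = new ArrayList<>();      // tokenized form
    final List<String> bodyAddr = new ArrayList<>();  // untokenized addresses

    SinglePassEmailSplit(String text) {
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            String address = m.group(1);
            if (address != null) {
                // Whole address goes to body-addr; its parts go to body.
                bodyAddr.add(address);
                for (String part : NON_ALNUM.split(address)) {
                    if (!part.isEmpty()) {
                        body.add(part);
                    }
                }
            } else {
                body.add(m.group(2));
            }
        }
    }

    public static void main(String[] args) {
        SinglePassEmailSplit r =
                new SinglePassEmailSplit("I have mailed user@example.com");
        System.out.println("body = " + r.body);
        System.out.println("body-addr = " + r.bodyAddr);
    }
}
```

Inside Lucene the same decision (emit the whole address to one place, its parts to another) would have to live in the analysis chain itself; the sketch only shows that both outputs can be derived from one scan of the text.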
