Hi Uwe,

Thank you. I do not have the tokens serialized, so that removes one step. I am reading the javadocs and will try the approach you described.
Regards,
Sachin

On Sun, Sep 14, 2014 at 5:11 PM, Uwe Schindler <u...@thetaphi.de> wrote:

> Hi,
>
> If you have the serialized tokens in a file, you can write a custom
> TokenStream that deserializes them and feeds them to IndexWriter as a
> Field instance in a Document instance. Please read the javadocs on how
> to write your own TokenStream implementation and pass it using
> "new TextField(name, yourTokenStream)".
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -----Original Message-----
> > From: Sachin Kulkarni [mailto:kulk...@hawk.iit.edu]
> > Sent: Sunday, September 14, 2014 10:06 PM
> > To: java-user@lucene.apache.org
> > Subject: Can lucene index tokenized files?
> >
> > Hi,
> >
> > I have a dataset which has files in the form of tokens, where the
> > original data has already been tokenized, stemmed, and stop-worded.
> >
> > Is it possible to skip the Lucene analyzers and index this dataset in
> > Lucene?
> >
> > So far the datasets I have dealt with were raw, and I used Lucene's
> > tokenization and stemming schemes.
> >
> > Thank you.
> >
> > Regards,
> > Sachin
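
For the archives, here is a minimal sketch of the approach Uwe describes: a custom TokenStream that feeds pre-analyzed tokens straight to IndexWriter, bypassing the analysis chain. It assumes the token files contain one token per line; the class name PreTokenizedStream and the field name "body" below are placeholders, not Lucene API names.

    import java.io.BufferedReader;
    import java.io.IOException;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    // Minimal sketch: exposes one pre-analyzed token per line of input
    // as a TokenStream. Only incrementToken() is abstract in TokenStream,
    // so this is all that is strictly required.
    public final class PreTokenizedStream extends TokenStream {
      private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
      private final BufferedReader reader; // assumed format: one token per line

      public PreTokenizedStream(BufferedReader reader) {
        this.reader = reader;
      }

      @Override
      public boolean incrementToken() throws IOException {
        String token = reader.readLine();
        if (token == null) {
          return false; // no more tokens
        }
        clearAttributes();
        termAtt.setEmpty().append(token);
        return true;
      }

      @Override
      public void close() throws IOException {
        super.close();
        reader.close();
      }
    }

IndexWriter drives the stream itself (reset(), then incrementToken() until it returns false, then end() and close()), so the field is added like any other; "tokens.txt" here is a stand-in for one of your token files:

    Document doc = new Document();
    doc.add(new TextField("body", new PreTokenizedStream(
        new BufferedReader(new FileReader("tokens.txt")))));
    writer.addDocument(doc);

If the tokens could instead be joined with spaces, an alternative that avoids custom code is to index the joined string with WhitespaceAnalyzer, which splits only on whitespace and does no stemming or stop-wording of its own.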