Grant, thanks for responding. My issue is that I am not planning to use lucene ( as I don't need any search capability, atleast yet). All I have is a text document and I need to extract keywords and their frequency ( which could be a simple split on space and tracking the count). But I realize that I would need to do some preprocessing to remove stopwords, stem words and also check for synonyms. So wondering if there is already such code present in lucene ( or any other project ) that I can use directly.
Thanks! Grant Ingersoll-6 wrote: > > > On Aug 13, 2009, at 7:40 AM, joe_coder wrote: > >> >> I was wondering if there is any way to directly use Lucene API to >> extract >> terms from a given string. My requirement is that I have a text >> document for >> which I need a term frequency vector ( after stemming, removing >> stopwords >> and synonyms checks ). The result needs to be the terms and frequency. > > IndexReader.getTermFreqVector(), assuming you have indexed using Term > Vectors. > > >> >> Is it possible to get this using any lucene API? ( As I see lucene >> also >> needs to stem, remove stopwords, synonyms etc before indexing). Or >> is this >> any java project that would help me in this? >> -- >> View this message in context: >> http://www.nabble.com/Term-Extraction-tp24953406p24953406.html >> Sent from the Lucene - Java Users mailing list archive at Nabble.com. >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >> For additional commands, e-mail: java-user-h...@lucene.apache.org >> > > -------------------------- > Grant Ingersoll > http://www.lucidimagination.com/ > > Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) > using Solr/Lucene: > http://www.lucidimagination.com/search > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > > -- View this message in context: http://www.nabble.com/Term-Extraction-tp24953406p24954024.html Sent from the Lucene - Java Users mailing list archive at Nabble.com. --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org