I would just throw your doc into a MemoryIndex (it lives in contrib/memory, I think, and it only holds one doc), get the term vector, and do what you need to do. So you would kind of be doing indexing, but not really.


On Aug 13, 2009, at 8:43 AM, joe_coder wrote:


Grant, thanks for responding.

My issue is that I am not planning to use Lucene (as I don't need any search capability, at least not yet). All I have is a text document, and I need to extract keywords and their frequencies (which could be a simple split on spaces while tracking the counts). But I realize that I would need to do some preprocessing to remove stopwords, stem words, and check for synonyms. So I am wondering if there is already such code in Lucene (or any other project) that I can use directly.

Thanks!
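
For reference, the "simple split and count" approach described above can be sketched in plain Java with no Lucene dependency. This is only a minimal illustration under stated assumptions: the class name TermFrequency, the tiny stopword list, and the naive stem() helper (which only strips a trailing "s") are all hypothetical placeholders, not anything from Lucene. Synonym handling is left out entirely. A real setup would swap in a proper stopword list and a real stemmer.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class TermFrequency {
    // Hypothetical, deliberately tiny stopword list; a real one would be much longer.
    private static final Set<String> STOPWORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "the", "of", "to", "is", "in"));

    // Placeholder "stemmer" (an assumption for illustration): strips a single
    // trailing "s" from words longer than three letters, except words ending in "ss".
    static String stem(String token) {
        if (token.length() > 3 && token.endsWith("s") && !token.endsWith("ss")) {
            return token.substring(0, token.length() - 1);
        }
        return token;
    }

    // Lowercases the text, splits on runs of non-alphanumeric characters,
    // drops stopwords, stems what is left, and counts each resulting term.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (raw.isEmpty() || STOPWORDS.contains(raw)) {
                continue;
            }
            counts.merge(stem(raw), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints {ball=1, cat=2, chased=1, yarn=1}
        System.out.println(termFrequencies("The cats and the cat chased a ball of yarn."));
    }
}
```

The TreeMap is used only so the terms print in sorted order; a HashMap would work just as well if ordering does not matter.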



Grant Ingersoll-6 wrote:


On Aug 13, 2009, at 7:40 AM, joe_coder wrote:


I was wondering if there is any way to use the Lucene API directly to extract terms from a given string. My requirement is that I have a text document for which I need a term frequency vector (after stemming, stopword removal, and synonym checks). The result needs to be the terms and their frequencies.

IndexReader.getTermFreqVector(), assuming you have indexed using Term
Vectors.



Is it possible to get this using any Lucene API? (As I see it, Lucene also needs to stem, remove stopwords, handle synonyms, etc. before indexing.) Or is there any Java project that would help me with this?
--
View this message in context:
http://www.nabble.com/Term-Extraction-tp24953406p24953406.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org


--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
using Solr/Lucene:
http://www.lucidimagination.com/search






