I would just throw your doc into a MemoryIndex (it lives in contrib/memory, I think, and only holds one doc), get the term vector and do what you need to do. So you would kind of be doing indexing, but not really.
On Aug 13, 2009, at 8:43 AM, joe_coder wrote:
Grant, thanks for responding.
My issue is that I am not planning to use Lucene (I don't need any search capability, at least not yet). All I have is a text document, and I need to extract keywords and their frequencies (which could be a simple split on spaces while tracking the count). But I realize that I would need to do some preprocessing to remove stopwords, stem words and also check for synonyms.
So I am wondering whether such code already exists in Lucene (or any other project) that I can use directly.
Thanks!
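If Lucene is ruled out, the preprocessing described above can be sketched in plain Java. This is a toy under stated assumptions: the stopword list, the synonym table and the suffix-stripping `stem` method are hypothetical placeholders, not Lucene's StopFilter, PorterStemFilter or any real synonym support, and tokens are split on non-alphanumeric characters rather than only on spaces so punctuation does not stick to words.

```java
import java.util.*;

/**
 * Toy term-frequency extractor that does not use Lucene.
 * STOPWORDS, SYNONYMS and stem() are hypothetical placeholders.
 */
class TermFreqSketch {

    // Hypothetical stopword list; a real one would be much longer.
    static final Set<String> STOPWORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "the", "of", "to", "is", "in", "for"));

    // Hypothetical synonym table: each (stemmed) variant maps to one canonical term.
    static final Map<String, String> SYNONYMS = new HashMap<>();
    static {
        SYNONYMS.put("automobile", "car");
        SYNONYMS.put("auto", "car");
    }

    // Crude suffix stripper standing in for a real stemmer such as Porter's.
    // It only removes "ing", "ed" or "s" and leaves short words alone.
    static String stem(String word) {
        for (String suffix : new String[] {"ing", "ed", "s"}) {
            if (word.length() > suffix.length() + 2 && word.endsWith(suffix)) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    // Lowercase, tokenize, drop stopwords, stem, map synonyms, then count.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> freqs = new TreeMap<>(); // sorted by term
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty() || STOPWORDS.contains(token)) {
                continue;
            }
            String stemmed = stem(token);
            String term = SYNONYMS.getOrDefault(stemmed, stemmed);
            freqs.merge(term, 1, Integer::sum);
        }
        return freqs;
    }

    public static void main(String[] args) {
        String doc = "The car is parked. Cars and automobiles: an auto parking lot.";
        System.out.println(termFrequencies(doc));
    }
}
```

Running `main` prints `{car=4, lot=1, park=2}`. Stemming happens before the synonym lookup so that a plural such as "automobiles" is reduced to "automobile" first and then mapped to "car"; a real pipeline would need a proper stemmer and a much richer synonym source.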
Grant Ingersoll-6 wrote:
On Aug 13, 2009, at 7:40 AM, joe_coder wrote:
I was wondering if there is any way to use the Lucene API directly to extract terms from a given string. My requirement is that I have a text document for which I need a term frequency vector (after stemming, removing stopwords and checking synonyms). The result needs to be the terms and their frequencies.
IndexReader.getTermFreqVector(), assuming you have indexed using Term Vectors.
Is it possible to get this using any Lucene API? (As I see it, Lucene also needs to stem, remove stopwords, handle synonyms etc. before indexing.) Or is there any Java project that would help me with this?
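IndexReader.getTermFreqVector() returns a TermFreqVector, which in Lucene 2.x exposes a document's terms and their counts as parallel arrays (getTerms() and getTermFrequencies()) plus an indexOf(String) lookup. The class below is a hypothetical plain-Java stand-in that only mimics that shape for tokens that have already been analyzed; it is not Lucene code and does not touch an index. In this sketch the terms are kept sorted so that indexOf can use a binary search.

```java
import java.util.*;

/**
 * Hypothetical stand-in shaped like Lucene 2.x's TermFreqVector:
 * sorted terms plus a parallel array of frequencies.
 * Input tokens are assumed to be already analyzed (stemmed, stopwords removed).
 */
class SimpleTermFreqVector {
    private final String[] terms;
    private final int[] freqs;

    SimpleTermFreqVector(List<String> analyzedTokens) {
        // TreeMap keeps the terms in natural (sorted) order.
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (String t : analyzedTokens) {
            counts.merge(t, 1, Integer::sum);
        }
        terms = counts.keySet().toArray(new String[0]);
        freqs = new int[terms.length];
        int i = 0;
        for (int f : counts.values()) { // same order as keySet()
            freqs[i++] = f;
        }
    }

    // Defensive copies so callers cannot modify the vector.
    String[] getTerms() { return terms.clone(); }
    int[] getTermFrequencies() { return freqs.clone(); }

    /** Position of term in getTerms(), or -1 if it is absent. */
    int indexOf(String term) {
        int idx = Arrays.binarySearch(terms, term);
        return idx >= 0 ? idx : -1;
    }

    public static void main(String[] args) {
        SimpleTermFreqVector v = new SimpleTermFreqVector(
                Arrays.asList("park", "car", "lot", "car", "park", "car"));
        System.out.println(Arrays.toString(v.getTerms()));
        System.out.println(Arrays.toString(v.getTermFrequencies()));
    }
}
```

For the tokens in `main`, the terms come out as `[car, lot, park]` with frequencies `[3, 1, 2]`, so the frequency of `getTerms()[i]` is always `getTermFrequencies()[i]`.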
--
View this message in context:
http://www.nabble.com/Term-Extraction-tp24953406p24953406.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search