Grant, thanks for responding.

My issue is that I am not planning to use Lucene (I don't need any
search capability, at least not yet). All I have is a text document, and I
need to extract keywords and their frequencies (which could be a simple split
on whitespace plus a running count). But I realize that I would need to do
some preprocessing to remove stopwords, stem words, and check for synonyms.
So I am wondering whether such code already exists in Lucene (or any other
project) that I can use directly.
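
To make the requirement concrete, here is a rough plain-Java sketch of the
pipeline I have in mind, with no Lucene dependency. Everything in it is an
assumption for illustration only: the class name TermFrequency, the tiny
stopword list, the one-entry synonym table, and the naive suffix-stripping
stem() method, which is only a stand-in for a real stemmer.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class TermFrequency {
    // Tiny illustrative stopword list (assumption; a real list is much longer).
    private static final Set<String> STOPWORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "the", "of", "to", "in", "is", "are"));

    // Hypothetical synonym table: maps a (stemmed) variant to a canonical term.
    private static final Map<String, String> SYNONYMS = new HashMap<>();
    static {
        SYNONYMS.put("automobile", "car");
    }

    // Naive suffix stripper, NOT a real stemming algorithm.
    public static String stem(String word) {
        if (word.length() > 4 && word.endsWith("ing")) {
            return word.substring(0, word.length() - 3);
        }
        if (word.length() > 3 && word.endsWith("s") && !word.endsWith("ss")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    // Lowercase, split on non-alphanumerics, drop stopwords, stem,
    // map synonyms to a canonical term, then count occurrences.
    public static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty() || STOPWORDS.contains(token)) {
                continue;
            }
            String term = stem(token);
            term = SYNONYMS.getOrDefault(term, term);
            counts.merge(term, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints {car=3, park=2}
        System.out.println(termFrequencies(
                "The cars and the automobile are parking in a car park"));
    }
}
```

Note that the synonym lookup runs after stemming, so the keys of the synonym
table have to be in stemmed form. What I am really after is a tested
replacement for the stopword, stemming, and synonym pieces above.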

Thanks!



Grant Ingersoll-6 wrote:
> 
> 
> On Aug 13, 2009, at 7:40 AM, joe_coder wrote:
> 
>>
>> I was wondering if there is any way to directly use the Lucene API to
>> extract terms from a given string. My requirement is that I have a text
>> document for which I need a term frequency vector (after stemming,
>> stopword removal, and synonym checks). The result needs to be the terms
>> and their frequencies.
> 
> IndexReader.getTermFreqVector(), assuming you have indexed using Term  
> Vectors.
> 
> 
>>
>> Is it possible to get this using any Lucene API? (As I see it, Lucene also
>> needs to stem, remove stopwords, handle synonyms, etc. before indexing.)
>> Or is there any Java project that would help me with this?
>> -- 
>> View this message in context:
>> http://www.nabble.com/Term-Extraction-tp24953406p24953406.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
> 
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
> 
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
> using Solr/Lucene:
> http://www.lucidimagination.com/search
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Term-Extraction-tp24953406p24954024.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

