Hi, 

I am using Lucene to index several semi-structured documents. I recently 
had to implement a method, getFrequencyVector(), that returns a mapping of 
keyword -> frequency for a single document, using the information already 
in the Lucene index. 

My Lucene index is currently organized as a keyword -> (document, freq)* 
mapping. The best solution I could come up with is to iterate over all the 
keywords, look for my own document identifier in each keyword's entries, 
and build the vector from the matches :( Any ideas/suggestions? 

Is there a way to speed up the vector computation? It currently takes 
O(|k|*|d|) time, where |k| is the total number of keywords indexed and |d| 
is the average number of documents a keyword occurs in. 
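
For concreteness, here is a rough sketch of the scan described above. Plain 
Java collections stand in for the Lucene index; the class name, the Posting 
type, and the sample data are all made up for illustration and are not 
Lucene API. 

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the inverted index; none of these names are Lucene API.
class InvertedIndexScan {

    // One posting: a document identifier and the keyword's frequency in that document.
    static final class Posting {
        final String docId;
        final int freq;

        Posting(String docId, int freq) {
            this.docId = docId;
            this.freq = freq;
        }
    }

    // Walks every keyword and scans its posting list for the requested document.
    // Cost is roughly |k| * |d|: all keywords times the average posting-list length.
    static Map<String, Integer> getFrequencyVector(
            Map<String, List<Posting>> index, String docId) {
        Map<String, Integer> vector = new LinkedHashMap<>();
        for (Map.Entry<String, List<Posting>> entry : index.entrySet()) {
            for (Posting p : entry.getValue()) {
                if (p.docId.equals(docId)) {
                    vector.put(entry.getKey(), p.freq);
                    break; // a document appears at most once per keyword
                }
            }
        }
        return vector;
    }

    // Tiny made-up index: keyword -> (document, freq)*
    static Map<String, List<Posting>> sampleIndex() {
        Map<String, List<Posting>> index = new LinkedHashMap<>();
        index.put("lucene", Arrays.asList(new Posting("doc1", 3), new Posting("doc2", 1)));
        index.put("index", Arrays.asList(new Posting("doc2", 2)));
        index.put("vector", Arrays.asList(new Posting("doc1", 1)));
        return index;
    }

    public static void main(String[] args) {
        System.out.println(getFrequencyVector(sampleIndex(), "doc1"));
    }
}
```

Every keyword is visited even when the document contains only a handful of 
them, which is where the time goes. 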

Ideally, I would like to have a forward index for this application, mapping 
each document to its (keyword, frequency) pairs. Thank you in advance for 
your expertise and your time. 
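
To show what I mean by a forward index, here is a minimal sketch of the 
structure I have in mind, again using plain Java maps rather than anything 
from Lucene. The class and method names are hypothetical, and it assumes I 
can record each keyword occurrence myself at indexing time. 

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical forward index: document -> (keyword -> frequency). Not Lucene API.
class ForwardIndexSketch {
    private final Map<String, Map<String, Integer>> byDocument = new HashMap<>();

    // Records one occurrence of a keyword in a document. Assumed to be called
    // alongside whatever code feeds the same keyword into the inverted index.
    void addOccurrence(String docId, String keyword) {
        byDocument.computeIfAbsent(docId, k -> new HashMap<>())
                  .merge(keyword, 1, Integer::sum);
    }

    // Returns the keyword -> frequency vector with a single map lookup,
    // independent of how many keywords the whole index holds.
    Map<String, Integer> getFrequencyVector(String docId) {
        Map<String, Integer> vector = byDocument.get(docId);
        if (vector == null) {
            return Collections.<String, Integer>emptyMap();
        }
        return Collections.unmodifiableMap(vector);
    }

    public static void main(String[] args) {
        ForwardIndexSketch forward = new ForwardIndexSketch();
        forward.addOccurrence("doc1", "lucene");
        forward.addOccurrence("doc1", "lucene");
        forward.addOccurrence("doc1", "index");
        System.out.println(forward.getFrequencyVector("doc1"));
    }
}
```

The trade-off is extra storage and a second structure to keep in sync with 
the inverted index, in exchange for a lookup that no longer depends on |k|. 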

Cheers, 
Santosh Dawara 
Graduate Student 
Rochester Institute of Technology

