Hi, I am currently using Lucene to index a number of semi-structured documents. I recently had to implement a method, 'getFrequencyVector()', that simply returns a mapping of keyword -> frequency for a document, built from the information already in the Lucene index.
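Here is a minimal sketch of the two ways to build such a vector. It assumes the index is modelled in memory as a keyword -> list of (document, freq) postings map. The class FrequencyVectors, its methods and the Posting type are hypothetical stand-ins, not Lucene's API. frequencyVectorByScan is the brute-force route: it visits every keyword and scans its postings for the document. frequencyVectorByLookup reads a forward index (document -> keyword -> freq) that is filled in at indexing time, so answering the query is a single lookup.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory model of the index; illustrative only, not Lucene's API. */
class FrequencyVectors {
    /** One (document, freq) entry in a keyword's postings list. */
    static final class Posting {
        final int doc;
        final int freq;
        Posting(int doc, int freq) { this.doc = doc; this.freq = freq; }
    }

    // Inverted index: keyword -> (document, freq)* postings.
    private final Map<String, List<Posting>> inverted = new TreeMap<>();
    // Forward index: document -> (keyword -> freq), maintained at indexing time.
    private final Map<Integer, Map<String, Integer>> forward = new HashMap<>();

    /** Indexes one document's tokens; assumes each document id is added only once. */
    public void addDocument(int doc, String... tokens) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String t : tokens) {
            counts.merge(t, 1, Integer::sum);
        }
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            inverted.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                    .add(new Posting(doc, e.getValue()));
        }
        forward.put(doc, counts);
    }

    /** Brute force: scan every keyword's postings for doc, roughly |k| * |d| steps. */
    public Map<String, Integer> frequencyVectorByScan(int doc) {
        Map<String, Integer> vector = new TreeMap<>();
        for (Map.Entry<String, List<Posting>> e : inverted.entrySet()) {
            for (Posting p : e.getValue()) {
                if (p.doc == doc) {
                    vector.put(e.getKey(), p.freq);
                    break; // at most one posting per document per keyword
                }
            }
        }
        return vector;
    }

    /** Forward index: one lookup; cost depends only on the document's own vector. */
    public Map<String, Integer> frequencyVectorByLookup(int doc) {
        Map<String, Integer> stored = forward.get(doc);
        return stored == null ? new TreeMap<>() : new TreeMap<>(stored);
    }

    public static void main(String[] args) {
        FrequencyVectors idx = new FrequencyVectors();
        idx.addDocument(1, "lucene", "index", "lucene");
        idx.addDocument(2, "index", "vector");
        System.out.println(idx.frequencyVectorByScan(1));   // {index=1, lucene=2}
        System.out.println(idx.frequencyVectorByLookup(1)); // {index=1, lucene=2}
    }
}
```

The forward index roughly doubles what is stored and has to be kept in sync on updates and deletes, which is the price of the faster lookup. If the Lucene release in use supports term vectors, that feature is the built-in counterpart: a field indexed with term vectors enabled can be read back per document via IndexReader.getTermFreqVector(docNumber, field), whose TermFreqVector exposes getTerms() and getTermFrequencies(). Check the javadoc for your version before relying on it.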
I currently maintain the Lucene index on the basis of the keyword -> (document, freq)* mapping. The best solution I could come up with is to iterate over all the keywords (all of them :( ), match my own document identifier in each keyword's postings, and build the vector from those matches. Any ideas or suggestions? Is there a way to speed up the vector computation? It currently takes on the order of |k| * |d| steps, where |k| is the total number of keywords indexed and |d| is the average number of documents a keyword occurs in.

Ideally, I would like to have a forward index for this application, mapping each document to its (keyword, frequency) pairs.

Thank you in advance for your expertise and your time.

Cheers,
Santosh Dawara
Graduate Student
Rochester Institute of Technology
