: I'm looking to store some additional information in a Lucene index
: and I'm looking for advice on how to implement the functionality.
: Specifically, I'm planning to store 1) collection frequency count for
: each term, 2) actual document length for each document (yes, I looked
: at the norm factor, I'm still considering how to adapt it...) 3)
: collection size (total number of terms) for each field 4) vocabulary
: size (number of unique terms) for each field. All this info can be
: computed on the fly, but I would prefer to generate it at
: indexing time and store it somewhere.

Unless I'm misunderstanding your terminology, it seems like all of this information is either already stored in the index or easy to add using the existing API:

#1 - Searchable.docFreq(Term):int
#2 - add as a new field per document.
#3 & #4 ... these are a little trickier. You can easily get both by iterating over IndexReader.terms(), but if you specifically want to store the data in the index, I would first add all of your documents, then use the TermEnum to compute the information and put it all as stored fields in a single "metadata" document with no indexed fields (or at least: none in common with your regular data).

Now you've precomputed everything you want to know, and it's easily available at query time.

-Hoss
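
To make the four quantities concrete, here is a minimal sketch that computes them for one field over a toy in-memory collection. It uses plain Java collections only and makes no Lucene calls; the class FieldStats and its members are hypothetical names chosen for illustration. One assumption worth checking against the original question: if "collection frequency" means the total number of occurrences of a term, it is a different number from the document frequency (the number of documents containing the term), so the sketch tracks both.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper (not a Lucene class): per-field statistics over tokenized documents. */
final class FieldStats {
    /** Total number of occurrences of each term across all documents. */
    final Map<String, Integer> collectionFreq = new HashMap<>();
    /** Number of documents that contain each term at least once. */
    final Map<String, Integer> docFreq = new HashMap<>();
    /** Number of tokens in each document, in input order. */
    final int[] docLengths;
    /** Total number of tokens in the field across all documents. */
    long collectionSize = 0;

    FieldStats(List<List<String>> docs) {
        docLengths = new int[docs.size()];
        for (int i = 0; i < docs.size(); i++) {
            List<String> tokens = docs.get(i);
            docLengths[i] = tokens.size();
            collectionSize += tokens.size();
            Set<String> seenInDoc = new HashSet<>();
            for (String term : tokens) {
                collectionFreq.merge(term, 1, Integer::sum);
                // Count the document only the first time the term appears in it.
                if (seenInDoc.add(term)) {
                    docFreq.merge(term, 1, Integer::sum);
                }
            }
        }
    }

    /** Number of unique terms in the field. */
    int vocabularySize() {
        return collectionFreq.size();
    }

    public static void main(String[] args) {
        FieldStats stats = new FieldStats(Arrays.asList(
                Arrays.asList("a", "b", "a"),
                Arrays.asList("b", "c")));
        System.out.println("collection frequency: " + stats.collectionFreq);
        System.out.println("document frequency:   " + stats.docFreq);
        System.out.println("document lengths:     " + Arrays.toString(stats.docLengths));
        System.out.println("collection size:      " + stats.collectionSize);
        System.out.println("vocabulary size:      " + stats.vocabularySize());
    }
}
```

In an actual index, the per-field totals would come from a pass over the term enumeration after all documents are added, and the results could then be written into the single "metadata" document described above.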