I thought it was you, but wasn't sure.

I would also like a way to store the frequency of a term across the overall collection. It probably belongs in the Term dictionary, at the cost of an additional VInt per term, though I am not sure and am open to other places to store it. Right now, in order to get this number, one has to either store it separately at indexing time (using a term-counting Filter) or calculate it at runtime by looping over the TermDocs and summing the per-document frequencies.
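To make the runtime option concrete, here is a minimal sketch of the summing loop. It does not use Lucene itself: the Postings interface and the ArrayPostings class are hypothetical stand-ins that only mimic the next()/freq() iteration style of TermDocs, and the sample frequencies are made up for illustration.

```java
public class CollectionFrequency {

    /** Hypothetical stand-in for a TermDocs-style postings iterator. */
    interface Postings {
        boolean next(); // advance to the next document containing the term
        int doc();      // current document number
        int freq();     // occurrences of the term in the current document
    }

    /** Array-backed postings, for illustration only (not a Lucene class). */
    static final class ArrayPostings implements Postings {
        private final int[] docs;
        private final int[] freqs;
        private int pos = -1;

        ArrayPostings(int[] docs, int[] freqs) {
            if (docs.length != freqs.length) {
                throw new IllegalArgumentException("docs and freqs must have equal length");
            }
            this.docs = docs;
            this.freqs = freqs;
        }

        public boolean next() { return ++pos < docs.length; }
        public int doc() { return docs[pos]; }
        public int freq() { return freqs[pos]; }
    }

    /** Sums the within-document frequency over every posting of one term. */
    static long collectionFrequency(Postings postings) {
        long total = 0;
        while (postings.next()) {
            total += postings.freq();
        }
        return total;
    }

    public static void main(String[] args) {
        // The term occurs in docs 0, 3 and 7 with frequencies 2, 1 and 4.
        Postings postings = new ArrayPostings(new int[] {0, 3, 7}, new int[] {2, 1, 4});
        System.out.println(collectionFrequency(postings)); // prints 7
    }
}
```

The loop touches every posting for the term, so its cost grows with the number of documents containing the term. A value stored once per term in the dictionary would avoid that pass.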
Marvin Humphrey wrote:

On Jun 1, 2006, at 5:48 AM, Grant Ingersoll wrote:

Someone on the list a while ago suggested moving Term Vectors out of the postings and storing them separately, as then they don't have to be merged (but the doc ids would have to be kept up to date).

Yes, that was me. :) I suggested storing TermVector data alongside stored field data, in the .fdt file. That's what KinoSearch does right now. It cuts down on disk seeks.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



--

Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
School of Information Studies
335 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org
Voice: 315-443-5484
Fax: 315-443-6886

