I thought it was you, but wasn't sure.
I would also like a way to store the frequency of a term across the
overall collection. It probably belongs in the Term dictionary, at the
cost of an additional VInt per term, though I am not sure and am open to
other places to store it. Right now, to calculate this, one has to
either store it separately at indexing time (using a term-counting
Filter) or calculate it at runtime by looping over the TermDocs and
summing the per-document frequencies.
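To make the runtime option concrete, here is a minimal sketch of the
summing loop. The Postings interface and ArrayPostings class are
hypothetical stand-ins so that the example is self-contained; they
only assume a cursor with next() and freq() in the style of TermDocs,
and are not the actual Lucene classes.

```java
public class CollectionFrequency {

    /** Hypothetical stand-in for a postings cursor in the style of TermDocs. */
    public interface Postings {
        /** Advances to the next document; returns false when exhausted. */
        boolean next();

        /** Frequency of the term within the current document. */
        int freq();
    }

    /** Array-backed postings, for illustration only (not a Lucene class). */
    public static final class ArrayPostings implements Postings {
        private final int[] freqs;
        private int pos = -1;

        public ArrayPostings(int... freqs) {
            this.freqs = freqs.clone();
        }

        @Override
        public boolean next() {
            if (pos < freqs.length) {
                pos++;
            }
            return pos < freqs.length;
        }

        @Override
        public int freq() {
            if (pos < 0 || pos >= freqs.length) {
                throw new IllegalStateException("cursor is not on a document");
            }
            return freqs[pos];
        }
    }

    /** Sums the per-document frequencies to get the collection-wide count. */
    public static long sum(Postings postings) {
        long total = 0;
        while (postings.next()) {
            total += postings.freq();
        }
        return total;
    }

    public static void main(String[] args) {
        // A term occurring 3, 1 and 2 times in three documents.
        System.out.println(sum(new ArrayPostings(3, 1, 2)));
    }
}
```

The loop visits every posting for the term, so its cost grows with the
number of documents containing the term. That is the cost a stored
per-term count would avoid.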
Marvin Humphrey wrote:
On Jun 1, 2006, at 5:48 AM, Grant Ingersoll wrote:
Someone on the list a while ago suggested moving Term Vectors out of
the postings and storing them separately, as then they don't have to
be merged (but the doc ids would have to be kept up to date).
Yes, that was me. :) I suggested storing TermVector data alongside
stored field data, in the .fdt file. That's what KinoSearch does
right now. It cuts down on disk seeks.
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
--
Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
School of Information Studies
335 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org
Voice: 315-443-5484
Fax: 315-443-6886