Space will vary based on the content (number of unique terms), obviously, but I did submit some rough numbers that I saw for my implementation. Here they are (from my original patch submission):
I also tested by indexing 12,598 documents (88,362 terms) using both term vectors and no term vectors. Index size w/o term vectors: 42 MB Index size w/ term vectors: 71.3 MB Time for the first test was 5 minutes 30 seconds, time for the second test was 6 minutes 2 seconds. The term vector you get back is a list of strings, containing the term and the term frequency for the given document. I also submitted a Term Vector representation for the Query (see QueryTermVector), so I suppose you could loop over the two vectors and compare. Don't know if that solves your problem, but I hope it helps. -Grant >>> [EMAIL PROTECTED] 06/21/04 06:28AM >>> Hi, I have managed to extract the relevant information to highlight the search results out of an index that does not store field's content. The result is a list of matching terms, with their relative weights. This solves my problem, but it is very expensive, like I was expecting, as it uses the explain feature of the IndexSearcher. Since Lucene 1.4 I have seen that a new option is available for fields: storeTermVector. Now the questions: - how much space do storeTermVector uses on the index (compared to just indexed and fully stored fields)? - if I "storeTermVector" the fields can I get back the list of matching terms for a query in a more efficient way compared to a full explain computation? I am willing to drop weights altogether, if this could allow a more efficient computation. Thanks for your attention. Regards, Giulio Cesare Solaroli --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
