TermFreqVector and performance, index size

Philippe Deslauriers (Beetext) Thu, 27 Apr 2006 05:32:34 -0700

Hello,

We are upgrading from Lucene 1.3 to 1.9.
We plan to use the Highlight package for highlighting, replacing our
in-house highlight classes.


From what I can read, the Highlight package requires TermFreqVectors to be
stored in the index. I will get into the Highlight package later, but right
now I am trying to understand what the TermFreqVector is used for and what
impact it has.

When adding the content field, I ran a few tests with the different
options to measure the indexing time and the size of the index, as we are
working with HUGE indexes (1 GB and up). For the test I used roughly 4500
random text documents.

With Lucene 1.3: time 2m:05s, index size 13.0 MB

With Lucene 1.9:

Field.TermVector NO  (time 1m:45s, index size 7.1 MB)
Field.TermVector WITH_POSITIONS_OFFSETS  (time 2m:45s, index size 25 MB !!!)
Field.TermVector YES NO  (time 2m:01s, index size 13.3 MB)
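
As a rough illustration of why the settings above could differ so much in
size, here is a minimal plain-Java sketch of the three kinds of per-term data
a term vector can carry: a frequency, the token positions, and the character
offsets of each occurrence. It does not use Lucene and is not how Lucene
stores anything internally; the class TermVectorSketch, the holder TermInfo
and the whitespace tokenizer are all assumptions made up for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TermVectorSketch {

    /** Per-term data for one field of one document (hypothetical, not a Lucene class). */
    static final class TermInfo {
        int freq;                                           // one counter per distinct term
        final List<Integer> positions = new ArrayList<>();  // token index of each occurrence
        final List<int[]> offsets = new ArrayList<>();      // {startChar, endChar} of each occurrence
    }

    /** Splits on whitespace, lowercases, and records frequency, positions and offsets. */
    static Map<String, TermInfo> build(String text) {
        Map<String, TermInfo> vector = new LinkedHashMap<>();
        int position = 0;
        int i = 0;
        while (i < text.length()) {
            // Skip whitespace between tokens.
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            int start = i;
            // Consume one token.
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (i > start) {
                String term = text.substring(start, i).toLowerCase();
                TermInfo info = vector.computeIfAbsent(term, t -> new TermInfo());
                info.freq++;
                info.positions.add(position++);
                info.offsets.add(new int[] {start, i});
            }
        }
        return vector;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, TermInfo> e : build("the cat saw the dog").entrySet()) {
            TermInfo info = e.getValue();
            StringBuilder sb = new StringBuilder();
            for (int[] o : info.offsets) {
                sb.append('[').append(o[0]).append(',').append(o[1]).append(')');
            }
            System.out.println(e.getKey() + " freq=" + info.freq
                    + " positions=" + info.positions + " offsets=" + sb);
        }
    }
}
```

In this sketch the frequency costs one number per distinct term, while
positions and offsets cost three more numbers for every single occurrence,
which would be consistent with the jump in index size I measured.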

What are the OFFSETS and POSITIONS used for? Do I need them for highlighting?
Can I create the TermFreqVector on the fly for a document, or do I have to
store it in the index?


Philippe 


