Hello, we are upgrading from Lucene 1.3 to 1.9. We plan to use the Highlight package for highlighting, replacing our in-house highlighting classes.
From what I have read, the Highlight package requires term vectors (TermFreqVector) to be stored in the index. I will look at the Highlight package itself later. Right now I am trying to understand what term vectors are used for and what they cost. Because we work with HUGE indexes (1 GB and more), I ran a few tests on the content field with the different Field.TermVector options, measuring indexing time and index size. For the tests I used roughly 4,500 random text documents:

- Lucene 1.3: indexing time 2m:05s, index size 13.0 MB
- Lucene 1.9, Field.TermVector.NO: 1m:45s, 7.1 MB
- Lucene 1.9, Field.TermVector.WITH_POSITIONS_OFFSETS: 2m:45s, 25 MB (!!!)
- Lucene 1.9, Field.TermVector.YES: 2m:01s, 13.3 MB

My questions:

1. What are the OFFSETS and POSITIONS used for?
2. Do I need them for highlighting?
3. Can I create the TermFreqVector on the fly for a document, or do I have to store it in the index?

Philippe
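To make the POSITIONS and OFFSETS question concrete, here is a minimal, stdlib-only Java sketch of what a per-document term vector with positions and offsets conceptually holds, built on the fly from raw text. It is not Lucene's TermFreqVector or TermPositionVector API. The class TermVectorSketch, its TermEntry layout and its naive letter-or-digit tokenizer (a stand-in for a real Analyzer) are all hypothetical. In this sketch, positions are token ordinals (0, 1, 2, ...) and offsets are character ranges into the original text, so each occurrence of a term can be mapped back to the exact original characters, for example to wrap them in markup.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical conceptual sketch (NOT Lucene's API): builds a per-document
 * "term vector" on the fly from raw text, recording for each term its
 * frequency, its token positions and its character offsets.
 */
public class TermVectorSketch {

    /** One entry of the vector: how often a term occurs, and where. */
    static final class TermEntry {
        final List<Integer> positions = new ArrayList<>();
        final List<int[]> offsets = new ArrayList<>(); // {start, end}, end exclusive

        int freq() {
            return positions.size();
        }
    }

    /** Naive tokenizer (assumption): lower-cased runs of letters or digits. */
    static Map<String, TermEntry> build(String text) {
        Map<String, TermEntry> vector = new LinkedHashMap<>();
        int position = 0;
        int i = 0;
        while (i < text.length()) {
            // Skip separators such as spaces and punctuation.
            if (!Character.isLetterOrDigit(text.charAt(i))) {
                i++;
                continue;
            }
            int start = i;
            while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) {
                i++;
            }
            String term = text.substring(start, i).toLowerCase();
            TermEntry entry = vector.computeIfAbsent(term, t -> new TermEntry());
            entry.positions.add(position++);          // token ordinal
            entry.offsets.add(new int[] {start, i});  // character range in the original text
        }
        return vector;
    }

    public static void main(String[] args) {
        String text = "Lucene indexes text; Lucene highlights text.";
        Map<String, TermEntry> vector = build(text);
        for (Map.Entry<String, TermEntry> e : vector.entrySet()) {
            TermEntry t = e.getValue();
            StringBuilder sb = new StringBuilder();
            for (int[] o : t.offsets) {
                sb.append('[').append(o[0]).append(',').append(o[1]).append(") ");
            }
            System.out.println(e.getKey() + " freq=" + t.freq()
                + " positions=" + t.positions + " offsets=" + sb.toString().trim());
        }
    }
}
```

Running main prints one line per distinct term, for example `lucene freq=2 positions=[0, 3] offsets=[0,6) [21,27)`; `text.substring(21, 27)` recovers the original "Lucene". Note that this sketch re-tokenizes the text on every call and makes no claim about how Lucene itself stores or uses term vectors.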