Re: TermFreqVector based highlighter?

Grant Ingersoll Mon, 21 Jun 2004 11:12:21 -0700

The space used will obviously vary with the content (the number of unique terms), but I did
submit some rough numbers that I saw for my implementation. Here they are (from my
original patch submission):


I also tested by indexing 12,598 documents (88,362 terms) both with and without term
vectors.
Index size without term vectors: 42 MB
Index size with term vectors: 71.3 MB

The first test took 5 minutes 30 seconds; the second took 6 minutes 2 seconds.
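For a rough sense of scale, here is the overhead implied by these figures. This is just arithmetic on the numbers above, with the times converted to seconds (5 min 30 s = 330 s, 6 min 2 s = 362 s):

```latex
\frac{71.3\ \mathrm{MB} - 42\ \mathrm{MB}}{42\ \mathrm{MB}} = \frac{29.3}{42} \approx 0.70,
\qquad
\frac{362\ \mathrm{s} - 330\ \mathrm{s}}{330\ \mathrm{s}} = \frac{32}{330} \approx 0.097
```

So, on this corpus, storing term vectors made the index about 70% larger, and the second indexing run took about 10% longer than the first.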


The term vector you get back contains the terms and their frequencies for the given
document. I also submitted a term vector representation for the query (see
QueryTermVector), so I suppose you could loop over the two vectors and compare them.
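Looping over the two vectors and comparing them could look roughly like the sketch below. It is a minimal, self-contained Java sketch, not Lucene code: it assumes each vector is available as parallel arrays (terms and their frequencies), which is how Lucene's TermFreqVector exposes them through getTerms() and getTermFrequencies(). The class TermVectorMatch and the method matchingTerms are hypothetical names used only for illustration.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Lucene): intersects a document's term vector,
// given as parallel arrays, with the terms of a query.
final class TermVectorMatch {

    // Returns each query term that also occurs in the document, mapped to its
    // in-document frequency, in the order the query terms are given.
    static Map<String, Integer> matchingTerms(String[] docTerms, int[] docFreqs,
                                              String[] queryTerms) {
        if (docTerms.length != docFreqs.length) {
            throw new IllegalArgumentException("terms and frequencies must be parallel arrays");
        }
        // Index the document vector once so each query term is a single hash lookup.
        Map<String, Integer> docIndex = new HashMap<>();
        for (int i = 0; i < docTerms.length; i++) {
            docIndex.put(docTerms[i], docFreqs[i]);
        }
        // Keep only the query terms that the document actually contains.
        Map<String, Integer> matches = new LinkedHashMap<>();
        for (String term : queryTerms) {
            Integer freq = docIndex.get(term);
            if (freq != null) {
                matches.put(term, freq);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        String[] docTerms = {"apache", "index", "lucene", "search"};
        int[] docFreqs = {1, 4, 3, 2};
        String[] queryTerms = {"lucene", "highlight", "search"};
        // prints {lucene=3, search=2}
        System.out.println(matchingTerms(docTerms, docFreqs, queryTerms));
    }
}
```

No explain computation is involved: the cost is one pass over the document's terms plus one lookup per query term. If the weights are dropped, as Giulio offers below, the keySet() of the result is the list of matching terms. For the comparison to work, the query terms should be produced by the same analyzer used at indexing time; otherwise, for example, lowercased or stemmed forms will not match.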

Don't know if that solves your problem, but I hope it helps.

-Grant

>>> [EMAIL PROTECTED] 06/21/04 06:28AM >>>
Hi,

I have managed to extract the information needed to highlight search
results from an index that does not store the fields' content.

The result is a list of matching terms, with their relative weights.

This solves my problem, but, as I expected, it is very expensive,
because it uses the explain feature of the IndexSearcher.

I have seen that Lucene 1.4 adds a new option for fields:
storeTermVector.

Now the questions:
- How much space does storeTermVector use in the index (compared to
fields that are only indexed, and to fully stored fields)?
- If I "storeTermVector" the fields, can I get back the list of
matching terms for a query more efficiently than with a full
explain computation?

I am willing to drop the weights altogether if that allows a more
efficient computation.

Thanks for your attention.

Regards,

Giulio Cesare Solaroli

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED] 
For additional commands, e-mail: [EMAIL PROTECTED] 


