David Balmain wrote on 10/10/2006 03:56 PM:
> Actually, not using single-doc segments was only possible due to the
> fact that I have constant field numbers, so both optimizations stem
> from this one change. So I'm not sure if it is worth answering your
> question, but I'll try anyway. It obviously depends on whether you are
> storing the fields and term-vectors. Most Ferret users are indexing
> data from a database and are only storing an id field and no
> term-vectors, so the biggest optimization for them is the merge
> algorithm I'm using for term-infos. On the other hand, if you want to
> highlight the fields (Ferret has a very accurate highlighting
> algorithm that actually uses the queries to get the exact terms and
> phrases matched), then you need to store the field with term-vectors.
> In this case the merging of fields and term-vectors is going to be a
> lot more important.

Hi David,

I use a rich global field model, with term vectors for fast, accurate
excerpting in Lucene.  Whether or not to store term vectors is the one
index property that is not fixed in my model.  The reason is that my
collections tend to contain a mix of many small email messages and a
comparatively small number of much larger documents.  Term vectors are a
significant advantage for excerpting large documents, but add no value
and unnecessarily bloat the index for all the small emails.  So I use a
size threshold and only store term vectors when the body content of the
field exceeds it.
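
Roughly, the idea in Lucene 2.x terms (the helper name and the cutoff
value below are only illustrative, not my actual code):

  import org.apache.lucene.document.Field;

  // Illustrative sketch: choose term-vector storage per document based
  // on a size threshold, using the Lucene 2.x Field constructor.
  public class BodyFieldFactory {
      // assumed cutoff, in characters; the real threshold is a tuning choice
      private static final int TERM_VECTOR_THRESHOLD = 32 * 1024;

      public static Field bodyField(String body) {
          Field.TermVector tv = body.length() > TERM_VECTOR_THRESHOLD
                  ? Field.TermVector.YES
                  : Field.TermVector.NO;
          return new Field("body", body, Field.Store.YES,
                  Field.Index.TOKENIZED, tv);
      }
  }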

Would your model in Ferret support that particular field variation?  Do
you have an alternative representation that achieves similar benefits?  I
suppose it would be possible to represent the single conceptual field
'body' with two physical fields, 'smallBody' and 'largeBody', where the
latter stores term vectors and the former does not.
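
Something like the following, with the routing done at index time (field
names and cutoff are again just illustrative, and largeBody is the one
that carries term vectors since large documents are where excerpting
needs them):

  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field;

  // Illustrative sketch of the two-physical-field alternative: the single
  // conceptual 'body' is routed to smallBody or largeBody at index time,
  // and only largeBody carries term vectors.
  public class BodyFieldRouter {
      public static void addBody(Document doc, String body, int cutoff) {
          if (body.length() > cutoff) {
              doc.add(new Field("largeBody", body, Field.Store.YES,
                      Field.Index.TOKENIZED, Field.TermVector.YES));
          } else {
              doc.add(new Field("smallBody", body, Field.Store.YES,
                      Field.Index.TOKENIZED, Field.TermVector.NO));
          }
      }
  }

The obvious downside is that queries and highlighting would then have to
consider both physical fields.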

Chuck

