David Balmain wrote on 10/10/2006 08:53 PM:
> On 10/11/06, Chuck Williams <[EMAIL PROTECTED]> wrote:
>
> I personally would always store term vectors since I use a
> StandardTokenizer and Stemming. In this case highlighting matches in
> small documents is not trivial. Ferret's highlighter matches even
> sloppy phrase queries and phrases with gaps between the terms
> correctly. I couldn't do this without the use of term vectors.

I use stemming as well, but am not yet matching phrases like that. Perhaps term vectors will be useful to achieve this, although they come at a high cost, and it doesn't seem difficult or expensive to do the matching directly on the text of small items.

>> I suppose it would be possible for the single conceptual field 'body' to
>> be represented with two physical fields 'smallBody' and 'largeBody'
>> where the former stores term vectors and the latter does not.
>
> If I really wanted to solve this problem I would use this solution. It
> is pretty easy to search multiple fields when I need to. Ferret's
> Query language even supports it:
>
>     smallBody|largeBody:"phrase to search for"

Couldn't agree more. I have a number of extensions to Lucene's query parser, including this for multiple fields:

    {smallBody largeBody}:"phrase to search for"

>
> In the end, I think the benefits of my model far outweigh the costs.
> For me at least anyway.

Based on the performance figures so far, it seems they do! I think dynamic term vectors have a substantial benefit, but they can easily be implemented in a model where all field indexing properties are fixed.

Chuck
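
A minimal sketch of what a multi-field prefix such as {smallBody largeBody}:"phrase to search for" could expand to before it reaches the underlying query parser. This is not the actual Lucene query parser extension mentioned above, which is not shown in this thread; the function name, the regular expression, and the choice of OR between the per-field clauses are all assumptions made for illustration.

```python
import re

# Hypothetical pattern for the brace syntax: {field1 field2}:"phrase" or {f1 f2}:term
MULTI_FIELD = re.compile(r'\{([^{}]+)\}:("[^"]*"|\S+)')


def expand_multi_field(query: str) -> str:
    """Rewrite {f1 f2}:term as (f1:term OR f2:term); leave everything else alone."""

    def _expand(match: re.Match) -> str:
        fields = match.group(1).split()
        term = match.group(2)
        if not fields:
            # Nothing usable between the braces: keep the original text.
            return match.group(0)
        # Assumption: the same term or phrase is searched in each field, ORed together.
        clauses = [f"{field}:{term}" for field in fields]
        return "(" + " OR ".join(clauses) + ")"

    return MULTI_FIELD.sub(_expand, query)


print(expand_multi_field('{smallBody largeBody}:"phrase to search for"'))
# prints: (smallBody:"phrase to search for" OR largeBody:"phrase to search for")
```

Rewriting the query string up front keeps the two physical fields an indexing detail: callers keep writing queries against one conceptual 'body' field, and only the expansion step needs to know that it is split into smallBody and largeBody.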