David Balmain wrote on 10/10/2006 08:53 PM:
> On 10/11/06, Chuck Williams <[EMAIL PROTECTED]> wrote:
>
> I personally would always store term vectors since I use a
> StandardTokenizer and Stemming. In this case highlighting matches in
> small documents is not trivial. Ferret's highlighter matches even
> sloppy phrase queries and phrases with gaps between the terms
> correctly. I couldn't do this without the use of term vectors.

I use stemming as well, but am not yet matching phrases like that. 
Perhaps term vectors will be useful to achieve this, although they come
at a high cost and it doesn't seem difficult or expensive to do the
matching directly on the text of small items.
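
As a rough illustration of matching directly on the text of a small item, here is a minimal Python sketch. It is not Ferret's or Lucene's highlighter. The `toy_stem` function is a hypothetical suffix stripper standing in for a real stemmer, and `highlight` only handles single terms, not sloppy phrases or phrases with gaps.

```python
import re

# Toy suffix list: an assumption for illustration, NOT the Porter stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word):
    """Lowercase the word and strip one known suffix, keeping at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def highlight(text, query, pre="<b>", post="</b>"):
    """Wrap every word in `text` whose stem equals the stem of a query word."""
    query_stems = {toy_stem(w) for w in re.findall(r"\w+", query)}
    pieces = []
    last = 0
    # Re-tokenize the stored text and keep character offsets via the match object.
    for match in re.finditer(r"\w+", text):
        if toy_stem(match.group()) in query_stems:
            pieces.append(text[last:match.start()])
            pieces.append(pre + match.group() + post)
            last = match.end()
    pieces.append(text[last:])
    return "".join(pieces)
```

The cost is one extra tokenize-and-stem pass over the stored text at display time, which is the trade-off against storing term vectors at index time.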

>> I suppose it would be possible for the single conceptual field 'body' to
>> be represented with two physical fields 'smallBody' and 'largeBody'
>> where the former stores term vectors and the latter does not.
>
> If I really wanted to solve this problem I would use this solution. It
> is pretty easy to search multiple fields when I need to. Ferret's
> Query language even supports it:
>
>    smallBody|largeBody:"phrase to search for"

Couldn't agree more.  I have a number of extensions to Lucene's query
parser, including this for multiple fields:

{smallBody largeBody}:"phrase to search for"
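
One plausible reading of this syntax is that it expands to the same clause OR'ed across each listed field. The Python sketch below shows that rewrite as a plain string transformation. The function name `expand_multi_field` and the regular expression are hypothetical, and this is not the actual parser extension.

```python
import re

# Matches {field1 field2 ...}: followed by a quoted phrase or a single bare term.
MULTI_FIELD = re.compile(r'\{([^{}]+)\}:("[^"]*"|\S+)')


def expand_multi_field(query):
    """Rewrite {a b}:clause into (a:clause OR b:clause); leave other text as-is."""

    def rewrite(match):
        fields = match.group(1).split()
        clause = match.group(2)
        return "(" + " OR ".join(f"{field}:{clause}" for field in fields) + ")"

    return MULTI_FIELD.sub(rewrite, query)
```

The rewritten string uses only ordinary `field:"phrase"` clauses, so it could then be handed to an unmodified query parser.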

>
> In the end, I think the benefits of my model far outweigh the costs.
> For me at least anyway.

Based on the performance figures so far, it seems they do!  I think
dynamic term vectors have a substantial benefit, but can easily be
implemented in a model where all field indexing properties are fixed.
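
A minimal Python sketch of that idea, using the smallBody/largeBody split discussed above: the schema is fixed, and the choice of physical field is made per document at indexing time. The `SCHEMA` dict, the size threshold, and the function names are all assumptions for illustration. No real Lucene or Ferret API is used here.

```python
# Arbitrary cutoff in characters; an assumption, tune for the real corpus.
SMALL_BODY_LIMIT = 2000

# Fixed per-field properties: only smallBody stores term vectors.
SCHEMA = {
    "smallBody": {"term_vectors": True},
    "largeBody": {"term_vectors": False},
}


def body_field_for(text, limit=SMALL_BODY_LIMIT):
    """Pick the physical field that will hold this conceptual 'body' value."""
    return "smallBody" if len(text) <= limit else "largeBody"


def to_index_document(doc_id, body):
    """Build a field-name -> value mapping with the body routed by size."""
    return {"id": doc_id, body_field_for(body): body}
```

Each document carries exactly one of the two body fields, so a search across both fields still sees every body once.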

Chuck


