On Jun 21, 2006, at 3:32 AM, David Balmain wrote:

> I'll
> be implementing the highlighter in C rather than in Ruby so I'll be
> interested to see how you go with it.
>
> The main difference in the API is that you won't specify the store,
> index and term_vector parameters per document field any more. This
> option will still be available but the behaviour will be slightly
> different. I'll go into more detail later.

How close is what you're going to be doing to the Lucene contrib  
highlighter?

FWIW, the KinoSearch Highlighter uses similar techniques for adding  
tags and encoding, but the excerpt selection is pretty different.  No  
TokenStream required, it uses a heat map.  Right now it requires that  
the field have term vectors stored with positions and offsets, but it  
could be adapted to generate the vectors by re-analyzing.
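
For anyone who hasn't seen the heat map idea, here is a rough sketch in Python. It is not the KinoSearch code, only an illustration under stated assumptions: the name `best_excerpt`, the per-character scoring, and the fixed-length window are all hypothetical choices. It assumes the (start, end) character offsets of the query-term hits are already available, for example from stored term vectors.

```python
from typing import List, Tuple

def best_excerpt(text: str, hits: List[Tuple[int, int]],
                 length: int = 40) -> Tuple[int, int]:
    """Return (start, end) of the `length`-char window with the most heat.

    `hits` are (start, end) character offsets of query-term matches.
    Hypothetical helper for illustration, not part of any library.
    """
    # Heat map: one counter per character, bumped for each hit covering it.
    heat = [0] * len(text)
    for start, end in hits:
        for i in range(start, min(end, len(text))):
            heat[i] += 1

    # A text shorter than the window is its own best excerpt.
    if len(text) <= length:
        return 0, len(text)

    # Slide a fixed-size window over the map; the earliest window wins ties.
    window = sum(heat[:length])
    best_sum, best_start = window, 0
    for start in range(1, len(text) - length + 1):
        window += heat[start + length - 1] - heat[start - 1]
        if window > best_sum:
            best_sum, best_start = window, start
    return best_start, best_start + length
```

A real highlighter would presumably also weight hits and snap the window to word or sentence boundaries; this sketch does neither.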

The principal advantage it has over the Lucene Highlighter is that it
handles phrases properly:

    http://xrl.us/nm2z (Link to www.lucenebook.com)
    http://xrl.us/nm25 (Link to www.rectangular.com)

Whatever algorithm we choose for Lucy, I hope it will meet that  
constraint.
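
As I read it, handling phrases properly means that for a phrase query only the occurrences where the terms sit next to each other, in order, get highlighted, not every stray occurrence of each term. A minimal sketch of that check in Python, assuming token positions per term are available (as they would be from term vectors with positions). The name `phrase_positions` and the dict layout are hypothetical, not any library's API:

```python
from typing import Dict, List

def phrase_positions(positions: Dict[str, List[int]],
                     phrase: List[str]) -> List[int]:
    """Return token positions where `phrase` starts as consecutive tokens.

    `positions` maps each term to the token positions where it occurs.
    Hypothetical helper for illustration only.
    """
    if not phrase:
        return []
    # Candidate starts are the positions of the first term.
    starts = set(positions.get(phrase[0], []))
    # Keep a start only if the i-th phrase term occurs at start + i.
    for offset, term in enumerate(phrase[1:], start=1):
        term_pos = set(positions.get(term, []))
        starts = {s for s in starts if s + offset in term_pos}
    return sorted(starts)
```

Only the tokens at the returned positions (and the phrase terms that follow them) would then be candidates for highlighting.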

Highlighter.pm isn't that long (384 lines including docs), and if I
didn't have any serious deadlines bearing down, doing a Ruby version
would be a great exercise for me.  If you or Marcus want to check it
out, the new version's only in subversion:

   http://xrl.us/nm28 (Link to www.rectangular.com)

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk
