On 6/21/06, Marvin Humphrey <[EMAIL PROTECTED]> wrote:
>
> On Jun 21, 2006, at 3:32 AM, David Balmain wrote:
>
> > I'll
> > be implementing the highlighter in C rather than in Ruby so I'll be
> > interested to see how you go with it.
> >
> > The main difference in the API is that you won't specify the store,
> > index and term_vector parameters per document field any more. This
> > option will still be available but the behaviour will be slightly
> > different. I'll go into more detail later.
>
> How close is what you're going to be doing to the Lucene contrib
> highlighter?

Well I haven't actually started it yet so we'll see.

> FWIW, the KinoSearch Highlighter uses similar techniques for adding
> tags and encoding, but the excerpt selection is pretty different.  No
> TokenStream required, it uses a heat map.  Right now it requires that
> the field have term vectors stored with positions and offsets, but it
> could be adapted to generate the vectors by re-analyzing.
>
> The principal advantage it has over the Lucene Highlighter is that it
> handles phrases properly:
>
>     http://xrl.us/nm2z (Link to www.lucenebook.com)
>     http://xrl.us/nm25 (Link to www.rectangular.com)
>
> Whatever algorithm we choose for Lucy, I hope it will meet that
> constraint.
>
> Highlighter.pm isn't that long (384 lines including docs) and if I
> didn't have any serious deadlines bearing down, doing a Ruby version
> would be a great exercise for me.  If you or Marcus want to check it
> out, the new version's only in subversion:
>
>    http://xrl.us/nm28 (Link to www.rectangular.com)

Cool, I'll definitely check this out. Thanks Marvin.
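
For anyone following along, here is a rough Ruby sketch of how the heat-map idea might work, going only by the description above. It is not taken from KinoSearch or Ferret. The method names, the tags and the [start, end] offset pairs are all assumptions for illustration, and the offsets stand in for what term vectors with positions and offsets would supply. It also skips the encoding step.

```ruby
# Hypothetical sketch of heat-map excerpt selection. Not KinoSearch's or
# Ferret's actual code; every name here is made up for illustration.

# text    - the stored field value
# matches - array of [start_offset, end_offset] pairs for query hits
#           (assumed to come from term vectors with offsets)
# width   - excerpt length in characters
def best_excerpt_start(text, matches, width)
  # Build the heat map: one counter per character, bumped for every match
  # that covers it.
  heat = Array.new(text.length, 0)
  matches.each do |start_off, end_off|
    (start_off...end_off).each { |i| heat[i] += 1 if i < text.length }
  end
  return 0 if text.length <= width

  # Slide a fixed-width window over the heat map and keep the hottest one.
  score = heat[0, width].sum
  best_score = score
  best_start = 0
  (1..(text.length - width)).each do |start|
    score += heat[start + width - 1] - heat[start - 1]
    if score > best_score
      best_score = score
      best_start = start
    end
  end
  best_start
end

def highlight_excerpt(text, matches, width, pre_tag = "<b>", post_tag = "</b>")
  start = best_excerpt_start(text, matches, width)
  stop = [start + width, text.length].min
  out = +""
  pos = start
  matches.sort.each do |s, e|
    # Skip matches that are not fully inside the window or that overlap
    # one already emitted.
    next if s < pos || e > stop
    out << text[pos...s] << pre_tag << text[s...e] << post_tag
    pos = e
  end
  out << text[pos...stop]
  out
end
```

If a phrase query only contributes offsets for hits where the whole phrase matched, the hottest window lands on the phrase itself rather than on stray occurrences of the individual terms.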
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk
