On 6/21/06, Marvin Humphrey <[EMAIL PROTECTED]> wrote:
>
> On Jun 21, 2006, at 3:32 AM, David Balmain wrote:
>
> > I'll
> > be implementing the highlighter in C rather than in Ruby so I'll be
> > interested to see how you go with it.
> >
> > The main difference in the API is that you won't specify the store,
> > index and term_vector parameters per document field any more. This
> > option will still be available but the behaviour will be slightly
> > different. I'll go into more detail later.
>
> How close is what you're going to be doing to the Lucene contrib
> highlighter?

Well, I haven't actually started it yet, so we'll see.

> FWIW, the KinoSearch Highlighter uses similar techniques for adding
> tags and encoding, but the excerpt selection is pretty different. No
> TokenStream required, it uses a heat map. Right now it requires that
> the field have term vectors stored with positions and offsets, but it
> could be adapted to generate the vectors by re-analyzing.
>
> The principal advantage it has over the Lucene Highlighter is that it
> handles phrases properly:
>
> http://xrl.us/nm2z (Link to www.lucenebook.com)
> http://xrl.us/nm25 (Link to www.rectangular.com)
>
> Whatever algorithm we choose for Lucy, I hope it will meet that
> constraint.
>
> Highlighter.pm isn't that long (384 lines including docs) and if I
> didn't have any serious deadlines bearing down, doing a Ruby version
> would be a great exercise for me. If you or Marcus want to check it
> out, the new version's only in Subversion:
>
> http://xrl.us/nm28 (Link to www.rectangular.com)

Cool, I'll definitely check this out. Thanks Marvin.
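
For anyone following the thread who hasn't seen the heat map idea before, here is a rough sketch of how excerpt selection along those lines might work. This is not the KinoSearch code and it is not what Ferret will do. It assumes the term vectors give us (start, end) character offsets for each query-term hit, and the names build_heat_map, best_excerpt, bonus_radius and excerpt_len are made up for illustration. Each hit warms the characters it covers, hits that sit close together get an extra bonus (a crude stand-in for rewarding phrases), and the excerpt is the fixed-width window with the most total heat.

```python
def build_heat_map(text_len, spans, bonus_radius=20):
    """Return a per-character score list for a text of length text_len.

    spans is a list of (start, end) character offsets of query-term hits,
    as one might read them from stored term vectors (assumed input format).
    """
    heat = [0] * text_len
    # Every hit adds one unit of heat to each character it covers.
    for start, end in spans:
        for i in range(max(0, start), min(text_len, end)):
            heat[i] += 1
    # Adjacent hits that sit close together earn a bonus over their whole
    # combined range, so clusters (e.g. phrases) outscore lone hits.
    ordered = sorted(spans)
    for (s1, e1), (s2, e2) in zip(ordered, ordered[1:]):
        if s2 - e1 <= bonus_radius:
            for i in range(max(0, s1), min(text_len, e2)):
                heat[i] += 1
    return heat


def best_excerpt(text, spans, excerpt_len=60):
    """Pick the excerpt_len-wide window of text with the highest total heat."""
    if len(text) <= excerpt_len:
        return 0, text
    heat = build_heat_map(len(text), spans)
    # Slide a fixed-width window across the heat map, updating the sum
    # incrementally instead of recomputing it for every position.
    window = sum(heat[:excerpt_len])
    best_sum, best_start = window, 0
    for start in range(1, len(text) - excerpt_len + 1):
        window += heat[start + excerpt_len - 1] - heat[start - 1]
        if window > best_sum:
            best_sum, best_start = window, start
    return best_start, text[best_start:best_start + excerpt_len]


if __name__ == "__main__":
    text = "x" * 100 + "quick brown" + "y" * 100 + "quick" + "z" * 50
    hits = [(100, 105), (106, 111), (211, 216)]
    start, excerpt = best_excerpt(text, hits)
    print(start, excerpt)
```

Because the scoring works purely on offsets, nothing here needs a TokenStream. If the field had no stored vectors, the same offsets could presumably be produced by re-analyzing the text. A real highlighter would also snap the window to word or sentence boundaries before adding tags, which this sketch skips.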

