I've been working on a new highlighter on and off for a few weeks and I'd
love for other folks to try it out:
https://github.com/wikimedia/search-highlighter

You should try it because:
1.  It's pretty quick.
2.  It supports many of the features of the other highlighters and lets you
combine them in new ways.
3.  It has a few tricks that none of the other highlighters have.
4.  It doesn't require that you store any extra data but will use what it
can to speed itself up.

I've installed it on our beta site
<http://simple.wikipedia.beta.wmflabs.org/w/index.php?title=Special%3ASearch&profile=default&search=chess+players&fulltext=Search>
so you can see it in action without installing it.

Let me expand on my list above:
It doesn't require any extra data: by default it reanalyzes the field,
which is nice and fast for short fields.  Once fields get longer [0],
reanalyzing them starts to take too long, so it is best to store offsets in
the postings, just like the postings highlighter does.  It can also use term
vectors the same way the fast vector highlighter does, but that is slower
than postings and takes up more space.
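
As a rough sketch of the two storage options above, the Python below builds
a mapping fragment for a single field.  `index_options: offsets` and
`term_vector: with_positions_offsets` are standard Elasticsearch mapping
settings; the helper name `build_mapping`, the field name `text`, and the
default field type `string` are assumptions for illustration only.

```python
import json


def build_mapping(field_name, field_type="string", offsets_from="postings"):
    """Return a mapping fragment that stores highlighting offsets for a field.

    offsets_from="postings"     -> offsets stored in the postings list
    offsets_from="term_vectors" -> term vectors with positions and offsets
    The field type name is an assumption and depends on your version.
    """
    field = {"type": field_type}
    if offsets_from == "postings":
        field["index_options"] = "offsets"
    elif offsets_from == "term_vectors":
        field["term_vector"] = "with_positions_offsets"
    else:
        raise ValueError("offsets_from must be 'postings' or 'term_vectors'")
    return {"properties": {field_name: field}}


if __name__ == "__main__":
    # Print the postings variant so it can be pasted into a mapping request.
    print(json.dumps(build_mapping("text"), indent=2))
```

Leaving both settings off keeps the field on the reanalyze path, which is
the case described as fine for short fields.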

It supports three fragmenters: one that mimics the postings highlighter,
one that mimics the fast vector highlighter, and one that always highlights
the whole value.

It supports matched_fields, no_match_size, and most everything else in the
highlight api.  It doesn't support require_field_match though.
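
For illustration, a search body that uses those options might be assembled
like the sketch below.  `matched_fields` and `no_match_size` are the
standard highlight API options named above.  The highlighter type string is
passed in rather than hard-coded because the name the plugin registers under
is not stated here; the field names `text` and `text.plain` and the helper
`build_highlight_request` are likewise hypothetical.

```python
def build_highlight_request(query_text, highlighter_type, field="text",
                            matched_fields=None, no_match_size=0):
    """Build a search body asking for highlights on one field.

    highlighter_type is whatever name the plugin registers (an assumption
    here; check the plugin's README for the real value).
    """
    field_options = {}
    if matched_fields:
        # Combine matches from several analyzed variants of the same text.
        field_options["matched_fields"] = list(matched_fields)
    if no_match_size > 0:
        # Return this many leading characters when nothing matches.
        field_options["no_match_size"] = no_match_size
    return {
        "query": {"match": {field: query_text}},
        "highlight": {
            "type": highlighter_type,
            "fields": {field: field_options},
        },
    }


if __name__ == "__main__":
    body = build_highlight_request(
        "chess players",
        highlighter_type="experimental",  # assumed name, not confirmed above
        matched_fields=["text", "text.plain"],
        no_match_size=100,
    )
    print(body)
```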

It adds a handful of tricks like returning the top scoring snippets in
document order and weighting terms that appear early in the document
higher.  Nothing difficult, but still cute tricks.  It's reasonably easy to
implement new tricks so if you have any ideas I'd love to hear them.
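
The two tricks mentioned above can be sketched in a few lines of Python.
This is not the plugin's code, just a minimal illustration under stated
assumptions: the `Snippet` class, the linear `position_boost` function, and
its `max_boost` parameter are all made up here, and the boost is applied per
snippet rather than per term to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    start: int    # character offset where the snippet begins
    end: int      # character offset where the snippet ends
    score: float  # raw relevance score of the snippet


def position_boost(start, doc_length, max_boost=2.0):
    """Decay linearly from max_boost at offset 0 to 1.0 at the end."""
    if doc_length <= 0:
        return 1.0
    fraction = min(max(start / doc_length, 0.0), 1.0)
    return max_boost - (max_boost - 1.0) * fraction


def pick_snippets(snippets, doc_length, count):
    """Keep the `count` best boosted snippets, returned in document order."""
    ranked = sorted(
        snippets,
        key=lambda s: s.score * position_boost(s.start, doc_length),
        reverse=True,
    )
    # Select by boosted score first, then re-sort the winners by position.
    return sorted(ranked[:count], key=lambda s: s.start)


if __name__ == "__main__":
    candidates = [
        Snippet(80, 90, 5.0),
        Snippet(10, 20, 2.0),
        Snippet(40, 50, 1.0),
    ]
    for snippet in pick_snippets(candidates, doc_length=100, count=2):
        print(snippet)
```

Selecting by score and then sorting by offset keeps the best snippets while
letting them read in the same order as the source text.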

I don't think it is really ready for production usage yet but I'd like to
get there in a week or two.

Thanks for reading,

Nik

[0]: I haven't done the measurements to figure out how long the field has
to be before it is faster to use postings than to reanalyze it.  I did the
math a few months ago for how long the field has to be before vectors
become faster.  It was a couple of KB for my analysis chain but I'm not
sure any of that holds true for this highlighter.  It could be more or less.
