[jira] Created: (SOLR-1910) Add hl.df (highlight-specific default field) param, so highlighting can have a separate analysis path

Chris Harris (JIRA) Wed, 12 May 2010 15:14:05 -0700

Add hl.df (highlight-specific default field) param, so highlighting can have a 
separate analysis path
-----------------------------------------------------------------------------------------------------


                 Key: SOLR-1910
                 URL: https://issues.apache.org/jira/browse/SOLR-1910
             Project: Solr
          Issue Type: Improvement
          Components: highlighter
    Affects Versions: 1.4
            Reporter: Chris Harris
         Attachments: SOLR-1910.patch

Summary: Patch adds a hl.df parameter, to help with (some) situations where the 
highlighter currently uses the "wrong" analyzer for highlighting.

What: hl.df is like the normal df parameter, except that it takes effect only 
during highlighting. (In fact, the implementation basically overrides the 
normal df parameter temporarily at the start of highlighting, and then restores 
the original value when highlighting is complete.) When hl.df is specified, we 
make sure not to use the Query object that was parsed by QueryComponent, but 
rather parse our own. In the right circumstances, at least, this means that a 
more appropriate analyzer gets used for highlighting.
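The patch itself isn't reproduced here, but the "temporarily override df, then revert" idea can be sketched roughly as follows. This is a minimal Java illustration, not the patch's code: a plain java.util.Map stands in for Solr's request parameters, and the class and method names (HlDfOverride, withHighlightDefaultField) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: a plain Map stands in for Solr's request params.
class HlDfOverride {

    /**
     * Runs {@code highlight} with "df" temporarily replaced by "hl.df"
     * (if hl.df is present), then restores the original "df" value.
     */
    static <T> T withHighlightDefaultField(Map<String, String> params,
                                           Function<Map<String, String>, T> highlight) {
        String hlDf = params.get("hl.df");
        if (hlDf == null) {
            // No hl.df: highlighting sees the normal df unchanged.
            return highlight.apply(params);
        }
        boolean hadDf = params.containsKey("df");
        String originalDf = params.get("df");
        params.put("df", hlDf);
        try {
            return highlight.apply(params);
        } finally {
            // Revert so anything running after highlighting sees the original df.
            if (hadDf) {
                params.put("df", originalDf);
            } else {
                params.remove("df");
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("q", "\"audit trail\"");
        params.put("df", "body");
        params.put("hl", "true");
        params.put("hl.fl", "highlight");
        params.put("hl.df", "highlight");

        // The "highlighter" here just reports which df it sees.
        String seenDuringHighlighting =
            withHighlightDefaultField(params, p -> p.get("df"));
        System.out.println(seenDuringHighlighting); // highlight
        System.out.println(params.get("df"));       // body
    }
}
```

The try/finally is the main design point: even if highlighting throws, later code still sees the original df.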

Motivation: Currently, in a normal query+highlighting request, the highlighter 
re-uses the Query object parsed by the QueryComponent. This can result in 
incorrect highlights if the field being highlighted is of a different type than 
the field being queried. In my particular case:
 * My queries don't explicitly specify field names; they always rely on the 
default field
 * My default field for search is "body"
 * body is a unigram-plus-bigram field. So, e.g., the input "audit trail" gets 
turned into tokens "audit / audit trail / trail". (This is a performance 
optimization.)
 * If I try to highlight directly on "body", the highlights get screwed up. 
(This is because the highlighter doesn't really support the kind of 
"continuously overlapping" tokens generated by my analysis chain. In short, the 
bigrams confuse the TokenGroup class.)
 * To avoid these highlighting problems, I don't directly highlight "body", but 
rather a "highlight" field, which has no bigram tokens. ("highlight" is 
populated from "body" with a copyField directive.)
 * Without hl.df, I have a new class of highlighting problems. In particular, 
if the user enters a phrase search (e.g. "audit trail"), then that phrase 
appears unhighlighted in the highlighter output. The short version of why is 
that the analyzer used to parse the query produces a Query object that contains 
bigrams, but the text that we're highlighting doesn't contain bigrams.
 * With hl.df, the analyzers match up for highlighting; the Query object used 
for highlighting does _not_ contain bigrams, just like the "highlight" field.
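To make the mismatch above concrete, here is a toy Java sketch. It is not Lucene's actual analysis or highlighting code: the two "analyzers" are simple lowercasing whitespace splitters (the bigram one only mimics what a shingle-style chain produces), the highlighter just wraps exactly-matching tokens in <em> tags, and it assumes, as a simplification of the bigram performance trick, that a phrase query parsed against "body" ends up matching only on its bigram token.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy analyzers and highlighter; not Lucene/Solr classes.
class AnalyzerMismatchDemo {

    /** Lowercased whitespace tokens, like the "highlight" field (no bigrams). */
    static List<String> unigrams(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase().trim().split("\\s+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    /** Unigrams plus bigrams, like "body": "audit trail" -> audit / audit trail / trail. */
    static List<String> unigramsPlusBigrams(String text) {
        List<String> words = unigrams(text);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            out.add(words.get(i));
            if (i + 1 < words.size()) {
                out.add(words.get(i) + " " + words.get(i + 1));
            }
        }
        return out;
    }

    /** Terms a phrase query ends up matching on, depending on which analyzer parsed it. */
    static Set<String> phraseQueryTerms(String phrase, boolean bodyAnalyzer) {
        Set<String> terms = new HashSet<>();
        if (bodyAnalyzer) {
            // ASSUMPTION: the bigram shortcut matches the phrase via its bigram tokens only.
            for (String tok : unigramsPlusBigrams(phrase)) {
                if (tok.contains(" ")) terms.add(tok);
            }
        } else {
            // Unigram analysis: the phrase is matched via its individual words.
            terms.addAll(unigrams(phrase));
        }
        return terms;
    }

    /** Wraps each (lowercased) unigram token of the stored text that matches a query term. */
    static String highlight(String storedText, Set<String> queryTerms) {
        StringBuilder sb = new StringBuilder();
        for (String tok : unigrams(storedText)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(queryTerms.contains(tok) ? "<em>" + tok + "</em>" : tok);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String stored = "review the audit trail";
        String phrase = "audit trail";
        // Without hl.df: query parsed with the body analyzer, nothing matches.
        System.out.println(highlight(stored, phraseQueryTerms(phrase, true)));
        // review the audit trail
        // With hl.df=highlight: query parsed with the unigram analyzer.
        System.out.println(highlight(stored, phraseQueryTerms(phrase, false)));
        // review the <em>audit</em> <em>trail</em>
    }
}
```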

(I realize it may help to expand the description of this use case, but I'm a 
bit hurried right now.)

I wanted to throw this out there, partly in case people have any better 
solutions. One variation on the hl.df option that might be worth considering is 
hl.UseHighlightedFieldAsDefaultField, which would create a new Query object not 
just once at the start of highlighting, but separately for each particular 
field that's getting highlighted.
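A rough sketch of how hl.UseHighlightedFieldAsDefaultField might differ from hl.df: instead of a single re-parse, the query text is analyzed once per field in hl.fl, using that field's own analyzer. This is hypothetical toy Java: a "query" is reduced to a list of terms, and the "title" field with a keyword-style analyzer is made up purely for contrast.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of per-field query re-parsing; not Solr API.
class PerFieldQueryDemo {

    /** Lowercased whitespace tokens (like the "highlight" field). */
    static List<String> unigramAnalyzer(String text) {
        return Arrays.asList(text.toLowerCase().trim().split("\\s+"));
    }

    /** Whole input as one lowercased token (hypothetical keyword-style field). */
    static List<String> keywordAnalyzer(String text) {
        return Arrays.asList(text.toLowerCase().trim());
    }

    /** Re-"parses" the query once per highlighted field, using that field as the default. */
    static Map<String, List<String>> queryTermsPerField(
            String queryText,
            List<String> highlightFields,
            Map<String, Function<String, List<String>>> analyzers) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String field : highlightFields) {
            Function<String, List<String>> analyzer = analyzers.get(field);
            if (analyzer == null) {
                throw new IllegalArgumentException("no analyzer for field " + field);
            }
            result.put(field, analyzer.apply(queryText));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Function<String, List<String>>> analyzers = new LinkedHashMap<>();
        analyzers.put("highlight", PerFieldQueryDemo::unigramAnalyzer);
        analyzers.put("title", PerFieldQueryDemo::keywordAnalyzer);
        System.out.println(queryTermsPerField(
            "Audit Trail", Arrays.asList("highlight", "title"), analyzers));
        // {highlight=[audit, trail], title=[audit trail]}
    }
}
```

The trade-off is one parse per highlighted field instead of one per request, in exchange for each field being highlighted with a query analyzed the same way as its own text.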
