[
https://issues.apache.org/jira/browse/SOLR-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Erick Erickson resolved SOLR-1910.
----------------------------------
Resolution: Won't Fix
2013 Old JIRA cleanup
> Add hl.df (highlight-specific default field) param, so highlighting can have
> a separate analysis path from search
> -----------------------------------------------------------------------------------------------------------------
>
> Key: SOLR-1910
> URL: https://issues.apache.org/jira/browse/SOLR-1910
> Project: Solr
> Issue Type: Improvement
> Components: highlighter
> Affects Versions: 1.4
> Reporter: Chris Harris
> Attachments: SOLR-1910.patch
>
>
> Summary: Patch adds a hl.df parameter, to help with (some) situations where
> the highlighter currently uses the "wrong" analyzer for highlighting.
> What: hl.df is like the normal df parameter, except that it takes effect only
> during highlighting. (In fact the implementation is basically to temporarily
> mess with the normal df parameter at the start of highlighting, and then
> revert to the original value when highlighting is complete.) When hl.df is
> specified, we make sure not to use the Query object that was parsed by
> QueryComponent, but rather make our own. In the right circumstances anyway,
> this means that a more appropriate analyzer gets used for highlighting.
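
The override-and-restore idea described above can be sketched as follows. This is a minimal Python illustration, not the code in SOLR-1910.patch (which is Java). The helper name `highlight_default_field` and the use of a plain dict for request parameters are assumptions made for the example, and `hl.df` itself exists only in the proposed patch.

```python
from contextlib import contextmanager

_MISSING = object()  # sentinel: distinguishes "df not set" from any real value


@contextmanager
def highlight_default_field(params):
    """Temporarily replace params['df'] with params['hl.df'], if hl.df is set."""
    original = params.get("df", _MISSING)
    override = params.get("hl.df")
    if override is not None:
        params["df"] = override
    try:
        yield params
    finally:
        # Revert to the search-time default field once highlighting is done.
        if original is _MISSING:
            params.pop("df", None)
        else:
            params["df"] = original


# Hypothetical request: search on "body", highlight on "highlight".
params = {
    "q": '"audit trail"',
    "df": "body",
    "hl": "true",
    "hl.fl": "highlight",
    "hl.df": "highlight",
}

with highlight_default_field(params) as p:
    # A highlighter would re-parse q here, so the default field in effect
    # is the highlight-specific one.
    df_during_highlighting = p["df"]

df_after_highlighting = params["df"]
```

Restoring in a `finally` clause means the search-time `df` comes back even if highlighting raises.
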
> Motivation: Currently, in a normal query+highlighting request, the
> highlighter re-uses the Query object parsed by the QueryComponent. This can
> result in incorrect highlights if the field being highlighted is of a
> different type than the field being queried. In my particular case:
> * My queries don't explicitly specify field names; they always rely on the
> default field
> * My default field for search is "body"
> * body is a unigram-plus-bigram field. So, e.g. input "audit trail" gets
> turned into tokens "audit / audit trail / trail". (This is a performance
> optimization.)
> * If I try to highlight directly on "body", the highlights get screwed up.
> (This is because the highlighter doesn't really support the kind of
> "continuously overlapping" tokens generated by my analysis chain. In short,
> the bigrams confuse the TokenGroup class.)
> * To avoid these highlighting problems, I don't directly highlight "body",
> but rather a "highlight" field, which has no bigram tokens. ("highlight" is
> populated from "body" with a copyField directive.)
> * Without hl.df, I have a new class of highlighting problems. In particular,
> if the user enters a phrase search (e.g. "audit trail"), then that phrase
> appears unhighlighted in the highlighter output. The short version of why is
> that the analyzer used to parse the query produced a Query object that
> contains bigrams, but the text that we're highlighting doesn't contain bigrams.
> * With hl.df, the analyzers match up for highlighting; the Query object used
> for highlighting does _not_ contain bigrams, just like the "highlight" field.
> (I realize it may help to expand the description of this use case, but I'm a
> bit hurried right now.)
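
The analyzer mismatch in the list above can be illustrated with a toy model. This is only a sketch: the two functions stand in for the "body" and "highlight" analysis chains, whitespace tokenization is an assumption, and the membership check is a simplification that does not reproduce how the Lucene highlighter actually matches phrase queries.

```python
def unigrams(text):
    """Stand-in for the "highlight" field analyzer: single words only."""
    return text.lower().split()


def unigrams_plus_bigrams(text):
    """Stand-in for the "body" field analyzer: each word, plus each adjacent pair."""
    words = unigrams(text)
    tokens = []
    for i, word in enumerate(words):
        tokens.append(word)
        if i + 1 < len(words):
            tokens.append(word + " " + words[i + 1])
    return tokens


def terms_missing_from_field(query_terms, field_tokens):
    """Query terms that never occur among the field's indexed tokens."""
    present = set(field_tokens)
    return [term for term in query_terms if term not in present]


# Text stored in the "highlight" field (no bigram tokens).
highlight_tokens = unigrams("The audit trail was reviewed")

# Without hl.df: the query was analyzed for "body", so it carries a bigram.
body_query_terms = unigrams_plus_bigrams("audit trail")
missing_without_hl_df = terms_missing_from_field(body_query_terms, highlight_tokens)

# With hl.df=highlight: the query is analyzed the same way as the field.
highlight_query_terms = unigrams("audit trail")
missing_with_hl_df = terms_missing_from_field(highlight_query_terms, highlight_tokens)
```

In this model the "body"-analyzed query contains the term "audit trail", which the "highlight" field never produces, while the "highlight"-analyzed query contains only terms the field does have.
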
> I wanted to throw this out there, partly in case people have any better
> solutions. One variation on the hl.df option that might be worth considering is
> hl.UseHighlightedFieldAsDefaultField, which would create a new Query object
> not just once at the start of highlighting, but separately for each
> particular field that's getting highlighted.
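
The difference between hl.df and the per-field variation can be sketched like this. It is a hypothetical illustration only: the `ANALYZERS` table, both function names, and the token lists standing in for Query objects are assumptions, not Solr APIs.

```python
def unigrams(text):
    """Toy analyzer: single words only."""
    return text.lower().split()


def unigrams_plus_bigrams(text):
    """Toy analyzer: each word, plus each adjacent pair of words."""
    words = unigrams(text)
    tokens = []
    for i, word in enumerate(words):
        tokens.append(word)
        if i + 1 < len(words):
            tokens.append(word + " " + words[i + 1])
    return tokens


# Hypothetical schema: which toy analyzer each field uses.
ANALYZERS = {"body": unigrams_plus_bigrams, "highlight": unigrams}


def query_for_hl_df(user_query, hl_df):
    """hl.df style: analyze the query once, against one highlight-time default field."""
    return ANALYZERS[hl_df](user_query)


def queries_per_highlighted_field(user_query, highlighted_fields):
    """Variation: analyze the query separately for each field being highlighted."""
    return {field: ANALYZERS[field](user_query) for field in highlighted_fields}


single = query_for_hl_df("audit trail", "highlight")
per_field = queries_per_highlighted_field("audit trail", ["body", "highlight"])
```

The per-field form costs one query parse per highlighted field, but each field is then highlighted with terms produced by its own analyzer.
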