[ https://issues.apache.org/jira/browse/SOLR-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris Harris updated SOLR-1910:
-------------------------------

    Summary: Add hl.df (highlight-specific default field) param, so highlighting can have a separate analysis path from search  (was: Add hl.df (highlight-specific default field) param, so highlighting can have a separate analysis path)

> Add hl.df (highlight-specific default field) param, so highlighting can have
> a separate analysis path from search
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1910
>                 URL: https://issues.apache.org/jira/browse/SOLR-1910
>             Project: Solr
>          Issue Type: Improvement
>          Components: highlighter
>    Affects Versions: 1.4
>            Reporter: Chris Harris
>         Attachments: SOLR-1910.patch
>
>
> Summary: Patch adds a hl.df parameter, to help with (some) situations where
> the highlighter currently uses the "wrong" analyzer for highlighting.
>
> What: hl.df is like the normal df parameter, except that it takes effect only
> during highlighting. (In fact the implementation is basically to temporarily
> override the normal df parameter at the start of highlighting, and then
> revert to the original value when highlighting is complete.) When hl.df is
> specified, we make sure not to use the Query object that was parsed by
> QueryComponent, but rather make our own. In the right circumstances anyway,
> this means that a more appropriate analyzer gets used for highlighting.
>
> Motivation: Currently, in a normal query+highlighting request, the
> highlighter re-uses the Query object parsed by the QueryComponent. This can
> result in incorrect highlights if the field being highlighted is of a
> different type than the field being queried. In my particular case:
> * My queries don't explicitly specify field names; they always rely on the
> default field
> * My default field for search is "body"
> * body is a unigram-plus-bigram field. So, e.g. input "audit trail" gets
> turned into tokens "audit / audit trail / trail". (This is a performance
> optimization.)
> * If I try to highlight directly on "body", the highlights get screwed up.
> (This is because the highlighter doesn't really support the kind of
> "continuously overlapping" tokens generated by my analysis chain. In short,
> the bigrams confuse the TokenGroup class.)
> * To avoid these highlighting problems, I don't directly highlight "body",
> but rather a "highlight" field, which has no bigram tokens. ("highlight" is
> populated from "body" with a copyField directive.)
> * Without hl.df, I have a new class of highlighting problems. In particular,
> if the user enters a phrase search (e.g. "audit trail"), then that phrase
> appears unhighlighted in the highlighter output. The short version of why is
> that the analyzer used to parse the query produced a Query object that
> contains bigrams, but the text that we're highlighting doesn't contain
> bigrams.
> * With hl.df, the analyzers match up for highlighting; the Query object used
> for highlighting does _not_ contain bigrams, just like the "highlight" field.
> (I realize it may help to expand the description of this use case, but I'm a
> bit hurried right now.)
>
> I wanted to throw this out there, partly in case people have any better
> solutions. One variation on the hl.df option that might be worth considering
> is hl.UseHighlightedFieldAsDefaultField, which would create a new Query
> object not just once at the start of highlighting, but separately for each
> particular field that's getting highlighted.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
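
For readers who want to see the analyzer mismatch concretely, here is a rough toy model in Python. It is not Solr or Lucene code and does not reflect the actual patch: the tokenizers, the ANALYZERS table, and the function names are all hypothetical stand-ins, and "parsing" a query is reduced to running the default field's analyzer over the query text. It only illustrates the two ideas in the description: the "body" chain emits bigrams that the "highlight" field never contains, and hl.df is handled by temporarily swapping df, re-parsing, and restoring the original value.

```python
from contextlib import contextmanager


def unigrams(text):
    """Toy analyzer for the hypothetical "highlight" field: words only."""
    return text.lower().split()


def unigrams_plus_bigrams(text):
    """Toy analyzer for the hypothetical "body" field: each word, plus a
    bigram joining it to the following word."""
    words = text.lower().split()
    tokens = []
    for i, word in enumerate(words):
        tokens.append(word)
        if i + 1 < len(words):
            tokens.append(word + " " + words[i + 1])
    return tokens


# Assumed mapping from field name to analysis chain (illustration only).
ANALYZERS = {"body": unigrams_plus_bigrams, "highlight": unigrams}


@contextmanager
def temporary_default_field(params, field):
    """Override params["df"] inside the block, then restore the old value."""
    missing = object()
    original = params.get("df", missing)
    params["df"] = field
    try:
        yield params
    finally:
        if original is missing:
            del params["df"]
        else:
            params["df"] = original


def parse_query(params):
    """Stand-in for query parsing: analyze q with the default field's chain."""
    return ANALYZERS[params["df"]](params["q"])


def query_for_highlighting(params, search_query):
    """Reuse the search query unless hl.df is set; otherwise re-parse
    with hl.df temporarily acting as the default field."""
    hl_df = params.get("hl.df")
    if hl_df is None:
        return search_query
    with temporary_default_field(params, hl_df):
        return parse_query(params)


params = {"q": "audit trail", "df": "body", "hl.df": "highlight"}
search_query = parse_query(params)
hl_query = query_for_highlighting(params, search_query)
print(search_query)  # ['audit', 'audit trail', 'trail']
print(hl_query)      # ['audit', 'trail']
```

In this model the search-time query carries the token "audit trail", which can never match anything in a unigram-only field, while the re-parsed highlighting query contains only terms the "highlight" field can actually hold. After highlighting, params["df"] is back to "body", so the search path is unaffected.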