After much head-scratching and re-reading of posts around this issue, I found a solution by writing my own QueryTermExtractor. Thanks for the help everyone.
To recap, I wanted to get more "precise" highlighting by having just the "right" fields highlighted. My original example was a field named "metadata" which is the concatenation of some metadata about the document (author, title, etc.) This field is indexed and not stored. Users are able to run queries such as "metadata:myauthor", and expect the "title" field to be highlighted in the search results when it contains "myauthor". With the stock highlighter, I had 2 choices: a- highlight all occurrences of "myauthor" in all the fields (not good) or b- highlight occurrences of "myauthor" in just the "metadata" field. This field doesn't exist per se in the document, and no highlighting was produced at all. Not good either. Basically I needed to map "index-only" fields (indexed, not stored) to the actual (stored) fields that make up this field. Mark gave my the key by mentioning the QueryTermExtractor just produces a list of strings. I wrote a custom QueryTermExtractor (copy, paste, hack) that has knowledge of my indexing schema and knows how to map index-only fields to actual fields, therefore generating the right set of WeightedTerm[] passed to the QueryScorer constructor. I hope this will make sense to the (future) reader who ran into this problem also. --Renaud -----Original Message----- From: mark harwood [mailto:[EMAIL PROTECTED] Sent: Monday, March 05, 2007 2:03 AM To: java-user@lucene.apache.org Subject: Re: More Precise Highlighting I think the solution is fairly simple. Pass the "metadata" fieldname to the QueryTermExtractor - not the fieldname "author". QueryTermExtractor effectively provides just a list of strings (no fieldnames) which are then matched against strings found in the tokenStream which represents your content. Because your query did not contain any "author:apple" clauses QueryTermExtractor produced no strings for highlighting when you asked for the field. Cheers Mark On 3/2/07, Renaud Waldura <[EMAIL PROTECTED]> wrote: > > Hello Mark: > > I apologize for not responding earlier, more urgent stuff took over. I > really appreciate your help with this. Since my first message was > somewhat terse, let me explain again. > > My index includes both "regular" fields (stored, indexed, tokenized) > and "search-only" fields used for searching only (unstored, indexed, > tokenized). > In particular, I have one such field called "metadata" that's the > accumulation of all metadata about the document (title, authors, date, > etc.). Using this field, I can easily query "all metadata" without > searching the document text (that's a separate field). It has no > meaning other than making this kind of query possible. > > Say I query "+metadata:apple +banana". I would like to have the term > "apple" > highlighted in all metadata fields; if Fiona Apple authored a > document, her lastname should be highlighted in the "author" field, > but not in the document text. > > From what I've experienced, when I pass NULL as fieldname to the > QueryTermExtractor, "apple" gets highlighted everywhere, both in any > metadata AND the document text. When I pass "author" as fieldname, it > doesn't highlight that field at all -- because the query was actually > for "metadata" not "author". It's not correct in either case, and I'm > not sure what to do. > > Thanks again for your help, > > --Renaud > > > > > -----Original Message----- > From: markharw00d [mailto:[EMAIL PROTECTED] > Sent: Tuesday, February 13, 2007 3:24 PM > To: java-user@lucene.apache.org > Subject: Re: More Precise Highlighting > > Not sure I fully understand the problem. The query is effectively > "allContent:someTitleText" and you want to highlight the string > "someTitleText" in the title field? > If you pass null as a fieldname to the QueryTermExtractor it will use > all term values, regardless of field, as string to highlight. > If you pass a fieldname it will only select highlight term values for > that field. > If you want, you can use QueryTermExtractor to extract just the > "allContent" > field values and pass a TokenStream for the "title" field to the > highlighter and it would highlight the appropriate values in the > title. > > Do any of these options work? > > > > Renaud Waldura wrote: > > The old highlighter code used to highlight found terms in any field > > (too broad). The new highlighter lets one specify a field when > > highlighting, but it highlights that field only (too narrow). > > > > In my case we have an "all" field that is the concatenation of all > > data about the document. When I highlight e.g. the "title" field, > > nothing happens, because the Highlighter doesn't know the title is > > included in this "all" field. > > > > How can I tell the highlighter that my query fields can map to some > > document fields? It looks like I'd change the QueryTermExtractor, > > but it's conveniently all-static and final. > > > > --Renaud > > > > > > > > > > > > > ___________________________________________________________ > All new Yahoo! Mail "The new Interface is stunning in its simplicity > and ease of use." - PC Magazine > http://uk.docs.yahoo.com/nowyoucan.html > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > ___________________________________________________________ New Yahoo! Mail is the ultimate force in competitive emailing. Find out more at the Yahoo! Mail Championships. Plus: play games and win prizes. http://uk.rd.yahoo.com/evt=44106/*http://mail.yahoo.net/uk --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]