[ 
https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12611453#action_12611453
 ] 

Tavi Nathanson commented on LUCENE-794:
---------------------------------------

Hey everyone,

I'm having some trouble getting SpanScorer to act the way I'd like for proper 
highlighting, and I'm wondering if anyone has any suggestions.

I have two fields: text_raw and text_stemmed. text_raw, as the name suggests, 
stores unstemmed (tokenized) text while text_stemmed stores stemmed (tokenized) 
text.

I have queries that look over both fields. For example, I may have the query 
+(text_raw:"apple sauce" text_stemmed:orange). This query matches "apple sauce 
oranges" but it does not match "apples sauces orange" (because "apple sauce" is 
not stemmed). I'd like to be able to highlight accordingly: I want "apple," 
"sauce," and "oranges" to all be highlighted.
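
To make the desired behavior concrete, here is a minimal sketch in plain Java 
that does not use Lucene at all. It is not the SpanScorer API. The class name, 
the naive trailing-"s" stemmer, and the whitespace tokenizer are hypothetical 
stand-ins, and it assumes both clauses must match, as the examples above imply.

```java
import java.util.ArrayList;
import java.util.List;

public class TwoFieldHighlightSketch {

    // Naive stand-in for a real stemmer (assumption): strips one trailing "s".
    static String stem(String token) {
        return token.length() > 1 && token.endsWith("s")
                ? token.substring(0, token.length() - 1)
                : token;
    }

    // Stand-in for the analyzer (assumption): lowercase, split on whitespace.
    static String[] tokenize(String text) {
        return text.toLowerCase().trim().split("\\s+");
    }

    // Start index of the first exact occurrence of phrase in tokens, or -1.
    static int phraseStart(String[] tokens, String[] phrase) {
        for (int i = 0; i + phrase.length <= tokens.length; i++) {
            boolean ok = true;
            for (int j = 0; j < phrase.length; j++) {
                if (!tokens[i + j].equals(phrase[j])) {
                    ok = false;
                    break;
                }
            }
            if (ok) {
                return i;
            }
        }
        return -1;
    }

    // Raw tokens to highlight; empty when the document does not match.
    // The phrase is checked against raw tokens (like text_raw) and the
    // single term against stemmed tokens (like text_stemmed).
    static List<String> highlight(String text, String[] rawPhrase,
                                  String stemmedTerm) {
        String[] raw = tokenize(text);
        boolean[] mark = new boolean[raw.length];
        boolean termHit = false;
        for (int i = 0; i < raw.length; i++) {
            if (stem(raw[i]).equals(stemmedTerm)) {
                mark[i] = true;
                termHit = true;
            }
        }
        List<String> hits = new ArrayList<>();
        int start = phraseStart(raw, rawPhrase);
        if (start < 0 || !termHit) {
            return hits; // assumption: both clauses are required
        }
        for (int j = 0; j < rawPhrase.length; j++) {
            mark[start + j] = true;
        }
        for (int i = 0; i < raw.length; i++) {
            if (mark[i]) {
                hits.add(raw[i]);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] phrase = {"apple", "sauce"};
        // prints [apple, sauce, oranges]
        System.out.println(highlight("apple sauce oranges", phrase, "orange"));
        // prints []
        System.out.println(highlight("apples sauces orange", phrase, "orange"));
    }
}
```

The point of the sketch is that the highlighted tokens always come from the 
raw text, even when the clause that selected them was evaluated against the 
stemmed form.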

So, even though it is in fact the raw text that ends up getting highlighted, 
I'm looking for a way to construct a SpanScorer that is not limited to a 
single field (the constructor takes a "field" argument).

Thanks!

Tavi


> Extend contrib Highlighter to properly support PhraseQuery, SpanQuery,  
> ConstantScoreRangeQuery
> -----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>             Fix For: 2.3.2
>
>         Attachments: MultiPhraseQueryExtraction.patch, 
> SpanHighlighter-01-26-2008.patch, SpanHighlighter-01-28-2008.patch, 
> SpanHighlighter-02-10-2008.patch, SpanHighlighter-RemovSysOut.patch, 
> spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, 
> spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, 
> spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, 
> spanhighlighter8.patch, spanhighlighter9.patch, 
> spanhighlighter_24_January_2008.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter 
> package that scores just like QueryScorer, but assigns a score of 0 to Terms 
> that did not cause the Query hit. This gives 'actual' hit highlighting for 
> the range of SpanQuery types, PhraseQuery, and ConstantScoreRangeQuery. New 
> Query types are easy to add. There is also a new Fragmenter that attempts to 
> fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

