Hi, I've been using the 2.3.1 contrib highlighter with the 2/10/2008 SpanHighlighter patch, and have run into some trouble. If I have two phrases in a query that share terms (e.g. "hello world" and "hello goodbye") the SpanScorer seems to not highlight 'hello' consistently.
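As a standalone illustration of the suspected behaviour (this is not the highlighter's code: the class PutAllClobberSketch, the TermSpans type, and the token positions are hypothetical stand-ins), the sketch below shows how Map.putAll drops the earlier positions for a term shared by two phrases, and how merging the position lists would keep both.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PutAllClobberSketch {

    /** Hypothetical stand-in for a weighted term: the term text plus its position spans. */
    static final class TermSpans {
        final String term;
        final List<int[]> spans = new ArrayList<>();

        TermSpans(String term) {
            this.term = term;
        }
    }

    /** Builds the term map for one two-word phrase matched at token positions [start, end]. */
    static Map<String, TermSpans> phrase(String first, String second, int start, int end) {
        Map<String, TermSpans> terms = new HashMap<>();
        for (String word : new String[] {first, second}) {
            TermSpans ts = new TermSpans(word);
            ts.spans.add(new int[] {start, end});
            terms.put(word, ts);
        }
        return terms;
    }

    /** Replace semantics: an entry from b overwrites the entry for the same term from a. */
    static Map<String, TermSpans> combineWithPutAll(Map<String, TermSpans> a,
                                                    Map<String, TermSpans> b) {
        Map<String, TermSpans> out = new HashMap<>(a);
        out.putAll(b);
        return out;
    }

    /** Merge semantics: spans for a shared term are accumulated; inputs are not modified. */
    static Map<String, TermSpans> combineByMerging(Map<String, TermSpans> a,
                                                   Map<String, TermSpans> b) {
        Map<String, TermSpans> out = new HashMap<>();
        mergeInto(out, a);
        mergeInto(out, b);
        return out;
    }

    private static void mergeInto(Map<String, TermSpans> out, Map<String, TermSpans> source) {
        for (Map.Entry<String, TermSpans> e : source.entrySet()) {
            TermSpans target = out.get(e.getKey());
            if (target == null) {
                target = new TermSpans(e.getKey());
                out.put(e.getKey(), target);
            }
            target.spans.addAll(e.getValue().spans);
        }
    }

    public static void main(String[] args) {
        // Assumed positions: "hello world" at tokens 0-1, "hello goodbye" at tokens 5-6.
        Map<String, TermSpans> helloWorld = phrase("hello", "world", 0, 1);
        Map<String, TermSpans> helloGoodbye = phrase("hello", "goodbye", 5, 6);

        int replaced = combineWithPutAll(helloWorld, helloGoodbye).get("hello").spans.size();
        int merged = combineByMerging(helloWorld, helloGoodbye).get("hello").spans.size();

        System.out.println("putAll: hello has " + replaced + " span(s)"); // prints 1
        System.out.println("merge:  hello has " + merged + " span(s)");   // prints 2
    }
}
```

With replace semantics only the second phrase's span survives for 'hello', so a scorer that checks positions would miss the 'hello' in the first phrase. Whether the real extractor should merge in this way is the open question.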
It looks to me like WeightedSpanTermExtractor.extract() is clobbering the span positions for 'hello' the second time it encounters the term. Should terms.putAll(booleanTerms) and terms.putAll(disjunctTerms) really be replacing the old entry, or should they try to addPositionSpans()?

Thanks,
David

PS: And while I'm asking, it looks like getWeightedSpanTermsWithScores() will wrap the cachingTokenFilter passed to it by SpanScorer.init() into another CachingTokenFilter, duplicating the cache?

--
David Kaelbling
Senior Software Engineer
Black Duck Software, Inc.
[EMAIL PROTECTED] T +1.781.810.2041 F +1.781.891.5145
http://www.blackducksoftware.com