On Wed, 2008-04-23 at 16:15 -0400, David Kaelbling wrote:
> Hi,
>
> I've been using the 2.3.1 contrib highlighter with the 2/10/2008
> SpanHighlighter patch, and have run into some trouble. If I have two
> phrases in a query that share terms (e.g. "hello world" and "hello
> goodbye") the SpanScorer seems to not highlight 'hello' consistently.
>
> It looks to me like WeightedSpanTermExtractor.extract() is clobbering
> the span positions for 'hello' the second time it encounters the term.
> Should terms.putAll(booleanTerms) and terms.putAll(disjunctTerms) really
> be replacing the old entry, or should they try to addPositionSpans()?
>
> Thanks,
> David
>
> PS: And while I'm asking, it looks like getWeightedSpanTermsWithScores()
> will wrap the cachingTokenFilter passed it by SpanScorer.init() into
> another CachingTokenFilter, duplicating the cache?
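The clobbering David describes comes down to the difference between java.util.Map.putAll, which replaces any existing value stored under the same key, and a merge that appends new positions to the existing entry. Below is a minimal Java sketch of the two behaviours. TermSpans, phraseSpans, mergeByReplace and mergeByAppend are hypothetical names for illustration only; this is not the highlighter's actual WeightedSpanTerm or WeightedSpanTermExtractor code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SpanMergeDemo {

    // Hypothetical stand-in for a weighted span term: one term plus every
    // [start, end] token-position pair where a phrase matched it.
    static final class TermSpans {
        final String term;
        final List<int[]> positions = new ArrayList<>();

        TermSpans(String term) {
            this.term = term;
        }

        TermSpans addPosition(int start, int end) {
            positions.add(new int[] {start, end});
            return this;
        }

        TermSpans addPositions(List<int[]> more) {
            positions.addAll(more);
            return this;
        }
    }

    // Spans found for one phrase whose first word sits at token position `start`.
    static Map<String, TermSpans> phraseSpans(int start, String... words) {
        Map<String, TermSpans> found = new HashMap<>();
        for (int i = 0; i < words.length; i++) {
            int pos = start + i;
            found.computeIfAbsent(words[i], TermSpans::new).addPosition(pos, pos);
        }
        return found;
    }

    // Map.putAll semantics: an existing entry for the same term is replaced,
    // so positions recorded for an earlier phrase are lost.
    static void mergeByReplace(Map<String, TermSpans> terms, Map<String, TermSpans> found) {
        terms.putAll(found);
    }

    // Accumulating merge: keep the existing entry and append the new positions.
    static void mergeByAppend(Map<String, TermSpans> terms, Map<String, TermSpans> found) {
        for (Map.Entry<String, TermSpans> e : found.entrySet()) {
            TermSpans existing = terms.get(e.getKey());
            if (existing == null) {
                // Copy so later appends do not mutate the caller's map.
                terms.put(e.getKey(), new TermSpans(e.getKey()).addPositions(e.getValue().positions));
            } else {
                existing.addPositions(e.getValue().positions);
            }
        }
    }

    public static void main(String[] args) {
        // Text "hello world and hello goodbye": 'hello' matches at positions 0 and 3.
        Map<String, TermSpans> replaced = new HashMap<>();
        mergeByReplace(replaced, phraseSpans(0, "hello", "world"));
        mergeByReplace(replaced, phraseSpans(3, "hello", "goodbye"));

        Map<String, TermSpans> appended = new HashMap<>();
        mergeByAppend(appended, phraseSpans(0, "hello", "world"));
        mergeByAppend(appended, phraseSpans(3, "hello", "goodbye"));

        // prints "replace: 'hello' spans = 1" then "append:  'hello' spans = 2"
        System.out.println("replace: 'hello' spans = " + replaced.get("hello").positions.size());
        System.out.println("append:  'hello' spans = " + appended.get("hello").positions.size());
    }
}
```

With the putAll-style merge only the second phrase's position for 'hello' survives, which would produce the inconsistent highlighting described in the question; the appending merge keeps both positions.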
Hmmm... reminds me of an early dev bug I thought I had added a test case for and fixed. I will take a look as soon as I can.

- mark