[EMAIL PROTECTED] wrote:
I think this version of the highlighter should provide a fix: http://www.inperspective.com/lucene/hilite2beta.zip
Before I update the version of the highlighter in the sandbox I'd appreciate feedback from those who have been troubled by overlapping tokens in token streams (Erik, Dave, Bruce?)

First pass of testing: yes, this does indeed fix the problem.
I've realized I may want to modify my Analyzer now too.
I was focusing on the Token position increment instead of the offset.
For something like the case where I broke "HashMap" into 3 tokens: "Hash", "Map", "HashMap", I was returning the same start/end offsets for all of them (thus a search on "Map" ends up with all of "HashMap" being highlighted). Probably more correct is to return offsets within the original larger token so that you can see exactly where your term matched. I'll update my code and then put up a site that demonstrates this.
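A minimal Java sketch of that offset scheme, assuming a simple rule that starts a new part at each upper-case letter. `CamelCaseSplitter`, `Tok` and `split` are hypothetical names, not Lucene classes; the only point is that each sub-token carries offsets into the original text, so a hit on "Map" selects just those three characters rather than the whole word.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: Tok is a minimal stand-in, not Lucene's Token class.
final class CamelCaseSplitter {

    /** A token plus its character offsets into the original text. */
    static final class Tok {
        final String text;
        final int startOffset; // inclusive
        final int endOffset;   // exclusive

        Tok(String text, int startOffset, int endOffset) {
            this.text = text;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }

        @Override
        public String toString() {
            return text + "[" + startOffset + "," + endOffset + ")";
        }
    }

    /**
     * Splits a camel-case word found at offset {@code base} in the original text.
     * Each part gets offsets covering only its own characters; the whole word is
     * emitted last (only if it was actually split) and covers the full span.
     */
    static List<Tok> split(String word, int base) {
        List<Tok> out = new ArrayList<>();
        int partStart = 0;
        for (int i = 1; i <= word.length(); i++) {
            // Assumed rule: a new part begins at each upper-case letter.
            boolean boundary = i == word.length() || Character.isUpperCase(word.charAt(i));
            if (boundary) {
                out.add(new Tok(word.substring(partStart, i), base + partStart, base + i));
                partStart = i;
            }
        }
        if (out.size() > 1) {
            out.add(new Tok(word, base, base + word.length()));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "new HashMap()";
        for (Tok t : split("HashMap", text.indexOf("HashMap"))) {
            // Each token's offsets select exactly the characters it came from.
            System.out.println(t + " -> " + text.substring(t.startOffset, t.endOffset));
        }
    }
}
```

Running `main` should print `Hash[4,8) -> Hash`, `Map[8,11) -> Map` and `HashMap[4,11) -> HashMap`, one per line.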


thx,
 Dave



I added my own test analyzer to the JUnit test that introduces synonyms into the token stream at the same position as the trigger token, and the new code works OK for me with that analyzer.
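For anyone wanting to reproduce this, here is a rough sketch of what "synonyms at the same position as the trigger token" means, using plain Java stand-ins rather than Lucene's TokenStream API. All names are hypothetical, and this is not the analyzer from the JUnit test. Each synonym is given a position increment of 0 and the trigger's offsets, which is what makes the two tokens overlap in the original text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a synonym-injecting step; not the JUnit test analyzer.
final class SynonymInjector {

    /** Minimal token stand-in: text, offsets, and position increment. */
    static final class Tok {
        final String text;
        final int startOffset;
        final int endOffset;
        final int positionIncrement; // 0 = same position as the previous token

        Tok(String text, int startOffset, int endOffset, int positionIncrement) {
            this.text = text;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
            this.positionIncrement = positionIncrement;
        }
    }

    /** Emits each input token, followed by its synonyms stacked at the same position. */
    static List<Tok> inject(List<Tok> input, Map<String, List<String>> synonyms) {
        List<Tok> out = new ArrayList<>();
        for (Tok t : input) {
            out.add(t);
            for (String syn : synonyms.getOrDefault(t.text, List.of())) {
                // Increment 0 stacks the synonym on the trigger token; reusing the
                // trigger's offsets makes both tokens cover the same characters.
                out.add(new Tok(syn, t.startOffset, t.endOffset, 0));
            }
        }
        return out;
    }
}
```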

The fix meant I had to change the Formatter interface: it now takes a "TokenGroup" object instead of a Token, because a TokenGroup can represent either a single token OR a sequence of overlapping tokens.
I don't think many people have needed to create custom Formatter implementations, so this
redefined interface shouldn't break much existing code (if any).
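A hedged sketch of the grouping idea in plain Java: consecutive tokens whose offsets overlap are collected into one group, and the group's text is copied once, highlighted if any token in it matched. `TokenGrouping`, `Group`, `group` and `highlight` are hypothetical names; this is not the sandbox highlighter's code or its actual Formatter/TokenGroup API, and it assumes tokens arrive sorted by start offset.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the "token group" idea; not the sandbox highlighter's code.
final class TokenGrouping {

    static final class Tok {
        final String text;
        final int startOffset;
        final int endOffset;

        Tok(String text, int startOffset, int endOffset) {
            this.text = text;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /** One token, or a run of tokens whose offsets overlap. */
    static final class Group {
        final List<Tok> tokens = new ArrayList<>();
        final int startOffset;
        int endOffset;

        Group(Tok first) {
            startOffset = first.startOffset;
            endOffset = first.endOffset;
            tokens.add(first);
        }
    }

    /** Assumes tokens are sorted by start offset. */
    static List<Group> group(List<Tok> tokens) {
        List<Group> groups = new ArrayList<>();
        Group current = null;
        for (Tok t : tokens) {
            if (current != null && t.startOffset < current.endOffset) {
                // Overlaps the current group: extend it instead of starting a new one.
                current.tokens.add(t);
                current.endOffset = Math.max(current.endOffset, t.endOffset);
            } else {
                current = new Group(t);
                groups.add(current);
            }
        }
        return groups;
    }

    /** Copies each group's text once, wrapped in <B> tags if any of its tokens matched. */
    static String highlight(String text, List<Tok> tokens, Set<String> queryTerms) {
        StringBuilder sb = new StringBuilder();
        int last = 0;
        for (Group g : group(tokens)) {
            sb.append(text, last, g.startOffset); // text between groups, copied as-is
            String span = text.substring(g.startOffset, g.endOffset);
            boolean hit = g.tokens.stream().anyMatch(t -> queryTerms.contains(t.text));
            sb.append(hit ? "<B>" + span + "</B>" : span);
            last = g.endOffset;
        }
        sb.append(text, last, text.length());
        return sb.toString();
    }
}
```

With "quick" and its synonym "fast" both covering offsets 4 to 9 of "the quick car", a query on "fast" should yield `the <B>quick</B> car`. A formatter that copied each token's span separately would likely emit "quick" twice, which is the kind of problem grouping avoids.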


Cheers
Mark


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


