On Tuesday 04 February 2003 03:17, Massimo Mannino wrote: > From: Stone, Timothy > Subject: Schreiber's TermHighlighter/LuceneTools utility and the > Demo... > Date: Wed, 11 Dec 2002 13:00:28 -0800 > > Has anyone successfully implemented Mark Schreiber's TermHighlighter > interface in the > demo? I'm seeking some pointers/samples.
I was thinking of starting to implement it at some point, but haven't yet done it. However, reading through the sample code, I thought there are some things that would be useful to add to Lucene Term class(es). Most importantly, example traverses query hierarchy to get to terms used, by doing explicit casts and checks. This seems like a good place to use the classic visitor pattern instead. So, Query.java should either have abstract methods (to be implemented by deriving classes) to implement visitor pattern (if I recall, something like "visitTerms(TermVisitor)" to start traversing, and visit calls a method in TermVisitor for each term), or alternatively (additionally), methods for getting Terms and sub-queries contained. [my apologies if I description of visitor pattern is bit off... been a while since used it... but the main idea should be about right] This would have to be included in core Lucene classes, but I think that would be a good (and simple to add) addition, to allow easily collecting all Terms and/or sub-Queries given full query has. After getting all terms, tokenizing should be easy, and one can create a data structure listing all spans that need highlighting. Then this has to be applied to the original un-tokenized content. Structure can (for example) just contain character ranges along with type of highlighting (different colours). There are some potential tricky parts if HTML is to be highlighted; highlight markers still need to nest properly (if terms are not fully contained in same tag level), and some highlighted terms may not be visible (to resolve this, one needs to compare original text and analyzed text, to see which parts are removed by analyzer, but that match term(s)). But even without getting it 100% correct, output should be fairly close. -+ Tatu +- --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
