[ http://issues.apache.org/jira/browse/LUCENE-663?page=comments#action_12429848 ] Karel Tejnora commented on LUCENE-663: --------------------------------------
Hi, yes. As I wrote in the code (where I kept the original author), I borrowed small parts of code from this contribution: http://issues.apache.org/jira/browse/LUCENE-644?page=all. It has a small bug when a term is at or near the end of a field; to fix it, change these lines:

  321: sb.append(cbuf, 0, EOF ? skip : (surround - skippedChars));
  276: int readed = reader.read(cbuf, 0, nextStart - pos);
  278: sb.append(cbuf, 0, readed);

I also borrowed code from WildcardTermEnum.

Motivation: I was unable to find a highlighter with good performance and proper phrase highlighting (at the beginning I needed just phrases with slop 0). For the query "karel drinks beer"~4 on the text "karel drinks a lot of czech beer. Beer is his life." this highlighter produces:

  <PREFIX>karel<SUFFIX> <PREFIX>drinks<SUFFIX> a lot of czech <PREFIX>beer<SUFFIX>. Beer is his life.

I started by implementing a stack for phrase queries and ended up with this. It is still not final: fuzzy queries, span queries, scoring and coloring still need to be done. By 'coloring' I mean:

  <PREFIX>karel<SUFFIX> <PREFIX>drinks<SUFFIX> a <PREFIX1>lot<SUFFIX1> of <PREFIX1>czech<SUFFIX1> <PREFIX>beer<SUFFIX>. Beer is his life.

and, for the wildcard BMW*: <PREFIX>BMW<SUFFIX><PREFIX1>ED<SUFFIX1>, etc. This way the user can see why a document matches the query.
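The slop check behind the "karel drinks beer"~4 example can be sketched roughly as follows. This is not the code in lucene-hlt-src.jar: it is a simplified, hypothetical Java sketch (the class SloppyPhraseSketch and its methods are made up for illustration). It assumes a plain letter/digit tokenizer with lowercasing instead of a Lucene Analyzer, lowercase phrase terms, and in-order matches only. The slop is measured as the spread of (position - index in phrase) over the matched terms, which mirrors how Lucene measures the match length of an in-order sloppy phrase match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SloppyPhraseSketch {

    /** A token with its lowercased text and [start, end) character offsets. */
    static final class Tok {
        final String term;
        final int start, end;
        Tok(String term, int start, int end) { this.term = term; this.start = start; this.end = end; }
    }

    /** Assumed tokenizer: runs of letters/digits, lowercased (not a Lucene Analyzer). */
    static List<Tok> tokenize(String text) {
        List<Tok> toks = new ArrayList<Tok>();
        Matcher m = Pattern.compile("[\\p{L}\\p{N}]+").matcher(text);
        while (m.find()) {
            toks.add(new Tok(m.group().toLowerCase(Locale.ROOT), m.start(), m.end()));
        }
        return toks;
    }

    /** Greedily matches each phrase term at the earliest position after the previous one. */
    static int[] matchFrom(List<Tok> toks, String[] phrase, int start) {
        if (!toks.get(start).term.equals(phrase[0])) return null;
        int[] pos = new int[phrase.length];
        pos[0] = start;
        for (int i = 1; i < phrase.length; i++) {
            int p = pos[i - 1] + 1;
            while (p < toks.size() && !toks.get(p).term.equals(phrase[i])) p++;
            if (p == toks.size()) return null; // this phrase term never occurs later
            pos[i] = p;
        }
        return pos;
    }

    /** Wraps the first in-order phrase match within `slop`; returns text unchanged if none. */
    static String highlight(String text, String[] phrase, int slop, String prefix, String suffix) {
        List<Tok> toks = tokenize(text);
        for (int start = 0; start < toks.size(); start++) {
            int[] pos = matchFrom(toks, phrase, start);
            if (pos == null) continue;
            // Match length = spread of (position - index in phrase).
            int minAdj = Integer.MAX_VALUE, maxAdj = Integer.MIN_VALUE;
            for (int i = 0; i < pos.length; i++) {
                minAdj = Math.min(minAdj, pos[i] - i);
                maxAdj = Math.max(maxAdj, pos[i] - i);
            }
            if (maxAdj - minAdj > slop) continue;
            // Copy the text, inserting prefix/suffix around each matched token's offsets.
            StringBuilder sb = new StringBuilder();
            int last = 0;
            for (int p : pos) {
                Tok t = toks.get(p);
                sb.append(text, last, t.start).append(prefix)
                  .append(text, t.start, t.end).append(suffix);
                last = t.end;
            }
            return sb.append(text, last, text.length()).toString();
        }
        return text;
    }

    public static void main(String[] args) {
        String text = "karel drinks a lot of czech beer. Beer is his life.";
        String[] phrase = {"karel", "drinks", "beer"};
        System.out.println(highlight(text, phrase, 4, "<PREFIX>", "<SUFFIX>"));
    }
}
```

Running main reproduces the highlighted string from the example. With slop 3 the text comes back unchanged, because "beer" sits four positions further away than an exact phrase would put it.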
Usage is perhaps more straightforward. There are three constructors:

  // All passed fields are highlighted using TermPositionVector
  // (if a field has no TermPositionVector, null is returned)
  FulltextHighlighter highlighter = new FulltextHighlighter(reader, query, prefix, suffix);

  // OR: all highlighted fields are processed with the Analyzer
  FulltextHighlighter highlighter = new FulltextHighlighter(analyzer, query, prefix, suffix);

  // OR: Analyzer or TermPositionVector is autodetected
  FulltextHighlighter highlighter = new FulltextHighlighter(reader, analyzer, query, prefix, suffix);

And when iterating over hits:

  // To use the TermPositionVector
  String highlightedText = highlighter.highlight(luceneDocumentID, luceneDocument, fieldName);
  // OR, to use the Analyzer (if TermPositionVector usage is forced, an assertion fails)
  String highlightedText = highlighter.highlight(luceneDocument, fieldName);

It has some options:

  setAnalyzerUnstable(boolean analyzerUnstable): set it to false (default is true) if you know that
      Token t(n).startOffset() < t(n+1).startOffset() holds for every pair of consecutive tokens
  setMaxFragments(int i): maximum number of fragments
  setSurround(int surround)

a) b) I don't know; maybe it will be faster or lighter, or neither. I began it because none of the contributed highlighters, or the ones attached to issues, gave 'nice' results. I use a lot of queries that search for names, like "James Bond" OR "Sean Connery", and this gives me a nicer view of why a document matches my query. :-) Or maybe I just don't know how to use Google.

> New feature rich highlighter for Lucene.
> ---------------------------------------
>
> Key: LUCENE-663
> URL: http://issues.apache.org/jira/browse/LUCENE-663
> Project: Lucene - Java
> Issue Type: New Feature
> Components: Search
> Reporter: Karel Tejnora
> Attachments: lucene-hlt-src.jar
>
>
> Well, I refactored (took) some code from two previous highlighters.
> This highlighter:
> + uses TermPositionVector where available
> + uses the Analyzer if no TermPositionVector is found or if it is forced to
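To make setSurround, setMaxFragments and setAnalyzerUnstable more concrete, here is a hypothetical sketch of how such options could interact; it is not the FulltextHighlighter code. It assumes that surround is a number of characters kept on each side of a hit, that hits arrive as character offsets (the Hit class and fragments method are made up), and that an "unstable" analyzer means offsets may be out of order, so they are sorted first. Context windows are clamped to the text length, which is the kind of end-of-field case the LUCENE-644 fix is about. For simplicity each hit gets its own fragment and overlapping windows are not merged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class FragmentSketch {

    /** Hypothetical stand-in for one highlighted token's [start, end) character offsets. */
    static final class Hit {
        final int start, end;
        Hit(int start, int end) { this.start = start; this.end = end; }
    }

    /**
     * Builds at most maxFragments fragments, each with up to `surround` characters
     * of context on both sides of a hit, clamped to the bounds of the text.
     */
    static List<String> fragments(String text, List<Hit> hits, int surround, int maxFragments,
                                  boolean analyzerUnstable, String prefix, String suffix) {
        List<Hit> ordered = new ArrayList<Hit>(hits);
        if (analyzerUnstable) {
            // Offsets may not be increasing, so sort by start offset before building fragments.
            ordered.sort(Comparator.comparingInt((Hit h) -> h.start));
        }
        List<String> out = new ArrayList<String>();
        for (Hit h : ordered) {
            if (out.size() == maxFragments) break;
            int from = Math.max(0, h.start - surround);          // do not read before the field start
            int to = Math.min(text.length(), h.end + surround);  // do not read past the field end
            out.add(text.substring(from, h.start) + prefix + text.substring(h.start, h.end)
                    + suffix + text.substring(h.end, to));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "karel drinks a lot of czech beer";
        // Offsets deliberately listed out of order, as an unstable analyzer might produce them.
        List<Hit> hits = Arrays.asList(new Hit(28, 32), new Hit(0, 5));
        System.out.println(fragments(text, hits, 3, 2, true, "[", "]"));
    }
}
```

Running main prints [[karel] dr, ch [beer]]; the "beer" fragment stops at the end of the text instead of reading past it. With analyzerUnstable set to false the hits are used in the given order, which is only safe when the offsets are already increasing.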
> + supports all Lucene queries (Term, Phrase with slop, Prefix, Wildcard, Range) except FuzzyQuery (which can be implemented easily)
> - has no support for scoring (yet)
> - uses the same prefix/postfix for all accepted terms (yet)
> ? It's written in Java 5
> In the next release I'd like to add support for Fuzzy queries, "coloring" (e.g. a different color for terms between phrase terms (slops)) and scoring of fragments.
> It's Apache licensed - I hope so :-) I put a license statement in every file.
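The feature list mentions Prefix and Wildcard queries. One way to highlight them when working from an Analyzer's tokens is to test each token against the query term. The hypothetical sketch below (WildcardHighlightSketch is made up; it is not the code borrowed from WildcardTermEnum) converts a wildcard term into a regular expression, with '*' matching any run of characters and '?' exactly one character, as in Lucene's WildcardQuery syntax, and wraps every matching word. It assumes a letter/digit tokenizer and case-sensitive matching; a prefix query such as BMW* is treated as the wildcard BMW*.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WildcardHighlightSketch {

    /** Translates a wildcard term ('*' = any run of chars, '?' = one char) into a regex. */
    static Pattern toRegex(String wildcard) {
        StringBuilder re = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') re.append(".*");
            else if (c == '?') re.append('.');
            else re.append(Pattern.quote(String.valueOf(c))); // literal character
        }
        return Pattern.compile(re.toString());
    }

    /** Wraps every whole word of text that matches the wildcard term. */
    static String highlight(String text, String wildcard, String prefix, String suffix) {
        Pattern term = toRegex(wildcard);
        Matcher words = Pattern.compile("[\\p{L}\\p{N}]+").matcher(text);
        StringBuilder sb = new StringBuilder();
        int last = 0;
        while (words.find()) {
            if (term.matcher(words.group()).matches()) {
                sb.append(text, last, words.start()).append(prefix).append(words.group()).append(suffix);
                last = words.end();
            }
        }
        return sb.append(text, last, text.length()).toString();
    }

    public static void main(String[] args) {
        System.out.println(highlight("BMW and BMWED, not AUDI", "BMW*", "<", ">"));
    }
}
```

Running main prints <BMW> and <BMWED>, not AUDI. Splitting a match like BMWED into its literal part and its expanded part, to give the two a different color, would be a small extension of the same loop.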