[jira] Commented: (LUCENE-663) New feature rich higlighter for Lucene.

Karel Tejnora (JIRA) Tue, 22 Aug 2006 17:10:19 -0700

    [ 
http://issues.apache.org/jira/browse/LUCENE-663?page=comments#action_12429848 ] 
            
Karel Tejnora commented on LUCENE-663:
--------------------------------------


Hi,
yes, as I wrote in the code (which keeps the original author credits), I 
borrowed small parts of code from this contribution: 
http://issues.apache.org/jira/browse/LUCENE-644?page=all 
It has a small bug when a term is at or near the end of a field; to fix it, 
change these lines:
321:sb.append(cbuf, 0, EOF ? skip : (surround - skippedChars));
276:int readed = reader.read(cbuf, 0, nextStart - pos);
278:sb.append(cbuf,0,readed);
I also borrowed from WildcardTermEnum.
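
The change on lines 276 and 278 comes down to appending only as many characters as Reader.read actually returned, since near the end of a field it can return fewer than requested (or -1). A minimal sketch of that pattern follows; the class and method names (ReadSketch, readUpTo) are made up for illustration and are not part of the LUCENE-644 code:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class ReadSketch {
    /** Hypothetical helper: reads at most 'wanted' chars and keeps only what was really read. */
    static String readUpTo(Reader reader, int wanted) {
        char[] cbuf = new char[wanted];
        StringBuilder sb = new StringBuilder();
        int total = 0;
        try {
            while (total < wanted) {
                int read = reader.read(cbuf, 0, wanted - total);
                if (read == -1) {
                    break; // end of the field reached before 'wanted' chars were available
                }
                sb.append(cbuf, 0, read); // append only the chars that were actually read
                total += read;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The text is shorter than the requested length, as when a term sits near the end of a field.
        System.out.println(readUpTo(new StringReader("beer"), 10));
    }
}
```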

Motivation: I was unable to find a highlighter with good performance and 
proper phrase highlighting (at the beginning I needed only phrases with slop 0).

For the query "karel drinks beer"~4 on the text 
karel drinks a lot of czech beer. Beer is his life. 
this highlighter produces:
<PREFIX>karel</SUFFIX> <PREFIX>drinks</SUFFIX> a lot of czech 
<PREFIX>beer</SUFFIX>. Beer is his life.

I started by implementing a stack for phrase queries and ended up with this. 
It is still not final: fuzzy, span, scoring and coloring still need to be done.
By 'coloring' I mean:
<PREFIX>karel</SUFFIX> <PREFIX>drinks</SUFFIX> a <PREFIX1>lot</SUFFIX1> of 
<PREFIX1>czech</SUFFIX1> <PREFIX>beer</SUFFIX>. Beer is his life.

For a wildcard BMW* -> <PREFIX>BMW</SUFFIX><PREFIX1>ED</SUFFIX1>
etc.

So the user can see why a document matches the query.
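
The 'coloring' idea can be sketched on a pre-tokenized text. This is only a toy illustration, not the attached highlighter's code: the class name ColoringSketch, the explicit span indexes and the stop-word set are all assumptions (the stop words are a guess at why "a" and "of" stay unmarked in the example), and punctuation is ignored.

```java
import java.util.Set;
import java.util.StringJoiner;

final class ColoringSketch {
    /**
     * Hypothetical helper: inside the matched span [spanStart, spanEnd], phrase terms get
     * PREFIX/SUFFIX, other non-stop-word terms (the "slop" terms) get PREFIX1/SUFFIX1.
     */
    static String color(String[] tokens, int spanStart, int spanEnd,
                        Set<String> phraseTerms, Set<String> stopWords) {
        StringJoiner out = new StringJoiner(" ");
        for (int i = 0; i < tokens.length; i++) {
            String t = tokens[i];
            boolean inSpan = i >= spanStart && i <= spanEnd;
            if (!inSpan || stopWords.contains(t)) {
                out.add(t); // outside the match, or a stop word: left as is
            } else if (phraseTerms.contains(t)) {
                out.add("<PREFIX>" + t + "</SUFFIX>"); // term of the phrase itself
            } else {
                out.add("<PREFIX1>" + t + "</SUFFIX1>"); // term between the phrase terms
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] tokens = {"karel", "drinks", "a", "lot", "of", "czech", "beer",
                           "Beer", "is", "his", "life"};
        // The span 0..6 covers the match of "karel drinks beer"~4; the second "Beer" is outside it.
        System.out.println(color(tokens, 0, 6,
                Set.of("karel", "drinks", "beer"), Set.of("a", "of")));
    }
}
```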

Usage is perhaps more straightforward:

Construct a highlighter where all passed fields will be highlighted using 
TermPositionVector (null is returned where there is no TermPositionVector):

FulltextHighlighter highlighter = new FulltextHighlighter(reader, query, prefix, suffix);

OR 
construct a highlighter where all fields to be highlighted are processed 
using an Analyzer:

FulltextHighlighter highlighter = new FulltextHighlighter(analyzer, query, prefix, suffix);

OR 
construct a highlighter where the use of Analyzer or TermVector is autodetected:

FulltextHighlighter highlighter = new FulltextHighlighter(reader, analyzer, query, prefix, suffix);

And when iterating hits:
String highlightedText = highlighter.highlight(luceneDocumentID, luceneDocument, fieldName);  // to use tpv

OR
String highlightedText = highlighter.highlight(luceneDocument, fieldName);  // to use the analyzer; if tpv usage is forced, an assert fires

It has some options:
setAnalyzerUnstable(boolean analyzerUnstable); set it to false (default true) if 
you know that t(n).startOffset() < t(n+1).startOffset() for all tokens
setMaxFragments(int i); maximum number of fragments
setSurround(int surround);
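
Whether it is safe to call setAnalyzerUnstable(false) depends on the token start offsets being strictly increasing. A minimal sketch of that condition, assuming the start offsets have already been collected into an int array (OffsetOrderSketch and its method are hypothetical names, not part of the highlighter):

```java
final class OffsetOrderSketch {
    /** Hypothetical check: true if t(n).startOffset() < t(n+1).startOffset() for every n. */
    static boolean startOffsetsStrictlyIncreasing(int[] startOffsets) {
        for (int n = 0; n + 1 < startOffsets.length; n++) {
            if (startOffsets[n] >= startOffsets[n + 1]) {
                return false; // equal or decreasing offsets break the condition
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Plain tokenization: every token starts after the previous one.
        System.out.println(startOffsetsStrictlyIncreasing(new int[] {0, 6, 13, 15}));
        // Two tokens sharing a start offset (e.g. an injected synonym) violate the condition.
        System.out.println(startOffsetsStrictlyIncreasing(new int[] {0, 6, 6, 15}));
    }
}
```

If the check fails for an analyzer, leaving the option at its default (true) is the conservative choice.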

a) b) I don't know; maybe it will be faster, or lighter, or neither, but I 
started on it because none of the contributed or filed highlighters gives 
'nice' results.
I use a lot of queries that search for names, like "James Bond" OR 
"Sean Connery", and this gives me a nicer view of why the document matches my 
query.

:-) Or I don't know how to use Google

> New feature rich higlighter for Lucene.
> ---------------------------------------
>
>                 Key: LUCENE-663
>                 URL: http://issues.apache.org/jira/browse/LUCENE-663
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>            Reporter: Karel Tejnora
>         Attachments: lucene-hlt-src.jar
>
>
> Well, I refactored (took) some code from two previous highlighters.
> This highlighter:
> + uses TermPositionVector where available
> + uses Analyzer if no TermPositionVector is found or if it is forced to use it
> + supports all Lucene queries (Term, Phrase with slops, Prefix, Wildcard, 
> Range) except Fuzzy Query (can be implemented easily)
> - has no support for scoring (yet)
> - uses the same prefix and postfix for all accepted terms (yet)
> ? It's written in Java5
> In the next release I'd like to add support for Fuzzy, "coloring" (e.g. a 
> different color for terms between phrase terms (slops)), and scoring of fragments.
> It's Apache licensed - I hope so :-) I put a license statement in every file
