[jira] Commented: (LUCENE-794) Beginnings of a span based highlighter

Mark Harwood (JIRA) Mon, 05 Feb 2007 11:44:26 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12470327
 ]


Mark Harwood commented on LUCENE-794:
-------------------------------------

>>Sorry about all that Mark H
No need for any apologies - all help is gratefully received!
I don't mean to criticise your efforts or seem picky - I just wanted to record 
my findings somewhere useful if we were to consider working a solution up from 
this "test code" rather than tweaking the current highlighter - I'm still 
uncertain about the best approach. I also thought it might be useful to point 
the potential issues out to you if you were already reliant on using this code 
somewhere.

>>I need to read the TokenStream at least twice
>>I used the horribly hackey but quick-for-me method of adding a method to 
>>MemoryIndex that accepts a List of Tokens. Any ideas? 

I'm not sure about modifying MemoryIndex. It should be easy enough to create a 
subclass of TokenStream ("CachedTokenStream" perhaps?) which takes a real 
TokenStream in its constructor, delegates all "next" calls to it on the first 
use, and records the returned tokens in a List. It can then be "rewound" and 
re-used to run through the same set of tokens held in the list from the first 
run.
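
A minimal sketch of that caching wrapper, to make the idea concrete. It is not
code from the attachments: the Token and TokenStream classes below are
hypothetical stand-ins (modelled on a "next() returns a Token or null" style
stream) so that the example compiles on its own, and rewind() and
ListTokenStream are names made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a token; only the term text is kept here.
class Token {
    final String text;
    Token(String text) { this.text = text; }
}

// Hypothetical stand-in for a token stream: next() returns null when exhausted.
abstract class TokenStream {
    abstract Token next();
}

// Wraps a real stream, records every token on the first pass and
// replays the recorded tokens after rewind().
class CachedTokenStream extends TokenStream {
    private final TokenStream input;
    private final List<Token> cache = new ArrayList<>();
    private Iterator<Token> replay; // null while still reading from input

    CachedTokenStream(TokenStream input) { this.input = input; }

    @Override
    Token next() {
        if (replay != null) {
            // Second and later passes: serve tokens from the list.
            return replay.hasNext() ? replay.next() : null;
        }
        // First pass: delegate to the real stream and remember the token.
        Token t = input.next();
        if (t != null) cache.add(t);
        return t;
    }

    // Restart from the first token. If the first pass was cut short,
    // the remaining tokens are drained into the cache first.
    void rewind() {
        if (replay == null) {
            Token t;
            while ((t = input.next()) != null) cache.add(t);
        }
        replay = cache.iterator();
    }
}

// Tiny stream over fixed words, used only to exercise the wrapper.
class ListTokenStream extends TokenStream {
    private final Iterator<String> words;
    ListTokenStream(String... words) { this.words = Arrays.asList(words).iterator(); }
    @Override
    Token next() { return words.hasNext() ? new Token(words.next()) : null; }
}
```

With this shape the highlighter could hand the same CachedTokenStream to two
consumers in turn, calling rewind() in between, without touching MemoryIndex.
The cost is that all tokens of the document are held in memory at once.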


>>if position increment equals 0 skip printing out the token...but I am not 
>>totally confident it is perfect yet. 

I think it's possible some of the more Byzantine analyzers may have a position 
increment >0 but overlap in terms of their byte offsets. I'd need to check the 
old JUnit tests to be sure on this. Welcome to my hell!
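
To illustrate the concern, here is a rough sketch of a check that looks at
offsets as well as the position increment. It is only one possible approach,
not what the current highlighter or the attached code does. OffsetToken and
OverlapFilter are hypothetical names, and the fields simply mirror the start
offset, end offset and position increment that a token carries.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical token record: term text, start/end offsets, position increment.
class OffsetToken {
    final String text;
    final int start;
    final int end;
    final int posInc;

    OffsetToken(String text, int start, int end, int posInc) {
        this.text = text;
        this.start = start;
        this.end = end;
        this.posInc = posInc;
    }
}

class OverlapFilter {
    // Keeps a token only if it advances the position AND starts at or after
    // the end offset of the last kept token. The second test catches tokens
    // with a position increment > 0 whose offsets still overlap.
    static List<OffsetToken> nonOverlapping(List<OffsetToken> tokens) {
        List<OffsetToken> kept = new ArrayList<>();
        int lastEnd = 0; // end offset of the last token that was kept
        for (OffsetToken t : tokens) {
            boolean samePosition = t.posInc == 0;
            boolean overlapsPrevious = t.start < lastEnd;
            if (samePosition || overlapsPrevious) {
                continue; // skip: would print the same text twice
            }
            kept.add(t);
            lastEnd = t.end;
        }
        return kept;
    }
}
```

For example, given "quick" (0-5, increment 1), "fast" (0-5, increment 0),
"brown" (6-11, increment 1) and "own" (8-11, increment 1), the increment test
alone would drop only "fast", while the offset test also drops "own".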

Thanks again for your help.
Mark H

> Beginnings of a span based highlighter
> --------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: DefaultEncoder.java, Encoder.java, Formatter.java, 
> Highlighter.java, Highlighter.java, HighlighterTest.java, 
> HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, 
> SimpleFormatter.java
>
>
> This is some test code to start the work of adding a span based highlighting 
> approach to the existing highlighter in contrib. See 
> http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
