[
https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12470327
]
Mark Harwood commented on LUCENE-794:
-------------------------------------
>>Sorry about all that Mark H
No need for any apologies - all help is gratefully received!
I don't mean to criticise your efforts or seem picky - I just wanted to record
my findings somewhere useful if we were to consider working a solution up from
this "test code" rather than tweaking the current highlighter - I'm still
uncertain about the best approach. I also thought it might be useful to point
the potential issues out to you if you were already reliant on using this code
somewhere.
>>I need to read the TokenStream at least twice
>>I used the horribly hackey but quick-for-me method of adding a method to
>>MemoryIndex that accepts a List of Tokens. Any ideas?
I'm not sure about modifying MemoryIndex. It should be easy enough to create a
subclass of TokenStream ("CachedTokenStream" perhaps?) which takes a real
TokenStream in its constructor, delegates all "next" calls to it on the first
use, and records each returned token in a List. It can then be "rewound" and
re-used to run through the same set of tokens held in the list from the first
run.
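
A rough sketch of that caching wrapper, only to make the idea concrete. So that
it compiles on its own it does not use the real Lucene classes: SimpleToken and
SimpleTokenStream are hypothetical stand-ins for Token and TokenStream, and the
rewind() method name is likewise an assumption, not an existing API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for Lucene's Token, reduced to the fields used here.
final class SimpleToken {
    final String text;
    final int startOffset;
    final int endOffset;

    SimpleToken(String text, int startOffset, int endOffset) {
        this.text = text;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }
}

// Hypothetical stand-in for Lucene's TokenStream: next() returns null when exhausted.
abstract class SimpleTokenStream {
    abstract SimpleToken next();
}

// Records tokens from the wrapped stream on the first pass and replays them after rewind().
final class CachedTokenStream extends SimpleTokenStream {
    private final SimpleTokenStream input;
    private final List<SimpleToken> cache = new ArrayList<SimpleToken>();
    private Iterator<SimpleToken> replay; // null while still reading from the wrapped stream

    CachedTokenStream(SimpleTokenStream input) {
        this.input = input;
    }

    @Override
    SimpleToken next() {
        if (replay != null) {
            return replay.hasNext() ? replay.next() : null;
        }
        SimpleToken token = input.next();
        if (token != null) {
            cache.add(token);
        }
        return token;
    }

    // Drains any tokens not yet read so the cache is complete, then restarts from the first token.
    void rewind() {
        if (replay == null) {
            while (next() != null) {
                // keep reading; next() records each token in the cache
            }
        }
        replay = cache.iterator();
    }
}

final class CachedTokenStreamDemo {
    public static void main(String[] args) {
        final String[] words = {"quick", "brown", "fox"};
        // Toy source stream that emits one token per word, separated by single spaces.
        SimpleTokenStream source = new SimpleTokenStream() {
            private int i = 0;
            private int offset = 0;

            @Override
            SimpleToken next() {
                if (i >= words.length) {
                    return null;
                }
                String w = words[i++];
                SimpleToken t = new SimpleToken(w, offset, offset + w.length());
                offset += w.length() + 1;
                return t;
            }
        };
        CachedTokenStream cached = new CachedTokenStream(source);
        for (int pass = 1; pass <= 2; pass++) {
            for (SimpleToken t = cached.next(); t != null; t = cached.next()) {
                System.out.println("pass " + pass + ": " + t.text);
            }
            cached.rewind();
        }
    }
}
```

The wrapped stream is only ever consumed once; every later pass reads from the
list, so the highlighter and MemoryIndex could each take their own pass without
MemoryIndex needing a new method.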
>>if position increment equals 0 skip printing out the token...but I am not
>>totally confident it is perfect yet.
I think it's possible some of the more Byzantine analyzers may emit tokens with
a position increment >0 that still overlap in terms of their character offsets.
I'd need to check the old JUnit tests to be sure on this. Welcome to my hell!
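
To illustrate the concern, here is a small sketch that flags tokens whose
position increment is >0 but whose start offset still falls inside text already
covered by an earlier token. OffsetToken and findOverlaps are hypothetical names
made up for this example, and the token values are invented rather than taken
from any particular analyzer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical token record: text, position increment, and start/end offsets.
final class OffsetToken {
    final String text;
    final int positionIncrement;
    final int startOffset;
    final int endOffset;

    OffsetToken(String text, int positionIncrement, int startOffset, int endOffset) {
        this.text = text;
        this.positionIncrement = positionIncrement;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }
}

final class OffsetOverlapCheck {
    // Returns the indexes of tokens that a "position increment == 0" test would miss:
    // increment > 0, yet the token starts before the furthest end offset seen so far.
    static List<Integer> findOverlaps(List<OffsetToken> tokens) {
        List<Integer> overlapping = new ArrayList<Integer>();
        int maxEndSoFar = 0;
        for (int i = 0; i < tokens.size(); i++) {
            OffsetToken t = tokens.get(i);
            if (t.positionIncrement > 0 && t.startOffset < maxEndSoFar) {
                overlapping.add(i);
            }
            maxEndSoFar = Math.max(maxEndSoFar, t.endOffset);
        }
        return overlapping;
    }

    public static void main(String[] args) {
        // Invented token sequence in which a compound and its parts all advance the position.
        List<OffsetToken> tokens = Arrays.asList(
            new OffsetToken("power", 1, 0, 5),
            new OffsetToken("powershot", 1, 0, 9),
            new OffsetToken("shot", 1, 5, 9),
            new OffsetToken("camera", 1, 10, 16),
            new OffsetToken("cam", 0, 10, 13));
        for (int index : findOverlaps(tokens)) {
            System.out.println("overlap at token " + index + ": " + tokens.get(index).text);
        }
    }
}
```

If analyzers like this do turn up in the tests, comparing offsets in this way
may be a safer guard against printing the same text twice than the position
increment alone.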
Thanks again for your help.
Mark H
> Beginnings of a span based highlighter
> --------------------------------------
>
> Key: LUCENE-794
> URL: https://issues.apache.org/jira/browse/LUCENE-794
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Other
> Reporter: Mark Miller
> Priority: Minor
> Attachments: DefaultEncoder.java, Encoder.java, Formatter.java,
> Highlighter.java, Highlighter.java, HighlighterTest.java,
> HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java,
> SimpleFormatter.java
>
>
> This is some test code to start the work of adding a span based highlighting
> approach to the existing highlighter in contrib. See
> http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.