You're right, but it's fixed now. In Solr's analysis page I spotted an 
incorrect endOffset for some tokens. After I corrected the endOffset, the error 
no longer appears. I should check the emitted offsets more carefully.
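Since the fix came down to the endOffset, here is a minimal sketch of the offset arithmetic for subtokens cut out of one original token. It assumes the term text maps 1:1 onto the original text. The class `SubtokenOffsets` and its methods are hypothetical helpers, not Lucene or Solr API. In a real filter the computed pair would be passed to `OffsetAttribute.setOffset(start, end)`. `String.substring(begin, end)` throws `StringIndexOutOfBoundsException` when `begin > end`, which fits the negative index in the trace quoted further down.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Lucene or Solr): computes and checks the
 * character offsets of subtokens cut out of one original token.
 */
public class SubtokenOffsets {

    /** Absolute start/end character offsets of one subtoken. */
    public static final class Span {
        public final int start;
        public final int end;

        public Span(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /**
     * tokenStart/tokenEnd: offsets of the original token in the field text.
     * termLength: length of the original term text.
     * subStart/subLength: position of the subtoken inside that term text.
     */
    public static Span offsetsFor(int tokenStart, int tokenEnd, int termLength,
                                  int subStart, int subLength) {
        if (tokenEnd - tokenStart != termLength) {
            // The term text no longer lines up with the original text (e.g. a
            // char filter changed its length), so positions inside the term
            // cannot be mapped back; reuse the whole token's offsets instead.
            return new Span(tokenStart, tokenEnd);
        }
        int start = tokenStart + subStart;
        // The end offset must be absolute as well. A relative value such as
        // subStart + subLength is wrong whenever tokenStart > 0 and can even
        // fall below start.
        return new Span(start, start + subLength);
    }

    /** True if every span satisfies tokenStart <= start <= end <= tokenEnd. */
    public static boolean valid(List<Span> spans, int tokenStart, int tokenEnd) {
        for (Span s : spans) {
            if (s.start < tokenStart || s.end < s.start || s.end > tokenEnd) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // "verzekeringmaatschappij" (23 chars) found at offset 100 of a field.
        int tokenStart = 100;
        int tokenEnd = 123;
        List<Span> spans = new ArrayList<>();
        spans.add(offsetsFor(tokenStart, tokenEnd, 23, 0, 11));  // verzekering
        spans.add(offsetsFor(tokenStart, tokenEnd, 23, 11, 12)); // maatschappij
        for (Span s : spans) {
            System.out.println(s.start + "-" + s.end);
        }
        System.out.println("valid: " + valid(spans, tokenStart, tokenEnd));
        // Prints 100-111, 111-123 and "valid: true".
    }
}
```

Running a check like `valid` over every subtoken in an analysis unit test would catch a bad endOffset before it reaches the highlighter.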

Thanks for your time anyway :) 
 
-----Original message-----
> From:Thomas Matthijs <li...@selckin.be>
> Sent: Thu 04-Oct-2012 15:55
> To: java-user@lucene.apache.org
> Subject: Re: Highlighter IOOBE with modified 
> HyphenationCompoundWordTokenFilter
> 
> And to include the code
> 
> On Thu, Oct 4, 2012 at 3:52 PM, Markus Jelsma
> <markus.jel...@openindex.io> wrote:
> > I forgot to add that this is with today's build of trunk.
> >
> > -----Original message-----
> >> From:Markus Jelsma <markus.jel...@openindex.io>
> >> Sent: Thu 04-Oct-2012 15:42
> >> To: java-user@lucene.apache.org
> >> Subject: Highlighter IOOBE with modified HyphenationCompoundWordTokenFilter
> >>
> >> Hi,
> >>
> >> I've modified the HyphenationCompoundWordTokenFilter to emit fewer 
> >> subtokens, because the original filter can emit all kinds of subtokens that 
> >> have a very different meaning on their own. In my version no overlapping 
> >> subtokens are emitted, and no subtoken is emitted that can be found within 
> >> another subtoken. I've also made it require that the generated subtokens 
> >> together make up the original token; if they don't, the subtokens are 
> >> discarded. It also no longer returns the original token, whereas the 
> >> original filter emits a duplicate of the original input token. For 
> >> example, verzekeringmaatschappij now becomes verzekering and maatschappij, 
> >> not verzekeringmaatschappij, ver, zeker, verzeker, zekering, ringmaat, 
> >> maat and more.
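The decomposition rule described above (non-overlapping subtokens that together cover the whole token, otherwise none) can be sketched as a small dynamic program. This is not the modified filter's code. It swaps the hyphenation grammar for a plain dictionary lookup, and `CoveringDecomposer`, `decompose` and `minLen` are hypothetical names. It picks the cover with the fewest subwords and returns an empty list when no full cover exists; what to emit in that case is left to the caller.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch, not the modified HyphenationCompoundWordTokenFilter:
 * splits a word into the fewest non-overlapping dictionary subwords that
 * together cover the whole word, using a plain dictionary instead of
 * hyphenation patterns.
 */
public class CoveringDecomposer {

    public static List<String> decompose(String word, Set<String> dict, int minLen) {
        int n = word.length();
        // best[i] = fewest subwords covering word[0, i), or -1 if impossible.
        int[] best = new int[n + 1];
        // prev[i] = start index of the last subword in that best cover.
        int[] prev = new int[n + 1];
        Arrays.fill(best, -1);
        best[0] = 0;
        for (int end = 1; end <= n; end++) {
            for (int start = 0; start <= end - minLen; start++) {
                if (best[start] < 0 || !dict.contains(word.substring(start, end))) {
                    continue;
                }
                int count = best[start] + 1;
                if (best[end] < 0 || count < best[end]) {
                    best[end] = count;
                    prev[end] = start;
                }
            }
        }
        if (best[n] < 0) {
            // No full cover: drop all candidate subwords.
            return Collections.emptyList();
        }
        // Walk back through prev[] to recover the subwords in order.
        List<String> parts = new ArrayList<>();
        for (int end = n; end > 0; end = prev[end]) {
            parts.add(word.substring(prev[end], end));
        }
        Collections.reverse(parts);
        return parts;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList(
            "ver", "zeker", "verzeker", "verzekering", "zekering", "ring",
            "maat", "maatschappij"));
        // Prints [verzekering, maatschappij]
        System.out.println(decompose("verzekeringmaatschappij", dict, 3));
    }
}
```

Because each subword's start index inside the term is known here, it can feed straight into the offset calculation (start = token start + subword start, end = start + subword length).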
> >>
> >> But it seems that I have done something wrong, because my modified version 
> >> sometimes causes the Highlighter to throw the following IOOBE:
> >>
> >> java.lang.StringIndexOutOfBoundsException: String index out of range: -14
> >>         at java.lang.String.substring(String.java:1937)
> >>         at org.apache.lucene.search.vectorhighlight.BaseFragmentsBuilder.makeFragment(BaseFragmentsBuilder.java:172)
> >>         at org.apache.lucene.search.vectorhighlight.BaseFragmentsBuilder.createFragments(BaseFragmentsBuilder.java:138)
> >>         at org.apache.lucene.search.vectorhighlight.FastVectorHighlighter.getBestFragments(FastVectorHighlighter.java:186)
> >>         at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByFastVectorHighlighter(DefaultSolrHighlighter.java:571)
> >>         at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:401)
> >>         at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:136)
> >>         at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:214)
> >>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> >>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1750)
> >>         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
> >>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
> >>         at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
> >>         .....
> >>
> >> Can anyone point me in the right direction? I've checked the LIA book on 
> >> how to manipulate the TokenStream and thought it should be alright. My 
> >> analysis tests also yield good results; I found nothing strange. Or 
> >> could it be a bug in the highlighter that only shows up now?
> >>
> >> Thanks,
> >> Markus
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> For additional commands, e-mail: java-user-h...@lucene.apache.org
> >>
> >>
> >
> 
