You're right, but it's fixed now. In Solr's analysis page I spotted an incorrect endOffset for some tokens. After correcting the endOffset, the error no longer appears. I should check the emitted offsets more carefully.
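For anyone hitting the same exception, here is a minimal sketch of the kind of offset check meant above. It is plain Java and deliberately does not use Lucene classes: `OffsetCheck`, `Token` and `findOffsetProblems` are hypothetical names made up for illustration, not part of Lucene or of the modified filter. It assumes the term text equals the covered source text, which will not hold once a lowercase or stemming filter has run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OffsetCheck {

    /** Hypothetical stand-in for a token's term text plus its start/end offsets. */
    public static final class Token {
        final String term;
        final int start;
        final int end;

        public Token(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    /** Returns one message per token with suspicious offsets; an empty list means all passed. */
    public static List<String> findOffsetProblems(String text, List<Token> tokens) {
        List<String> problems = new ArrayList<>();
        for (Token t : tokens) {
            if (t.start < 0 || t.end < t.start || t.end > text.length()) {
                // Offsets like these make a later substring(start, end) call fail.
                problems.add(t.term + ": offsets " + t.start + ".." + t.end
                        + " fall outside the text (length " + text.length() + ")");
            } else if (!text.substring(t.start, t.end).equals(t.term)) {
                // Simplifying assumption: the term is an unmodified slice of the input.
                problems.add(t.term + ": offsets " + t.start + ".." + t.end
                        + " cover \"" + text.substring(t.start, t.end) + "\" instead");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        String text = "verzekeringmaatschappij";
        // The second subtoken carries a deliberately wrong endOffset.
        List<Token> tokens = Arrays.asList(
                new Token("verzekering", 0, 11),
                new Token("maatschappij", 11, 30));
        for (String problem : findOffsetProblems(text, tokens)) {
            System.out.println(problem);
        }
    }
}
```

In a real filter the same comparison would be made on the start and end offsets the filter sets for each subtoken, checked against the length of the original field value.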
Thanks for your time anyway :)

-----Original message-----
> From:Thomas Matthijs <li...@selckin.be>
> Sent: Thu 04-Oct-2012 15:55
> To: java-user@lucene.apache.org
> Subject: Re: Highlighter IOOBE with modified HyphenationCompoundWordTokenFilter
>
> And to include the code
>
> On Thu, Oct 4, 2012 at 3:52 PM, Markus Jelsma <markus.jel...@openindex.io> wrote:
> > I forgot to add that this is with today's build of trunk.
> >
> > -----Original message-----
> >> From:Markus Jelsma <markus.jel...@openindex.io>
> >> Sent: Thu 04-Oct-2012 15:42
> >> To: java-user@lucene.apache.org
> >> Subject: Highlighter IOOBE with modified HyphenationCompoundWordTokenFilter
> >>
> >> Hi,
> >>
> >> I've modified the HyphenationCompoundWordTokenFilter to emit less
> >> subtokens because the original filter can emit all kinds of subtokens that
> >> have a very different meaning on their own. I've modified it so no
> >> overlapping subtokens are emitted and no subtokens are emitted that can be
> >> found within another subtoken. I've also modified it to force that the
> >> generated subtokens comprise the original token and if they don't forget
> >> the subtokens. It also doesn't return the original token anymore, the
> >> original filter produces a duplicate of the original input token. For
> >> example: verzekeringmaatschappij now becomes verzekering and maatschappij
> >> and not verzekeringmaatschappij, ver, zeker, verzeker, zekering, ringmaat,
> >> maat and more.
> >>
> >> But it seem that i have done something wrong because my modified version
> >> sometimes causes the Highlighter to throw the following IOOBE:
> >>
> >> java.lang.StringIndexOutOfBoundsException: String index out of range: -14
> >>     at java.lang.String.substring(String.java:1937)
> >>     at org.apache.lucene.search.vectorhighlight.BaseFragmentsBuilder.makeFragment(BaseFragmentsBuilder.java:172)
> >>     at org.apache.lucene.search.vectorhighlight.BaseFragmentsBuilder.createFragments(BaseFragmentsBuilder.java:138)
> >>     at org.apache.lucene.search.vectorhighlight.FastVectorHighlighter.getBestFragments(FastVectorHighlighter.java:186)
> >>     at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByFastVectorHighlighter(DefaultSolrHighlighter.java:571)
> >>     at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:401)
> >>     at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:136)
> >>     at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:214)
> >>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> >>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1750)
> >>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
> >>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
> >>     at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
> >>     .....
> >>
> >> Anyone to point me in the right direction? I've checked the LIA book on
> >> how to manipulate the tokenstream and thought it should be alright. My
> >> analysis tests also yield good results, nothing strange to be found. Or
> >> could it be an error in the highlighter that only now shows up?
> >>
> >> Thanks,
> >> Markus
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> For additional commands, e-mail: java-user-h...@lucene.apache.org