[ https://issues.apache.org/jira/browse/SOLR-2891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir reassigned SOLR-2891:
---------------------------------
Assignee: Robert Muir
> InvalidTokenOffsetsException when using MappingCharFilterFactory,
> DictionaryCompoundWordTokenFilterFactory and Highlighting
> ---------------------------------------------------------------------------------------------------------------------------
>
> Key: SOLR-2891
> URL: https://issues.apache.org/jira/browse/SOLR-2891
> Project: Solr
> Issue Type: Bug
> Components: highlighter, Schema and Analysis, search
> Affects Versions: 3.1, 3.4
> Environment: MacOS X, Java 1.6, Tomcat 7
> Reporter: Edwin Steiner
> Assignee: Robert Muir
> Priority: Critical
>
> I would like to handle German umlauts by replacing each accented character
> with its two-letter substitute (e.g. ä => ae). For this I use the char filter
> solr.MappingCharFilterFactory, configured with a mapping file containing
> entries like "ä" => "ae". I also want to use
> solr.DictionaryCompoundWordTokenFilterFactory to find words that are part of
> compound words (e.g. revision in totalrevision), and I want to use Solr
> highlighting. Combining the char filter and the compound word filter with
> highlighting fails, however: an
> org.apache.lucene.search.highlight.InvalidTokenOffsetsException is raised.
> Here are the details:
> types:
> --------
> <fieldType name="textAnalyzedFailed" class="solr.TextField"
>            positionIncrementGap="100">
>   <analyzer>
>     <charFilter class="solr.MappingCharFilterFactory"
>                 mapping="mapping.txt"/>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
>             dictionary="words.txt"/>
>   </analyzer>
> </fieldType>
> schema:
> -----------
> <fields>
>   <field name="id" type="string" indexed="true"
>          stored="true" required="true" />
>   <field name="title" type="textAnalyzedFailed" indexed="true"
>          stored="true"/>
> </fields>
> document:
> --------------
> <doc>
>   <field name="id">1</field>
>   <field name="title">banküberfall</field>
> </doc>
> mapping.txt:
> -----------------
> "ü" => "ue"
> words.txt:
> --------------
> fall
> The following search request produces the error shown in the log below:
> http://localhost:8080/solr/select/?q=banküberfall&hl=true&hl.fl=title
> Nov 4, 2011 4:29:12 PM org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/solr path=/select/
> params={q=bank?berfall&hl.fl=title_hl&hl=true} hits=1 status=0 QTime=13
> Nov 4, 2011 4:29:16 PM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException:
> org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token fall
> exceeds length of provided text sized 12
> at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:469)
> at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:378)
> at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:116)
> at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
> at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
> at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
> at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
> at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
> at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
> at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:462)
> at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164)
> at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
> at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:851)
> at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
> at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:405)
> at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:278)
> at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:515)
> at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:302)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:680)
> Caused by: org.apache.lucene.search.highlight.InvalidTokenOffsetsException:
> Token fall exceeds length of provided text sized 12
> at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:228)
> at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:462)
> ... 23 more
> The analysis tool says the following for field name=title, field
> value=banküberfall:
> ------------------------------------------------------------------------------------
> Index Analyzer
> org.apache.solr.analysis.MappingCharFilterFactory {mapping=mapping.txt,
> luceneMatchVersion=LUCENE_31}
> text bankueberfall
> org.apache.solr.analysis.WhitespaceTokenizerFactory
> {luceneMatchVersion=LUCENE_31}
> position 1
> term text bankueberfall
> startOffset 0
> endOffset 12
> org.apache.solr.analysis.DictionaryCompoundWordTokenFilterFactory
> {dictionary=words.txt, luceneMatchVersion=LUCENE_31}
> position     1
> term text    bankueberfall  fall
> startOffset  0              9
> endOffset    12             13
> flags        0              0
> type         word           word
> payload
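
The offsets in the analysis output can be reproduced with a small model. The following is a minimal Python sketch, not Lucene or Solr code. It assumes that the compound word filter derives sub-token offsets from the position inside the char-filtered term without mapping them back to the original text, and that the highlighter rejects any token whose offsets lie beyond the stored text. The function names (apply_mapping, compound_subtokens, offsets_valid) are hypothetical and exist only for this illustration.

```python
# Simplified model, NOT Lucene code: shows how a sub-token offset computed on
# the char-filtered text can point past the end of the stored text.

MAPPING = {"\u00fc": "ue"}  # "ü" => "ue", as in mapping.txt from the report
DICTIONARY = ["fall"]       # as in words.txt from the report


def apply_mapping(text, mapping):
    """Replace each mapped character with its substitute string."""
    return "".join(mapping.get(ch, ch) for ch in text)


def compound_subtokens(term, start_offset, dictionary):
    """Return (word, start, end) for each dictionary word found in term.

    Assumption: offsets are taken from the position inside the filtered
    term, with no correction back to the original (stored) text.
    """
    tokens = []
    for word in dictionary:
        idx = term.find(word)
        if idx != -1:
            start = start_offset + idx
            tokens.append((word, start, start + len(word)))
    return tokens


def offsets_valid(token, stored_text):
    """Bounds check modelled on the highlighter's error message."""
    _, start, end = token
    return start <= len(stored_text) and end <= len(stored_text)


stored = "bank\u00fcberfall"               # "banküberfall", 12 characters
filtered = apply_mapping(stored, MAPPING)  # "bankueberfall", 13 characters
subtokens = compound_subtokens(filtered, 0, DICTIONARY)

for tok in subtokens:
    status = "valid" if offsets_valid(tok, stored) else "exceeds stored text"
    print(tok, status)  # prints: ('fall', 9, 13) exceeds stored text
```

Under these assumptions the sub-token "fall" gets the offsets 9 and 13, matching the analysis output, while the stored field value is only 12 characters long. This is consistent with the message "Token fall exceeds length of provided text sized 12".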