[ https://issues.apache.org/jira/browse/SOLR-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13627548#comment-13627548 ]
Holger Floerke commented on SOLR-4686:
--------------------------------------

Hi Steve,

thanks for your quick comments.

"""
My surface reading of Highlighter and Formatter classes makes me think that there is no natural plugin point right now for an HTML-aware boundary insertion mechanism.
"""

Do you think the highlighter/formatter has a problem, or are the offsets produced by the HTMLStripCharFilter the problem? This question may be philosophical, but in my opinion the HTMLStripCharFilter is responsible for writing the correct offsets. This isn't easy, because the filter has to "understand" the structure, adjusting start and end offsets in certain cases and so on, but I expect these problems to grow as more people produce XHTML output with the highlighter.

In my case, I use HTMLStripCharFilter to normalize XML input, so I would be happy with a switch "do not treat inline elements".

> HTMLStripCharFilter and Highlighter generates invalid HTML
> ----------------------------------------------------------
>
>                 Key: SOLR-4686
>                 URL: https://issues.apache.org/jira/browse/SOLR-4686
>             Project: Solr
>          Issue Type: Bug
>          Components: highlighter
>    Affects Versions: 4.1
>            Reporter: Holger Floerke
>              Labels: HTML, highlighter
>
> Using the HTMLStripCharFilter may yield an invalid HTML highlight.
> The HTMLStripCharFilter has a special treatment of inline elements (e.g. "a", "b", ...). For these elements the CharFilter ignores the tag and does not insert any split character.
> If you index
> """
> <a>xxx</a>
> """
> you get the word "xxx" starting at position 3 and ending at position 10(!).
> If you highlight a search on "xxx", you will get
> """
> <a><em>xxx</a></em>
> """
> which is invalid HTML.
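To make the offset arithmetic in the report concrete, here is a toy Java model. It is NOT Lucene's HTMLStripCharFilter and does not reflect how Lucene actually stores offset corrections; the class and method names (InlineTagOffsets, strip, correctStart, correctEnd, highlight, the tight flag) are hypothetical. It strips tags without inserting a split character, as the report describes for inline elements, and compares two ways of mapping a token's end offset back into the original markup: a "greedy" mapping that reproduces the reported 3..10, and a "tight" mapping (an assumed alternative, not an existing option) that stops right after the last visible character.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, NOT Lucene's HTMLStripCharFilter: tags are removed and nothing
// (no split character) is inserted in their place, as described for inline elements.
final class InlineTagOffsets {

    // Visible text plus, for each kept char, its index in the original markup.
    record Stripped(String text, int[] inputIndex, int inputLength) {}

    static Stripped strip(String html) {
        StringBuilder out = new StringBuilder();
        List<Integer> idx = new ArrayList<>();
        boolean inTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (inTag) {
                if (c == '>') inTag = false;   // end of tag
            } else if (c == '<') {
                inTag = true;                  // start of tag
            } else {
                out.append(c);                 // visible character
                idx.add(i);
            }
        }
        int[] map = idx.stream().mapToInt(Integer::intValue).toArray();
        return new Stripped(out.toString(), map, html.length());
    }

    // Start offset: index of the token's first visible char in the markup.
    static int correctStart(Stripped s, int start) {
        return s.inputIndex()[start];
    }

    // End offset (exclusive). "Greedy" runs up to the next visible char (or end of
    // input), swallowing any markup in between, e.g. "</a>". "Tight" stops right
    // after the token's last visible char. Assumes end > 0 (non-empty token).
    static int correctEnd(Stripped s, int end, boolean tight) {
        int[] map = s.inputIndex();
        if (tight) return map[end - 1] + 1;
        return end < map.length ? map[end] : s.inputLength();
    }

    // Wraps the mapped span of the original markup in <em>...</em>.
    static String highlight(String html, Stripped s, int start, int end, boolean tight) {
        int from = correctStart(s, start);
        int to = correctEnd(s, end, tight);
        return html.substring(0, from) + "<em>" + html.substring(from, to)
                + "</em>" + html.substring(to);
    }

    public static void main(String[] args) {
        String html = "<a>xxx</a>";
        Stripped s = strip(html);
        int start = s.text().indexOf("xxx");   // token "xxx" in the stripped text
        int end = start + "xxx".length();
        System.out.println(correctStart(s, start) + ".." + correctEnd(s, end, false)); // greedy
        System.out.println(highlight(html, s, start, end, false));
        System.out.println(correctStart(s, start) + ".." + correctEnd(s, end, true));  // tight
        System.out.println(highlight(html, s, start, end, true));
    }
}
```

Running main prints 3..10 and the invalid <a><em>xxx</a></em> for the greedy mapping, then 3..6 and the well-nested <a><em>xxx</em></a> for the tight one. Tight end offsets alone are not a complete fix in this model: for <b>x</b>yz the stripped text is the single token "xyz", and even the tight span yields <b><em>x</b>yz</em>, which is why the filter would have to "understand" the structure.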