Did you mean Xml*Strip*CharFilter?

koji
--
http://www.rondhuit.com/en/

(11/06/15 22:12), Mike Sokolov (JIRA) wrote:
XmlCharFilter
-------------

Key: SOLR-2597
URL: https://issues.apache.org/jira/browse/SOLR-2597
Project: Solr
Issue Type: Improvement
Components: Schema and Analysis
Affects Versions: 4.0
Reporter: Mike Sokolov

This CharFilter processes incoming XML using the Woodstox parser, stripping all non-text content and remembering offsets, just like HTMLCharFilter, but respecting XML conventions like XML entities defined in a DTD. XmlCharFilter also provides the ability to exclude (and include) the content of certain named elements.

In order to compute character offsets properly when mixed line termination styles are present (\r, \r\n), or when XML character entities (&lt;, &quot;, &amp;) are present, we require a newer version of Woodstox (4.1.1) than is currently in solr/lib. The earlier versions of the parser could not report these entity events, so we couldn't tell the difference between "&lt;" and "<", and the offsets could be wrong. The upgraded version is in the patch.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
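
For readers who have not opened the patch, the stripping and element-exclusion behaviour described above can be illustrated with a rough sketch. This is not the code from SOLR-2597: the class name XmlTextSketch and the method extractText are hypothetical, it uses only the standard StAX API from the JDK (no Woodstox-specific features), and it does not record the offset corrections that a real CharFilter needs for highlighting.

```java
import java.io.StringReader;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration only; not the XmlCharFilter from the patch.
public class XmlTextSketch {

    /**
     * Returns the text content of the given XML, dropping all markup and
     * everything nested inside elements whose local name is in `excluded`.
     */
    public static String extractText(String xml, Set<String> excluded)
            throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Merge adjacent text events so entity references do not split runs.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader reader =
                factory.createXMLStreamReader(new StringReader(xml));

        StringBuilder out = new StringBuilder();
        int excludedDepth = 0; // > 0 while inside an excluded element

        while (reader.hasNext()) {
            switch (reader.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    // Entering an excluded element, or nesting deeper in one.
                    if (excludedDepth > 0
                            || excluded.contains(reader.getLocalName())) {
                        excludedDepth++;
                    }
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    if (excludedDepth > 0) {
                        excludedDepth--;
                    }
                    break;
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA:
                case XMLStreamConstants.SPACE:
                    // Keep text only when outside every excluded element.
                    if (excludedDepth == 0) {
                        out.append(reader.getText());
                    }
                    break;
                default:
                    break; // comments, processing instructions, etc. are dropped
            }
        }
        reader.close();
        return out.toString();
    }
}
```

Calling extractText on `<doc><title>Hi</title><meta>skip</meta><p>a &lt; b</p></doc>` with `Set.of("meta")` keeps the title and paragraph text and drops the meta element. Note that `&lt;` comes back as a single `<` character, so the output is shorter than the input; that length difference is why the filter must know when an entity was expanded in order to report correct offsets.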