[
https://issues.apache.org/jira/browse/LUCENE-3690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184983#comment-13184983
]
Steven Rowe commented on LUCENE-3690:
-------------------------------------
{quote}
bq. This would be the first back-compat enabled CharFilter.
What would be the motivation?
{quote}
There are differences in the behavior, but I guess all of these could be
characterized as bug fixes:
# Supplementary characters in tags will be recognized. The old version doesn't
do this.
# CDATA sections are recognized. The old version doesn't do this; people have
requested it, e.g.
[http://www.lucidimagination.com/search/document/48fcd906e39764ec#48fcd906e39764ec]
# No space is substituted for inline tags (e.g. {{<b>}}, {{<i>}}, {{<span>}}).
The old version substitutes spaces for all tags; people have complained, e.g.
[on SOLR-1343|https://issues.apache.org/jira/browse/SOLR-1343?focusedCommentId=13096839&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13096839]
# Broken MS-Word-generated processing instructions {{<? ... />}} will be
handled.
# Uppercase variants of the character entities "quot", "copy", "gt", "lt",
"reg", and "amp" (e.g. {{&QUOT;}}) will be recognized (from Dawid Weiss's
SOLR-882 patch); the old version doesn't do this.
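To illustrate the third difference above, here is a toy sketch using plain regexes. It is not the JFlex scanner from the patch: the class and method names are hypothetical, the inline-tag set is limited to the three tags named above, and replacing each non-inline tag with a single space is an assumption made only for this example.

```java
import java.util.regex.Pattern;

public class InlineTagDemo {
    // Hypothetical, deliberately tiny inline-tag set: <b>, <i>, <span> (open or close).
    private static final Pattern INLINE_TAG =
        Pattern.compile("</?(?:b|i|span)(?:\\s[^>]*)?>", Pattern.CASE_INSENSITIVE);

    // Naive "any tag" matcher; good enough for a demo, not for real HTML.
    private static final Pattern ANY_TAG = Pattern.compile("<[^>]+>");

    // Old-style behavior as described above: every tag becomes a space.
    static String stripAllWithSpace(String html) {
        return ANY_TAG.matcher(html).replaceAll(" ");
    }

    // New-style behavior as described above: inline tags are dropped with no
    // substitute; the remaining tags become a space (an assumption of this sketch).
    static String stripInlineAware(String html) {
        String withoutInline = INLINE_TAG.matcher(html).replaceAll("");
        return ANY_TAG.matcher(withoutInline).replaceAll(" ");
    }

    public static void main(String[] args) {
        String html = "<p>un<b>believ</b>able</p>";
        System.out.println("[" + stripAllWithSpace(html) + "]"); // [ un believ able ]
        System.out.println("[" + stripInlineAware(html) + "]");  // [ unbelievable ]
    }
}
```

With the old-style substitution, a tokenizer would see three tokens ("un", "believ", "able") where the author wrote one word, which is the kind of complaint raised on SOLR-1343.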
bq. Are there some features of the previous one that don't make sense in this
implementation?
No, not as far as I can tell. I think all features of the previous one are
included.
bq. Also I don't know how Version etc. would work here, since the old
HtmlStripCharFilter was never part of Lucene. From Lucene's perspective, it's a
new feature.
Good point. Should I make it a new Lucene feature on 3.X? That is, should I
remove Solr's HTMLStripCharFilter and have it refer to a new Lucene
HTMLStripCharFilter?
> JFlex-based HTMLStripCharFilter replacement
> -------------------------------------------
>
> Key: LUCENE-3690
> URL: https://issues.apache.org/jira/browse/LUCENE-3690
> Project: Lucene - Java
> Issue Type: New Feature
> Components: modules/analysis
> Affects Versions: 3.5, 4.0
> Reporter: Steven Rowe
> Assignee: Steven Rowe
> Fix For: 3.6, 4.0
>
> Attachments: LUCENE-3690.patch, LUCENE-3690.patch
>
>
> A JFlex-based HTMLStripCharFilter replacement would be more performant and
> easier to understand and maintain.