[ 
https://issues.apache.org/jira/browse/LUCENE-3690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184983#comment-13184983
 ] 

Steven Rowe commented on LUCENE-3690:
-------------------------------------

{quote}
bq. This would be the first back-compat enabled CharFilter.

What would be the motivation?
{quote}

There are differences in the behavior, but I guess all of these could be 
characterized as bug fixes:

# Supplementary characters in tags will be recognized.  The old version doesn't 
do this.
# CDATA sections are recognized.  The old version doesn't do this; people have 
requested it, e.g. 
[http://www.lucidimagination.com/search/document/48fcd906e39764ec#48fcd906e39764ec]
# No space is substituted for inline tags (e.g. {{<b>}}, {{<i>}}, {{<span>}}).  
The old version substitutes spaces for all tags; people have complained e.g. 
[on 
SOLR-1343|https://issues.apache.org/jira/browse/SOLR-1343?focusedCommentId=13096839&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13096839]
# Broken MS-Word-generated processing instructions ({{<? ... />}}, closed with 
{{/>}} rather than {{?>}}) will be handled.
# Uppercase forms of the character entities "quot", "copy", "gt", "lt", "reg", 
and "amp" (e.g. {{&QUOT;}}) will be recognized (from Dawid Weiss's SOLR-882 
patch); the old version doesn't do this.
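
To make a few of these differences concrete, here is a rough, stdlib-only sketch of the CDATA, inline-tag, and uppercase-entity behaviors. It is not the JFlex-based implementation from the patch: the class name {{HtmlStripSketch}} and its methods are hypothetical, only the three inline tags named above are treated as inline, and anything between {{<}} and {{>}} is naively treated as a tag.

```java
// Illustrative sketch only: NOT the JFlex-based HTMLStripCharFilter from the patch.
// The class and method names are hypothetical.
class HtmlStripSketch {

    // Assumption: only the three inline tags named in the list are treated as inline.
    static boolean isInline(String name) {
        return name.equalsIgnoreCase("b") || name.equalsIgnoreCase("i")
            || name.equalsIgnoreCase("span");
    }

    // Decodes the six entities from the list; returns null for anything else.
    static String decodeEntity(String name) {
        // Accept the all-uppercase form (e.g. "QUOT") as well as the lowercase one.
        if (name.equals(name.toUpperCase(java.util.Locale.ROOT))) {
            name = name.toLowerCase(java.util.Locale.ROOT);
        }
        switch (name) {
            case "quot": return "\"";
            case "amp":  return "&";
            case "lt":   return "<";
            case "gt":   return ">";
            case "copy": return "\u00A9";
            case "reg":  return "\u00AE";
            default:     return null;
        }
    }

    static String strip(String html) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < html.length()) {
            char c = html.charAt(i);
            if (html.startsWith("<![CDATA[", i)) {
                // CDATA section: keep its text content verbatim.
                int start = i + "<![CDATA[".length();
                int end = html.indexOf("]]>", start);
                if (end < 0) end = html.length(); // unterminated: keep the rest
                out.append(html, start, end);
                i = Math.min(end + 3, html.length());
            } else if (c == '<' && html.indexOf('>', i) > i) {
                // Tag: drop it; substitute a space unless it is an inline tag.
                int close = html.indexOf('>', i);
                int n = i + 1;
                if (n < close && html.charAt(n) == '/') n++; // closing tag
                int nameEnd = n;
                while (nameEnd < close
                        && Character.isLetterOrDigit(html.charAt(nameEnd))) {
                    nameEnd++;
                }
                if (!isInline(html.substring(n, nameEnd))) out.append(' ');
                i = close + 1;
            } else if (c == '&' && html.indexOf(';', i) > i) {
                // Character entity: decode it if it is one of the known six.
                int semi = html.indexOf(';', i);
                String decoded = decodeEntity(html.substring(i + 1, semi));
                if (decoded != null) {
                    out.append(decoded);
                    i = semi + 1;
                } else {
                    out.append(c);
                    i++;
                }
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip("a<b>b</b>c"));          // prints "abc"
        System.out.println(strip("one<p>two</p>"));       // block tags become spaces
        System.out.println(strip("<![CDATA[x < y]]>"));   // prints "x < y"
        System.out.println(strip("&QUOT;hi&quot; &AMP; you"));
    }
}
```

With the old substitute-a-space-for-every-tag behavior, the first example would split "abc" into three separate tokens, which is what the SOLR-1343 complaint is about.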

bq. Are there some features of the previous one that don't make sense in this 
implementation?

No, not as far as I can tell.  I think all features of the previous one are 
included.

bq. Also I don't know how Version etc. would work here, since the old 
HtmlStripCharFilter was never part of Lucene.  From Lucene's perspective, it's a 
new feature.

Good point.  Should I make it a new Lucene feature on 3.X?  That is, should I 
remove Solr's HTMLStripCharFilter and have Solr refer to a new Lucene 
HTMLStripCharFilter instead?

                
> JFlex-based HTMLStripCharFilter replacement
> -------------------------------------------
>
>                 Key: LUCENE-3690
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3690
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: modules/analysis
>    Affects Versions: 3.5, 4.0
>            Reporter: Steven Rowe
>            Assignee: Steven Rowe
>             Fix For: 3.6, 4.0
>
>         Attachments: LUCENE-3690.patch, LUCENE-3690.patch
>
>
> A JFlex-based HTMLStripCharFilter replacement would be more performant and 
> easier to understand and maintain.


        
