[ https://issues.apache.org/jira/browse/LUCENE-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722283#action_12722283 ]
Koji Sekiguchi commented on LUCENE-1466:
----------------------------------------

Oops. Thanks for the updated patch, Mike!

{quote}
* Can you add a CHANGES entry describing this new feature, as well as the change in type of Tokenizer.input?
* Can we rename NormalizeMap -> NormalizeCharMap?
* Could you add some javadocs to NormalizeCharMap, MappingCharFilter, BaseCharFilter?
{quote}

Your patch looks good!

{quote}
* The BaseCharFilter correct method looks spookily costly (it has a for loop, going backwards over all added mappings). It seems like in practice it should not be costly, because typically one corrects the offset only for the "current" token? And one could always build their own CharFilter (e.g. using arrays of ints or something) if they needed a more efficient mapping.
{quote}

Yes, users can create their own CharFilter if they need a more efficient mapping.

{quote}
* MappingCharFilter's match method is recursive. But I think the depth of that recursion equals the length of the character sequence that's being mapped, right? So the risk of stack overflow should be basically zero, unless someone does some insanely long character string mappings?
{quote}

You are correct.

{quote}
I think we should make an exception to back-compat here, and simply change Tokenizer.input from Reader to CharStream (which subclasses Reader). Properly respecting back-compat will be a lot of work, and if external subclasses are directly assigning to input, they really ought to be using reset(Reader) instead.
{quote}

I agree with you, Mike.
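To make the offset-correction cost discussion concrete, here is a minimal sketch, assuming a filter records (offset, cumulative diff) pairs in increasing offset order each time a mapping changes the text length. The class OffsetCorrector and its methods are hypothetical and are not the BaseCharFilter API. correctLinear scans backwards over the recorded entries, in the spirit of the correct method reviewed above; correctBinary is one possible "arrays of ints" alternative using binary search.

```java
import java.util.Arrays;

/** Hypothetical helper: maps offsets in the filtered text back to offsets in the original input. */
class OffsetCorrector {
    private int[] offs = new int[8];   // offsets (in filtered text) where a mapping changed the length
    private int[] diffs = new int[8];  // cumulative length difference in effect from that offset on
    private int size = 0;

    /** Records a correction point; offsets are assumed to be added in strictly increasing order. */
    void add(int off, int cumulativeDiff) {
        if (size == offs.length) {
            offs = Arrays.copyOf(offs, size * 2);
            diffs = Arrays.copyOf(diffs, size * 2);
        }
        offs[size] = off;
        diffs[size] = cumulativeDiff;
        size++;
    }

    /** Backward linear scan: cheap when the queried offset lies near the most recent entry. */
    int correctLinear(int currentOff) {
        for (int i = size - 1; i >= 0; i--) {
            if (currentOff >= offs[i]) {
                return currentOff + diffs[i];
            }
        }
        return currentOff; // before any length-changing mapping: no correction needed
    }

    /** Binary search over the int arrays: O(log n) for any queried offset. */
    int correctBinary(int currentOff) {
        int idx = Arrays.binarySearch(offs, 0, size, currentOff);
        if (idx < 0) {
            idx = -idx - 2; // insertion point minus one = last entry with off < currentOff
        }
        return idx < 0 ? currentOff : currentOff + diffs[idx];
    }
}
```

For example, if "a&amp;b" (7 chars) is filtered to "a&b", the single entry add(2, 4) maps filtered offset 2 ("b") back to original offset 6, and the end offset 3 back to 7. Since tokens are usually corrected in order, the backward scan typically stops after one or two steps, which matches the reasoning in the review.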
> CharFilter - normalize characters before tokenizer
> --------------------------------------------------
>
> Key: LUCENE-1466
> URL: https://issues.apache.org/jira/browse/LUCENE-1466
> Project: Lucene - Java
> Issue Type: New Feature
> Components: Analysis
> Affects Versions: 2.4
> Reporter: Koji Sekiguchi
> Assignee: Michael McCandless
> Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1466-back.patch, LUCENE-1466.patch, LUCENE-1466.patch, LUCENE-1466.patch, LUCENE-1466.patch
>
>
> This proposes to import CharFilter, which has been introduced in Solr 1.4.
> For details, please see:
> - SOLR-822
> - http://www.nabble.com/Proposal-for-introducing-CharFilter-to20327007.html
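The issue's feature, normalizing characters before the tokenizer, and the recursion question about MappingCharFilter's match method can be illustrated with a small standalone sketch. It assumes mappings are stored in a character trie and applied with longest-match replacement; CharMapNode, MappingSketch, longestMatch and normalize are hypothetical names, not the NormalizeCharMap or MappingCharFilter API, and offset correction is omitted. Each recursive call consumes one input character, so the recursion depth is bounded by the length of the longest mapped source string.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical trie node holding source-string -> replacement mappings. */
class CharMapNode {
    final Map<Character, CharMapNode> children = new HashMap<>();
    String replacement; // non-null if some mapping's source string ends at this node

    void add(String from, String to) {
        CharMapNode node = this;
        for (char ch : from.toCharArray()) {
            node = node.children.computeIfAbsent(ch, k -> new CharMapNode());
        }
        node.replacement = to;
    }
}

class MappingSketch {
    /**
     * Returns the length of the longest mapped source string starting at pos, or 0 if none.
     * depth counts characters consumed so far; one per call, so recursion depth is at most
     * the length of the longest source string in the map.
     */
    static int longestMatch(CharMapNode node, String input, int pos, int depth) {
        int best = node.replacement != null ? depth : 0;
        if (pos < input.length()) {
            CharMapNode child = node.children.get(input.charAt(pos));
            if (child != null) {
                best = Math.max(best, longestMatch(child, input, pos + 1, depth + 1));
            }
        }
        return best;
    }

    /** Applies longest-match replacement across the whole input. */
    static String normalize(CharMapNode root, String input) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (pos < input.length()) {
            int len = longestMatch(root, input, pos, 0);
            if (len == 0) {
                out.append(input.charAt(pos++)); // no mapping starts here: copy the char through
            } else {
                CharMapNode node = root;
                for (int i = 0; i < len; i++) {
                    node = node.children.get(input.charAt(pos + i));
                }
                out.append(node.replacement);
                pos += len;
            }
        }
        return out.toString();
    }
}
```

With mappings "ab" -> "X" and "abc" -> "Y", the input "abcab d" normalizes to "YX d": the longer "abc" wins at position 0, and "ab" matches at position 3 even though "abc" cannot.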