[ https://issues.apache.org/jira/browse/LUCENE-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783410#action_12783410 ]
Robert Muir commented on LUCENE-2094:
-------------------------------------

bq. This is one thing I thought about too - I did not change it to keep the noise as low as possible in the patch, but if we want to do it we can do it in this patch too.

Well, I think it will be noisy either way (updating all the analyzers, etc.), but it will make things a lot more consistent and easier to maintain. If you do this, then StopFilter takes a version, so it can be modified or bugfixed in the future in other ways too, with less noise.

I also think it will make it easier to write an analyzer, because, even completely ignoring the Unicode issue, with the current codebase this:

{code}
streams.source = new StandardTokenizer(matchVersion, reader);
streams.result = new StandardFilter(streams.source);
streams.result = new LowerCaseFilter(matchVersion, streams.result);
streams.result = new StopFilter(matchVersion, streams.result, stoptable);
...
{code}

reads a lot easier to me than this:

{code}
streams.source = new StandardTokenizer(matchVersion, reader);
streams.result = new StandardFilter(streams.source);
streams.result = new LowerCaseFilter(matchVersion, streams.result);
streams.result = new StopFilter(StopFilter.getEnablePositionIncrementsVersionDefault(matchVersion), streams.result, stoptable);
...
{code}

> Prepare CharArraySet for Unicode 4.0
> ------------------------------------
>
>                 Key: LUCENE-2094
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2094
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 1.9, 2.0.0, 2.1, 2.2, 2.3, 2.3.1, 2.3.2, 2.3.3, 2.4, 2.4.1, 2.4.2, 2.9, 2.9.1, 2.9.2, 3.0, 3.0.1, 3.1
>            Reporter: Simon Willnauer
>             Fix For: 3.1
>
>         Attachments: LUCENE-2094.patch, LUCENE-2094.txt, LUCENE-2094.txt, LUCENE-2094.txt
>
>
> CharArraySet does lowercasing if created with the corresponding flag. As a result, a String / char[] containing Unicode 4.0 characters that is in the set cannot be retrieved in "ignorecase" mode.
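A minimal sketch of the design argued for in the comment: the filter receives `matchVersion` itself and derives its version-dependent defaults internally, so callers no longer compute them. Everything here is hypothetical: `MatchVersion` is a stand-in for Lucene's `Version`, `StopFilterConfig` is not a Lucene class, and the 2.9 cutoff for enabling position increments is assumed for illustration rather than taken from the patch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class VersionedStopFilterSketch {

    // Hypothetical stand-in for Lucene's Version enum (illustrative constants only).
    enum MatchVersion {
        LUCENE_24, LUCENE_29, LUCENE_30, LUCENE_31;

        boolean onOrAfter(MatchVersion other) {
            return compareTo(other) >= 0;
        }
    }

    // Hypothetical stop-filter configuration: takes the version, not a precomputed boolean.
    static final class StopFilterConfig {
        final boolean enablePositionIncrements;
        final Set<String> stopWords;

        StopFilterConfig(MatchVersion matchVersion, Set<String> stopWords) {
            // The version-dependent default is decided in one place
            // (the 2.9 cutoff is an assumption for illustration).
            this.enablePositionIncrements = matchVersion.onOrAfter(MatchVersion.LUCENE_29);
            this.stopWords = stopWords;
        }

        boolean isStopWord(String term) {
            return stopWords.contains(term);
        }
    }

    public static void main(String[] args) {
        Set<String> stops = new HashSet<>(Arrays.asList("the", "a", "an"));
        // Caller side stays uniform: every component just receives matchVersion.
        StopFilterConfig legacy = new StopFilterConfig(MatchVersion.LUCENE_24, stops);
        StopFilterConfig current = new StopFilterConfig(MatchVersion.LUCENE_31, stops);
        System.out.println(legacy.enablePositionIncrements);  // prints "false"
        System.out.println(current.enablePositionIncrements); // prints "true"
    }
}
```

Because the constructor owns the decision, a later behavior change could be keyed on a new version constant without touching analyzers that already pass `matchVersion`.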
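The problem in the issue description can be illustrated without Lucene: lowercasing text one `char` at a time leaves supplementary characters (encoded as surrogate pairs) untouched, whereas decoding full code points lowercases them. This is a hypothetical sketch, not CharArraySet's actual code; the class and method names are made up, and it assumes Java 5 or later, whose `Character` methods operate on Unicode 4.0 code points. The test character is U+10400 DESERET CAPITAL LETTER LONG I, whose lowercase form is U+10428.

```java
public class CodePointLowerCaseSketch {

    // Per-char lowercasing: each surrogate half goes through Character.toLowerCase(char),
    // which leaves surrogates unchanged, so supplementary letters are never lowercased.
    static String lowerCasePerChar(String s) {
        char[] buf = s.toCharArray();
        for (int i = 0; i < buf.length; i++) {
            buf[i] = Character.toLowerCase(buf[i]);
        }
        return new String(buf);
    }

    // Code-point-aware lowercasing: decode full code points (including surrogate pairs)
    // and lowercase each one with Character.toLowerCase(int).
    static String lowerCaseCodePoints(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            sb.appendCodePoint(Character.toLowerCase(cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+10400 and its lowercase form U+10428, each encoded as a surrogate pair.
        String upper = new String(Character.toChars(0x10400));
        String lower = new String(Character.toChars(0x10428));
        System.out.println(lowerCasePerChar(upper).equals(lower));    // prints "false"
        System.out.println(lowerCaseCodePoints(upper).equals(lower)); // prints "true"
    }
}
```

In an ignore-case set, a per-char fold like `lowerCasePerChar` would map the capital and small Deseret letters to different keys, so a lookup with one would miss an entry stored with the other.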
--
This message is automatically generated by JIRA.