[ https://issues.apache.org/jira/browse/LUCENE-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797758#action_12797758 ]
Simon Willnauer commented on LUCENE-2094:
-----------------------------------------

Hi Yonik,

bq. It looks like it was committed as part of this issue, but I can't find any comments here about either the need to make a copy or the need to make a unmodifiable set.

Let me try to help reconstruct the history. As far as I recall, UnmodifiableCharArraySet was introduced with LUCENE-1688 to replace the static String array (stopwords) in StopAnalyzer. During the refactoring and improvements in contrib/analyzers we decided to make analyzers and token filters immutable and to use CharArraySet wherever we can. To prevent a provided set from being modified while it is in use in a filter, the given set is copied and wrapped in an immutable CharArraySet instance.

At the same time (still ongoing) we are trying to convert every set that is likely to be used in a TokenFilter into a CharArraySet. WordlistLoader is not done yet but is on the list; the plan is to change the return types from HashSet<?> to Set<?> and create CharArraySet instances internally.

With LUCENE-2034 we introduced StopwordAnalyzerBase, which also uses the UnmodifiableCharArraySet with a copy of the given set. Copying a CharArraySet is very fast even for large sets, and creating an UnmodifiableCharArraySet from a CharArraySet instance is basically just an object creation. The reason is, again, to prevent any modification to those sets while they are in use.

bq. This new behavior also no longer matches the javadoc for the constructor.

I agree, we should adjust the javadoc for the ctors that take stopwords to reflect the behavior.
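For illustration, the copy-then-wrap pattern described above can be sketched with plain java.util collections standing in for CharArraySet. This is not the Lucene code: the class name StopwordHolder and its accessor are hypothetical, and HashSet / Collections.unmodifiableSet are only stand-ins for the CharArraySet copy and its unmodifiable wrapper.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical holder, loosely playing the role of an analyzer or filter
// that is handed a stopword set by the caller.
final class StopwordHolder {
    private final Set<String> stopwords;

    StopwordHolder(Set<String> given) {
        // Step 1: copy, so later changes to the caller's set do not leak in.
        Set<String> copy = new HashSet<>(given);
        // Step 2: wrap, so the copy cannot be modified through this reference.
        this.stopwords = Collections.unmodifiableSet(copy);
    }

    Set<String> getStopwords() {
        return stopwords;
    }

    public static void main(String[] args) {
        Set<String> caller = new HashSet<>();
        caller.add("the");
        StopwordHolder holder = new StopwordHolder(caller);

        // Mutating the caller's set afterwards does not affect the holder.
        caller.add("and");
        System.out.println(holder.getStopwords().contains("and"));

        // Attempts to modify the held set are rejected.
        try {
            holder.getStopwords().add("or");
        } catch (UnsupportedOperationException e) {
            System.out.println("held set is read-only");
        }
    }
}
```

The wrap step alone would not be enough here: an unmodifiable view over the caller's own set would still reflect changes the caller makes later, which is why the copy comes first.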
> Prepare CharArraySet for Unicode 4.0
> ------------------------------------
>
>                 Key: LUCENE-2094
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2094
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 3.0
>            Reporter: Simon Willnauer
>            Assignee: Uwe Schindler
>             Fix For: 3.1
>
>         Attachments: LUCENE-2094.patch, LUCENE-2094.patch, LUCENE-2094.patch,
> LUCENE-2094.patch, LUCENE-2094.patch, LUCENE-2094.patch, LUCENE-2094.txt,
> LUCENE-2094.txt, LUCENE-2094.txt
>
>
> CharArraySet does lowercasing if created with the corresponding flag. As a
> result, a String / char[] containing Unicode 4 characters that is in the set
> cannot be retrieved in "ignorecase" mode.
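As a rough illustration of the problem in the issue description, the sketch below contrasts per-char lowercasing with per-code-point lowercasing on a supplementary character. It uses only java.lang; the class and method names are hypothetical and this is not the CharArraySet implementation. The assumption is that the lookup failure comes from lowercasing one char (UTF-16 unit) at a time, which leaves surrogate pairs untouched.

```java
// Hypothetical helper contrasting two ways of lowercasing a string.
final class CodePointLowerCase {

    // Lowercases one UTF-16 unit at a time. Character.toLowerCase(char)
    // sees each surrogate in isolation and returns it unchanged, so
    // characters outside the BMP are never lowercased.
    static String lowerByChar(String s) {
        char[] chars = s.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            chars[i] = Character.toLowerCase(chars[i]);
        }
        return new String(chars);
    }

    // Lowercases one code point at a time, so a surrogate pair is read
    // and mapped as a single character.
    static String lowerByCodePoint(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            sb.appendCodePoint(Character.toLowerCase(cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+10400 DESERET CAPITAL LETTER LONG I, lowercase is U+10428.
        String upper = new String(Character.toChars(0x10400));
        String lower = new String(Character.toChars(0x10428));

        System.out.println(lowerByChar(upper).equals(lower));
        System.out.println(lowerByCodePoint(upper).equals(lower));
    }
}
```

If a set stores entries lowercased one way and lowercases lookup keys another way, or lowercases both per char, an upper-case supplementary key will not match its lower-case form, which is consistent with the "cannot be retrieved" symptom described above.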