[jira] Commented: (LUCENE-2094) Prepare CharArraySet for Unicode 4.0

DM Smith (JIRA) Tue, 01 Dec 2009 03:07:48 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784175#action_12784175 ]


DM Smith commented on LUCENE-2094:
----------------------------------

bq. I would like to open another issue for Robert's patch. The reason for this 
is that I feel that issues like that get sidetracked quite often and it's hard 
to follow once this happens. This would make discussions more clear and would 
help to prevent situations like this.

Just my opinion:

I don't like committing part of an issue. I think that when/if there is a point 
at which a commit is needed, for whatever reason, and there is more to do or to 
discuss, the issue needs to be split. I think a JIRA issue should be 
represented by a single commit.

This issue pertains to making CharArraySet properly handle surrogates when 
lowercasing. The use case in Lucene is the stop word lists. These are used by 
the StopFilter, which has an ugliness that needed fixing.
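
To illustrate the surrogate problem, here is a minimal standalone sketch. It is 
not the CharArraySet code or either attached patch; the class and method names 
are made up for illustration. It assumes only the standard java.lang.Character 
code point methods available since Java 5.

```java
public class SurrogateLowercase {

    // Lowercases one UTF-16 code unit at a time. The two halves of a
    // surrogate pair have no case mapping of their own, so a supplementary
    // character passes through unchanged.
    static String lowerPerChar(String s) {
        char[] buf = s.toCharArray();
        for (int i = 0; i < buf.length; i++) {
            buf[i] = Character.toLowerCase(buf[i]);
        }
        return new String(buf);
    }

    // Lowercases one code point at a time, so a surrogate pair is decoded
    // into a single supplementary character before the case mapping is applied.
    static String lowerPerCodePoint(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            sb.appendCodePoint(Character.toLowerCase(cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+10400 DESERET CAPITAL LETTER LONG I; its lowercase form is U+10428.
        String upper = new String(Character.toChars(0x10400));
        String lower = new String(Character.toChars(0x10428));
        System.out.println(lowerPerChar(upper).equals(lower));      // false
        System.out.println(lowerPerCodePoint(upper).equals(lower)); // true
    }
}
```

A set that normalizes its entries per char would therefore never match such a 
term against its lowercase form in ignore-case mode, which is the symptom the 
issue description reports.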

I understand that sometimes more than one thing gets done in an issue because 
it is too hard to manage as multiple issues: what I call a ripple effect. It 
appears that this is happening here.

I think changes beyond that scope should go into another issue, a sub-issue, or 
a linked issue. As it stands, Robert's patch, having the same name as Simon's, 
makes it appear that it supersedes the earlier patch. That is confusing unless 
you have read the whole thread.


> Prepare CharArraySet for Unicode 4.0
> ------------------------------------
>
>                 Key: LUCENE-2094
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2094
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 3.0
>            Reporter: Simon Willnauer
>            Assignee: Uwe Schindler
>             Fix For: 3.1
>
>         Attachments: LUCENE-2094.patch, LUCENE-2094.patch, LUCENE-2094.patch, 
> LUCENE-2094.patch, LUCENE-2094.patch, LUCENE-2094.patch, LUCENE-2094.txt, 
> LUCENE-2094.txt, LUCENE-2094.txt
>
>
> CharArraySet does lowercasing if created with the corresponding flag. As a 
> result, String / char[] values containing Unicode 4.0 characters that are in 
> the set cannot be retrieved in "ignorecase" mode.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


