[jira] Commented: (LUCENE-2019) map unicode process-internal codepoints to replacement character

Robert Muir (JIRA) Fri, 30 Oct 2009 14:47:26 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772110#action_12772110
 ]


Robert Muir commented on LUCENE-2019:
-------------------------------------

Michael, well, if we go by the Unicode Standard, Section 3.2:

C2 A process shall not interpret a noncharacter code point as an abstract
character.
• The noncharacter code points may be used internally, such as for sentinel
  values or delimiters, but should not be exchanged publicly.

This makes me think they should not be in terms, but I'll take anyone's
interpretation.
If people disagree, just close the issue as Won't Fix. I don't think this
approach will hurt performance.
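
To make the proposal concrete, here is a minimal sketch of the mapping, not the
attached LUCENE-2019.patch. The class and method names are hypothetical. It
assumes "process-internal codepoints" means the 66 Unicode noncharacters
(U+FDD0..U+FDEF, plus the last two code points of every plane) and that each one
is replaced by U+FFFD.

```java
// Hypothetical helper for illustration only; not part of Lucene.
final class NoncharacterMapper {
    // U+FFFD REPLACEMENT CHARACTER
    static final int REPLACEMENT = 0xFFFD;

    // True for the 66 noncharacters: U+FDD0..U+FDEF, and U+xxFFFE / U+xxFFFF
    // in each of the 17 planes.
    static boolean isNoncharacter(int cp) {
        return (cp >= 0xFDD0 && cp <= 0xFDEF) || (cp & 0xFFFE) == 0xFFFE;
    }

    // Returns a copy of s with every noncharacter replaced by U+FFFD.
    // Iterates by code point so supplementary-plane noncharacters
    // (for example U+1FFFE) are handled as well.
    static String mapNoncharacters(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            sb.appendCodePoint(isNoncharacter(cp) ? REPLACEMENT : cp);
        }
        return sb.toString();
    }
}
```

With this sketch, a term containing U+FFFF would reach the index with U+FFFD in
its place, leaving U+FFFF free for internal use as a sentinel.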


> map unicode process-internal codepoints to replacement character
> ----------------------------------------------------------------
>
>                 Key: LUCENE-2019
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2019
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Robert Muir
>            Priority: Minor
>         Attachments: LUCENE-2019.patch
>
>
> A spinoff from LUCENE-2016.
> There are several process-internal codepoints in Unicode; we should not store
> these in the index.
> Instead, they should be mapped to the replacement character (U+FFFD), so that
> they can be used process-internally.
> For example, Lucene Java currently uses U+FFFF process-internally, so it
> can't be in the index or it will cause problems.

