[ 
https://issues.apache.org/jira/browse/LUCENE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577513#action_12577513
 ] 

Michael McCandless commented on LUCENE-1221:
--------------------------------------------

Hmmm ... U+FFFF is one of the Unicode noncharacters that are "invalid
for interchange but may freely be used internal to an implementation"
(from http://unicode.org/faq/utf_bom.html#6), so I assumed it was safe
to use internally in DocumentsWriter.
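
To illustrate how a reserved internal character can collide with term
text, here is a minimal sketch of marker-terminated scanning. It is not
the DocumentsWriter code: the class and method names (SentinelDemo,
readUntilSentinel) are hypothetical, and the only assumption is that
some internal buffer uses U+FFFF as an end-of-term marker.

```java
public class SentinelDemo {
    // Assumption for illustration only: U+FFFF marks the end of a term
    // inside an internal char buffer.
    static final char END_MARKER = '\uFFFF';

    // Copies the term text into a marker-terminated buffer, then reads it
    // back by scanning forward until the first marker is found.
    static String readUntilSentinel(String termText) {
        char[] buffer = new char[termText.length() + 1];
        termText.getChars(0, termText.length(), buffer, 0);
        buffer[termText.length()] = END_MARKER;

        int end = 0;
        while (buffer[end] != END_MARKER) {
            end++;
        }
        return new String(buffer, 0, end);
    }

    public static void main(String[] args) {
        // Ordinary text round-trips unchanged.
        System.out.println(readUntilSentinel("abc"));          // prints abc
        // Text that itself contains the marker is cut at that position.
        System.out.println(readUntilSentinel("abc\uFFFFdef")); // prints abc
    }
}
```

Any scheme of this shape silently truncates input that already contains
the marker, which matches the symptom reported in this issue.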

But apparently you are using it.  How/why are you seeing/using this
character in Jackrabbit?

Note that with LUCENE-510 (not yet fixed but in progress), there may
be similar issues whereby the treatment of other kinds of invalid
UTF-16 strings changes.
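
As a caller-side precaution, term text could be checked before it is
handed to the indexer. The sketch below is hypothetical (TermTextCheck
and firstInvalidIndex are not Lucene APIs), and it assumes that "invalid"
means either the noncharacter U+FFFF or an unpaired surrogate; the exact
set of inputs affected by LUCENE-510 may differ.

```java
public class TermTextCheck {
    // Returns the index of the first UTF-16 code unit that is either U+FFFF
    // or an unpaired surrogate, or -1 if no such code unit is found.
    static int firstInvalidIndex(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\uFFFF') {
                return i;
            }
            if (Character.isHighSurrogate(c)) {
                // A high surrogate must be followed by a low surrogate.
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // skip the low half of a valid pair
                } else {
                    return i;
                }
            } else if (Character.isLowSurrogate(c)) {
                // A low surrogate with no preceding high surrogate.
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstInvalidIndex("abc"));       // prints -1
        System.out.println(firstInvalidIndex("ab\uFFFFc")); // prints 2
    }
}
```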



> DocumentsWriter truncates term text at \uFFFF
> ---------------------------------------------
>
>                 Key: LUCENE-1221
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1221
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.3, 2.3.1
>            Reporter: Marcel Reutegger
>            Priority: Minor
>         Attachments: OddTermTest.java
>
>
> When a Term's text contains the Unicode 'character' \uFFFF, DocumentsWriter
> truncates the text and writes only the part before the \uFFFF character.
> This was introduced with the changes for LUCENE-843 to reduce memory usage
> and improve performance.
> This change in behavior prevents us (Jackrabbit) from upgrading to Lucene 2.3.


