[
https://issues.apache.org/jira/browse/LUCENE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577513#action_12577513
]
Michael McCandless commented on LUCENE-1221:
--------------------------------------------
Hmmm ... 0xffff is one of the "invalid for interchange but may freely
be used internal to an implementation" UTF-16 characters (from
http://unicode.org/faq/utf_bom.html#6), so I assumed it was safe to
use internally in DocumentsWriter.
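For reference, the Unicode FAQ cited above reserves a fixed set of noncharacters: U+FDD0..U+FDEF plus the last two code points of every plane (U+xxFFFE and U+xxFFFF). A caller could check for them before indexing. Below is a minimal sketch. The class and method names are hypothetical helpers, not part of Lucene or Jackrabbit:

```java
public class NoncharacterCheck {
    // Unicode noncharacters: U+FDD0..U+FDEF, plus U+xxFFFE and U+xxFFFF
    // in every plane (the low 16 bits are 0xFFFE or 0xFFFF).
    static boolean isNoncharacter(int cp) {
        return (cp >= 0xFDD0 && cp <= 0xFDEF) || (cp & 0xFFFE) == 0xFFFE;
    }

    // Returns the char index of the first noncharacter in s, or -1 if none.
    // Iterates by code point so supplementary noncharacters are also found.
    static int indexOfNoncharacter(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (isNoncharacter(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(isNoncharacter(0xFFFF));              // true
        System.out.println(isNoncharacter('a'));                 // false
        System.out.println(indexOfNoncharacter("abc\uFFFFdef")); // 3
    }
}
```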
But apparently you are using it. How/why are you seeing/using this
character in Jackrabbit?
Note that with LUCENE-510 (not yet fixed but in progress), there may
be similar issues whereby the treatment of other kinds of invalid
UTF-16 strings changes.
> DocumentsWriter truncates term text at \uFFFF
> ---------------------------------------------
>
> Key: LUCENE-1221
> URL: https://issues.apache.org/jira/browse/LUCENE-1221
> Project: Lucene - Java
> Issue Type: Bug
> Components: Index
> Affects Versions: 2.3, 2.3.1
> Reporter: Marcel Reutegger
> Priority: Minor
> Attachments: OddTermTest.java
>
>
> When a Term text contains the Unicode 'character' \uFFFF, DocumentsWriter
> will truncate the text and only write the text up to the \uFFFF character.
> This was introduced with the changes for LUCENE-843 to reduce memory usage
> and improve performance.
> This change in behavior prevents us (Jackrabbit) from upgrading to Lucene 2.3.
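The reported symptom is consistent with U+FFFF being used in-band as an end-of-term marker. Below is a minimal, hypothetical sketch of that failure mode. The class `SentinelTruncation`, the `END_OF_TERM` constant, and the shared char-pool layout are illustrative assumptions, not DocumentsWriter's actual code. When a term is stored followed by a U+FFFF terminator, reading it back stops at the first U+FFFF, so a term that itself contains that char is truncated:

```java
import java.util.Arrays;

public class SentinelTruncation {
    // Hypothetical end-of-term marker (assumption, not Lucene's code).
    static final char END_OF_TERM = '\uFFFF';

    private char[] pool = new char[16];
    private int upto = 0;

    // Copies the term into the shared pool, appends the end marker,
    // and returns the offset where the term starts.
    int add(String term) {
        int needed = upto + term.length() + 1;
        if (needed > pool.length) {
            pool = Arrays.copyOf(pool, Math.max(needed, pool.length * 2));
        }
        int start = upto;
        term.getChars(0, term.length(), pool, upto);
        upto += term.length();
        pool[upto++] = END_OF_TERM;
        return start;
    }

    // Scans forward from start to the first end marker. A term that
    // contains U+FFFF is therefore cut short at that char.
    String get(int start) {
        int end = start;
        while (pool[end] != END_OF_TERM) {
            end++;
        }
        return new String(pool, start, end - start);
    }

    public static void main(String[] args) {
        SentinelTruncation p = new SentinelTruncation();
        int a = p.add("plain");
        int b = p.add("foo\uFFFFbar");
        System.out.println(p.get(a)); // plain
        System.out.println(p.get(b)); // foo  (the "\uFFFFbar" suffix is lost)
    }
}
```

An in-band sentinel saves a length field per term. The trade-off is that the sentinel value can never appear in term text, which is exactly the assumption this report breaks.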
--
This message is automatically generated by JIRA.