[ https://issues.apache.org/jira/browse/LUCENE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577513#action_12577513 ]
Michael McCandless commented on LUCENE-1221:
--------------------------------------------

Hmmm ... 0xffff is one of the "invalid for interchange but may freely be used internal to an implementation" UTF-16 characters (from http://unicode.org/faq/utf_bom.html#6), so I assumed it was safe to use internally in DocumentsWriter. But apparently you are using it. How/why are you seeing/using this character in Jackrabbit?

Note that with LUCENE-510 (not yet fixed but in progress), there may be similar issues whereby the treatment of other kinds of invalid UTF-16 strings changes.

> DocumentsWriter truncates term text at \uFFFF
> ---------------------------------------------
>
> Key: LUCENE-1221
> URL: https://issues.apache.org/jira/browse/LUCENE-1221
> Project: Lucene - Java
> Issue Type: Bug
> Components: Index
> Affects Versions: 2.3, 2.3.1
> Reporter: Marcel Reutegger
> Priority: Minor
> Attachments: OddTermTest.java
>
> When a Term text contains the unicode 'character' \uFFFF, DocumentsWriter
> will truncate the text and only write the text up to the \uFFFF character.
> This was introduced with the changes for LUCENE-843 to reduce memory usage
> and improve performance.
> This change in behavior prevents us (Jackrabbit) from upgrading to Lucene 2.3.
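The truncation described above is what you would expect if term text is stored with U+FFFF acting as an internal end-of-term marker. The sketch below is a hypothetical, simplified illustration of that failure mode, not Lucene's actual DocumentsWriter code. The class name `SentinelTruncationDemo` and its methods are made up for this example, and the single-term "pool" is an assumption standing in for whatever shared buffer the real implementation uses.

```java
/**
 * Hypothetical illustration (NOT Lucene's DocumentsWriter code): a term is
 * copied into a char buffer and terminated with U+FFFF, so a U+FFFF inside
 * the term itself is mistaken for the end-of-term marker.
 */
public class SentinelTruncationDemo {
    // Assumed end-of-term marker; Character.MAX_VALUE is U+FFFF.
    static final char END_OF_TERM = Character.MAX_VALUE;

    // Copies the term into a buffer starting at offset 0 and appends the marker.
    static char[] writeTerm(String term) {
        char[] pool = new char[term.length() + 1];
        term.getChars(0, term.length(), pool, 0);
        pool[term.length()] = END_OF_TERM;
        return pool;
    }

    // Reads chars from 'start' up to the first marker, as a terminator-based reader would.
    static String readTerm(char[] pool, int start) {
        int end = start;
        while (pool[end] != END_OF_TERM) {
            end++;
        }
        return new String(pool, start, end - start);
    }

    // Writes a term and reads it back through the marker-based scheme.
    static String roundTrip(String term) {
        return readTerm(writeTerm(term), 0);
    }

    public static void main(String[] args) {
        String ordinary = "jackrabbit";
        String odd = "ab" + END_OF_TERM + "cd";
        System.out.println(roundTrip(ordinary));     // prints "jackrabbit"
        System.out.println(roundTrip(odd).length()); // prints 2: "cd" after U+FFFF is lost
    }
}
```

Any in-band terminator has this problem as soon as the terminator value can appear in the data. The alternatives are to reject or escape that value on input, or to store an explicit length alongside the text instead of relying on a marker.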