[ https://issues.apache.org/jira/browse/DERBY-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12488197 ]
Kristian Waagan commented on DERBY-2346:
----------------------------------------
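Regarding the UTF-8 char -> byte -> char conversion using String methods, I
don't think it is a bug. Unmappable "chars" are replaced by '?' (0x3f / 63).
In the snippet above, (char)56249 (0xdbb9) falls in the High Private Use
Surrogates block (U+DB80 to U+DBFF). On its own it is an unpaired surrogate,
not a valid Unicode character. Only together with a following low surrogate
does it form a code point, and that code point lies in one of the
supplementary Private Use planes, where the Unicode standard does not define
any characters.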
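A minimal sketch of the String-based round trip described above (the class
and method names are made up for illustration, not taken from the patch). It
shows String.getBytes() turning the lone char 0xdbb9 into the single byte 63,
so decoding yields "?" and equals() fails.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8RoundTrip {

    // getBytes(Charset) always replaces chars it cannot encode with the
    // charset's default replacement, which for UTF-8 is the byte '?' (63).
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Encode, then decode again with the same charset.
    static String roundTrip(String s) {
        return new String(encode(s), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xdbb9 is a high surrogate with no low surrogate after it.
        String original = String.valueOf((char) 56249);
        System.out.println(Arrays.toString(encode(original))); // [63]
        System.out.println(original.equals(roundTrip(original))); // false
    }
}
```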
You could use DataOutput/DataInput and writeUTF/readUTF, but I don't know how
efficient this would be. These methods use the modified UTF-8 format, which
encodes each char on its own (lone surrogates included), so the equals in the
example above returns true. I think writing your own method would be
acceptable, but it would be interesting if anyone took the time to investigate
the CPU/space differences (e.g. what kind of stream can we use underneath?
ByteArrayOutputStream? A subclass of it that returns a reference to the
internal byte array?)
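A rough sketch of the writeUTF/readUTF alternative, assuming a hypothetical
ByteArrayOutputStream subclass (ExposedByteArrayOutputStream) that hands out
its protected buf array instead of copying it with toByteArray(). It is not
Derby code and says nothing about performance; it only shows that the
modified UTF-8 round trip keeps the lone surrogate. Note that writeUTF
rejects strings whose encoded form exceeds 65535 bytes, which could matter
for Clob values.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class ModifiedUtf8RoundTrip {

    // Hypothetical helper: ByteArrayOutputStream keeps its data in the
    // protected fields buf and count. Returning buf avoids the copy made by
    // toByteArray(); only the first size() bytes are valid.
    static class ExposedByteArrayOutputStream extends ByteArrayOutputStream {
        byte[] buffer() {
            return buf;
        }
    }

    // writeUTF emits a 2-byte length prefix followed by modified UTF-8,
    // where every char (lone surrogates included) becomes 1 to 3 bytes.
    static ExposedByteArrayOutputStream write(String s) {
        ExposedByteArrayOutputStream bytes = new ExposedByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s); // UTFDataFormatException if over 65535 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes;
    }

    // Reads the string back straight from the internal buffer, no copy.
    static String roundTrip(String s) {
        ExposedByteArrayOutputStream bytes = write(s);
        try (DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(bytes.buffer(), 0, bytes.size()))) {
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String original = String.valueOf((char) 56249);
        System.out.println(original.equals(roundTrip(original))); // true
        System.out.println(write(original).size()); // 5 (2 length + 3 data)
    }
}
```

Whether exposing buf actually pays off compared with toByteArray() would
still have to be measured.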
Even though the example uses a "very special codepoint", the database should
handle it. An application could potentially use it for its own custom character
(I'm not quite sure how, though). Further, it seems the "UTF-8" encoding (as
used in String.getBytes()) does not promise to encode all unsigned 16-bit
values, only valid Unicode characters.
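To illustrate that last point, a small check using CharsetEncoder.canEncode(char)
(the class name is illustrative). The UTF-8 encoder reports that the lone
surrogate 0xdbb9 cannot be encoded, while the same char followed by the low
surrogate 0xdc00 forms the valid private use code point U+FE400 and encodes to
four bytes.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class Utf8Encodability {

    // Asks a fresh UTF-8 encoder whether it can encode the char by itself.
    // A lone surrogate is a legal Java char but not a Unicode character.
    static boolean utf8CanEncode(char c) {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        return encoder.canEncode(c);
    }

    // Number of bytes String.getBytes() produces for UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        char high = (char) 0xDBB9;
        System.out.println(Character.isHighSurrogate(high)); // true
        System.out.println(utf8CanEncode(high)); // false
        System.out.println(utf8CanEncode('A')); // true

        // Followed by a low surrogate, 0xdbb9 is part of a valid pair.
        String pair = new String(new char[] {high, (char) 0xDC00});
        System.out.println(Integer.toHexString(pair.codePointAt(0))); // fe400
        System.out.println(utf8Length(pair)); // 4
    }
}
```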
I'm not very good with the Unicode terminology, so there may be errors in my
comment, and important points may be missing. Feel free to correct me.
> Provide set methods for clob for embedded driver
> ------------------------------------------------
>
> Key: DERBY-2346
> URL: https://issues.apache.org/jira/browse/DERBY-2346
> Project: Derby
> Issue Type: Sub-task
> Components: JDBC
> Affects Versions: 10.3.0.0
> Reporter: Anurag Shekhar
> Assigned To: Anurag Shekhar
> Attachments: derby-2346-only_for_review.diff, derby-2346.v1.diff
>
>
--
This message is automatically generated by JIRA.