On Fri, 2004-04-30 at 08:03, Tim Bunce wrote:

> "You can use UTF8 and AL32UTF8 by setting NLS_LANG for OCI client
> applications. If you do not need supplementary characters, then it
> does not matter whether you choose UTF8 or AL32UTF8. However, if
> your OCI applications might handle supplementary characters, then
> you need to make a decision. Because UTF8 can require up to three
> bytes for each character, one supplementary character is represented
> in two code points, totalling six bytes. In AL32UTF8, one supplementary
> character is represented in one code point, totalling four bytes."
> 
> So the key question is... can we just do SvUTF8_on(sv) on either
> of these kinds of Oracle UTF8 encodings? Seems like the answer is
> yes, from what Jarkko says, because they are both valid UTF8.
> We just need to document the issue.

No, Oracle's "UTF8" is not valid UTF-8 once supplementary characters
are involved: it encodes each one as a surrogate pair, and valid UTF-8
cannot contain surrogates. If you mark such a string as UTF-8, neither
the Perl core nor other extension modules will be able to interpret
it correctly.

(As people have pointed out earlier in the thread, if you want a
standard name for this odd form of encoding, it's "CESU-8":
http://www.unicode.org/reports/tr26/.)

You'll need to do a conversion pass before you can mark it as UTF-8.
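
Roughly, such a pass could look like the sketch below. This is only an
illustration, not code from DBD::Oracle or the Perl core: the function
name cesu8_to_utf8 and the CESU8_ERROR constant are made up here, and
the routine assumes the input is otherwise well-formed (it only rewrites
surrogate pairs and does not validate anything else). It relies on the
fact that a surrogate encoded in three bytes always starts with 0xED
followed by a byte in 0xA0..0xBF.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical error value: returned on an unpaired or truncated surrogate. */
#define CESU8_ERROR ((size_t)-1)

/*
 * Hypothetical helper: copy `len` bytes from `src` to `dst`, replacing each
 * CESU-8 surrogate pair (two 3-byte sequences, 6 bytes) with the equivalent
 * 4-byte UTF-8 sequence. `dst` needs room for `len` bytes, since the output
 * never grows. Returns the number of bytes written, or CESU8_ERROR.
 */
size_t cesu8_to_utf8(const unsigned char *src, size_t len, unsigned char *dst)
{
    size_t i = 0, o = 0;

    while (i < len) {
        /* ED A0..BF xx is an encoded surrogate (U+D800..U+DFFF). */
        if (src[i] == 0xED && i + 1 < len && (src[i + 1] & 0xE0) == 0xA0) {
            /* It must be a high surrogate (ED A0..AF) ... */
            if ((src[i + 1] & 0xF0) != 0xA0)
                return CESU8_ERROR;
            /* ... immediately followed by a low surrogate (ED B0..BF). */
            if (i + 5 >= len ||
                (src[i + 2] & 0xC0) != 0x80 ||
                src[i + 3] != 0xED ||
                (src[i + 4] & 0xF0) != 0xB0 ||
                (src[i + 5] & 0xC0) != 0x80)
                return CESU8_ERROR;

            /* Each surrogate carries 10 payload bits. */
            uint32_t hi = ((uint32_t)(src[i + 1] & 0x0F) << 6) | (src[i + 2] & 0x3F);
            uint32_t lo = ((uint32_t)(src[i + 4] & 0x0F) << 6) | (src[i + 5] & 0x3F);
            uint32_t cp = 0x10000 + (hi << 10) + lo;

            /* Emit the code point as a standard 4-byte UTF-8 sequence. */
            dst[o++] = (unsigned char)(0xF0 | (cp >> 18));
            dst[o++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            dst[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            dst[o++] = (unsigned char)(0x80 | (cp & 0x3F));
            i += 6;
        } else {
            /* Everything else is already the same in UTF-8: copy as-is. */
            dst[o++] = src[i++];
        }
    }
    return o;
}
```

Only after a pass like this succeeds would it be safe to call
SvUTF8_on(sv) on the resulting buffer. AL32UTF8 data should not need
the pass at all, since it already uses four-byte sequences.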

Regards,
                                                Owen
