On Thu, 2004-04-29 at 11:16, Tim Bunce wrote:
> Am I right in thinking that perl's internal utf8 representation
> represents surrogates as a single (4 byte) code point and not as
> two separate code points?
> 
> This is the form that Oracle call AL32UTF8.
> 
> What would be the effect of setting SvUTF8_on(sv) on a valid utf8
> byte string that used surrogates? Would there be problems?
> (For example, a string returned from Oracle when using the UTF8
> character set instead of the newer AL32UTF8 one.)
> 
I think it makes no difference (at least I could not find one), except
for the internal storage.  Several of the tests I wrote print a SQL
DUMP(nch), and you can see the difference in the internal storage in
those prints.  The strings come back to the client the way they were
put in.
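
To illustrate what "internal storage" can mean here, the following is a
minimal Python sketch, not taken from the tests mentioned above. It
assumes that the older UTF8 character set stores a supplementary
character as two separately encoded surrogates (3 bytes each, CESU-8
style), while AL32UTF8 stores it as a single 4-byte sequence. The
function names and the sample code point U+10400 are hypothetical,
chosen only for the example.

```python
def utf8_bytes(cp):
    """Standard UTF-8: one 4-byte sequence for a supplementary code point."""
    return chr(cp).encode("utf-8")


def cesu8_bytes(cp):
    """Surrogate-pair form (assumed CESU-8 style): two 3-byte sequences."""
    if cp < 0x10000:
        # Code points in the BMP are encoded identically in both forms.
        return chr(cp).encode("utf-8")
    v = cp - 0x10000
    high = 0xD800 + (v >> 10)    # high (lead) surrogate
    low = 0xDC00 + (v & 0x3FF)   # low (trail) surrogate
    # "surrogatepass" lets Python encode lone surrogates as 3 bytes each.
    return b"".join(
        chr(s).encode("utf-8", "surrogatepass") for s in (high, low)
    )


if __name__ == "__main__":
    cp = 0x10400  # sample supplementary code point (assumption)
    print(utf8_bytes(cp).hex())
    print(cesu8_bytes(cp).hex())
```

Both byte strings stand for the same character, which would be
consistent with the stored bytes differing while the client sees the
same string.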

I have tested this with 4 databases:

dbcharset/ncharset
--------- --------
us7ascii/utf8
us7ascii/al16utf16
utf8    /utf8
utf8    /al16utf16

All tests produce the same results with all databases using both .UTF8
and .AL32UTF8 in NLS_LANG.

Lincoln

