Re: [Numpy-discussion] PyArray_Scalar() and Unicode

Pauli Virtanen Sun, 13 Jun 2010 05:55:44 -0700

Sat, 12 Jun 2010 17:33:13 -0700, Dan Roberts wrote:
[clip: refactoring PyArray_Scalar]
>     There are a few problems with this.  The biggest problem for me is
> that it appears PyUCS2Buffer_FromUCS4() doesn't produce UCS2 at all, but
> rather UTF-16 since it produces surrogate pairs for code points above
> 0xFFFF.  My first question is: is there any time when the data produced
> by PyUCS2Buffer_FromUCS4() wouldn't be parseable by a standards
> compliant UTF-16 decoder?


Since UTF-16 is UCS-2 plus surrogate pairs, the data produced should, as
far as I know, always be parseable by a UTF-16 decoder such as
PyUnicode_DecodeUTF16.

Conversion from UCS-4 to real UCS-2 would be a lossy procedure, since
code points above 0xFFFF cannot be represented in 2 bytes.
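
A minimal sketch of both points, in Python rather than the C of the actual
NumPy code. The helper names (ucs4_to_utf16_units, ucs4_to_ucs2_lossy) are
hypothetical and only mimic the behaviour described above; they are not
the implementation of PyUCS2Buffer_FromUCS4(). Substituting U+FFFD for
unrepresentable code points is likewise just one assumed policy for a
"real UCS-2" conversion.

```python
import struct

def ucs4_to_utf16_units(code_points):
    """Turn UCS-4 code points into 16-bit units, using surrogate pairs
    for anything above 0xFFFF (hypothetical helper, for illustration)."""
    units = []
    for cp in code_points:
        if cp > 0xFFFF:
            cp -= 0x10000
            units.append(0xD800 | (cp >> 10))    # high surrogate
            units.append(0xDC00 | (cp & 0x3FF))  # low surrogate
        else:
            units.append(cp)
    return units

def ucs4_to_ucs2_lossy(code_points, replacement=0xFFFD):
    """Strict UCS-2: one 16-bit unit per code point. Code points above
    0xFFFF are replaced (assumed policy), so information is lost."""
    return [cp if cp <= 0xFFFF else replacement for cp in code_points]

def pack_le(units):
    """Pack 16-bit units as little-endian bytes."""
    return struct.pack("<%dH" % len(units), *units)

text = "a\u00e9\U0001D11E"          # the last code point is above 0xFFFF
cps = [ord(ch) for ch in text]

utf16_units = ucs4_to_utf16_units(cps)
roundtrip = pack_le(utf16_units).decode("utf-16-le")

ucs2_units = ucs4_to_ucs2_lossy(cps)
lossy = pack_le(ucs2_units).decode("utf-16-le")

print(roundtrip == text)  # surrogate-pair output decodes as UTF-16
print(lossy == text)      # strict UCS-2 output does not round-trip
```

The surrogate-pair output is exactly what a standard UTF-16 decoder
expects, whereas the strict UCS-2 variant has to drop or replace the
code point above 0xFFFF.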

-- 
Pauli Virtanen

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
